B. Results
First, the RNN was used to classify each individual instance (10-frame window). Its output was tested against the overall label associated with the game-point window to which each 10-frame window belongs. We performed this first test because we regard the overall label attached to a game-point window as the predominant state over that window, as assigned by the observers.
The results of this test are shown in Table VI. Overall, 13244 samples, corresponding to 58.4% of the testing set, were associated with the correct overall emotion label. In particular, 64% of the ‘high intensity negative emotion’ samples were correctly classified, and accuracies of 58% and 67% were obtained for ‘happiness’ and ‘low intensity negative emotion’, respectively. However, only 36% accuracy was obtained for ‘concentration’. The ‘Undecided’ column in Table VI contains the number of test samples that the RNN was not able to assign to one of the four classes.
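The percentages quoted for Table VI can be re-derived from the cell counts. The following is a minimal sketch of that arithmetic, assuming that rows are actual labels, columns are predicted labels, and that ‘Undecided’ samples count towards the denominator; the function names are illustrative and not taken from the original system.

```python
# Rows: actual label. Columns: predicted label, with Undecided as the last column.
LABELS = ["High Negat.", "Happy", "Conc.", "Low Negat."]

TABLE_VI = [
    [4575, 862, 944, 123, 636],
    [203, 1938, 375, 172, 672],
    [830, 602, 1659, 520, 1009],
    [174, 636, 945, 5072, 733],
]


def per_class_rate(matrix):
    """Fraction of each actual class (row) that falls on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Diagonal sum over all samples; Undecided samples stay in the total."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


rates = per_class_rate(TABLE_VI)
correct, total, accuracy = overall_accuracy(TABLE_VI)
for label, rate in zip(LABELS, rates):
    print(f"{label}: {rate:.0%}")
print(f"overall: {correct}/{total} = {accuracy:.1%}")
```

Counting ‘Undecided’ outputs as errors is the reading that reproduces the reported 13244 of 22680 samples (58.4%).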
Finally, a majority vote was applied to the RNN results over the set of 10-frame windows composing each game-point window. The result was again matched against the overall label attached to the game-point window. The recognition performance is shown in Table VII. The testing set contained 54 of the 161 data instances, which correspond to the 22680 samples mentioned above. The system correctly categorized 33 of the 54 data instances, giving an overall success rate of 61.1%. It performed best on the high intensity and low intensity negative emotions (67% and 70%, respectively). For concentration, the system correctly categorized 50% of the test instances. The results for happiness were the lowest (43%). The high intensity negative and low intensity negative categories were quite different from each other: none of the test instances of either category was misclassified as the other. The “Undecided” category corresponds here to the instances for which the majority rule produced no clear winner.
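The majority rule described above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the text does not define “no clear winner”, so the sketch treats a tie for first place as undecided, and it assumes that 10-frame windows the RNN left undecided do not vote. The name `majority_label` and the example data are hypothetical.

```python
from collections import Counter

UNDECIDED = "Undecided"


def majority_label(window_predictions):
    """Return the most frequent label, or UNDECIDED when no label clearly wins.

    Assumptions (not stated in the paper): per-window UNDECIDED outputs are
    ignored, and a tie for first place means there is no clear winner.
    """
    votes = Counter(p for p in window_predictions if p != UNDECIDED)
    ranked = votes.most_common(2)  # top two labels by vote count
    if not ranked:
        return UNDECIDED
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return UNDECIDED
    return ranked[0][0]


# Hypothetical per-window outputs for a single game-point window.
example = ["Happy", "Conc.", "Happy", UNDECIDED, "Happy", "High Negat."]
print(majority_label(example))
```

A stricter rule, such as requiring more than half of all votes, would move more game-point windows into the “Undecided” column.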
TABLE VI
Confusion matrix for 10-frame window animations. The predicted label is associated with a 10-frame window animation, whereas the actual label is the overall label assigned by the human observers to the game-point window to which the animation belongs.
| Actual \ Predicted | High Negat. | Happy | Conc. | Low Negat. | Undecided |
|---|---|---|---|---|---|
| High Negat. | 4575 (64%) | 862 | 944 | 123 | 636 |
| Happy | 203 | 1938 (58%) | 375 | 172 | 672 |
| Conc. | 830 | 602 | 1659 (36%) | 520 | 1009 |
| Low Negat. | 174 | 636 | 945 | 5072 (67%) | 733 |
TABLE VII
Confusion matrix for game-point windows
| Actual \ Predicted | High Negat. | Happy | Conc. | Low Negat. | Undecided |
|---|---|---|---|---|---|
| High Negat. | 12 (67%) | 2 | 1 | 0 | 3 |
| Happy | 2 | 3 (43%) | 1 | 0 | 1 |
| Conc. | 1 | 2 | 6 (50%) | 2 | 1 |
| Low Negat. | 0 | 1 | 2 | 12 (70%) | 2 |
The results are clearly above chance level and show the relationship between amount of movement and affect [52, 65]. Nevertheless, there is room for improvement in discriminating valence at a given level of arousal. A number of misclassifications occurred between the ‘Low intensity, negative emotion’ and ‘Concentration’ classes and between the ‘High intensity, negative emotion’ and ‘Happiness’ classes. These were also the typical disagreements between human observers during the survey. It is indeed possible that such expressions have a mixed affective meaning. For example, a person who wins might make a sudden movement to express a mixture of happiness and anger (revenge). However, some of these misclassifications could possibly be resolved by adding postural configurational features to the dynamic ones. In fact, configurational features have been shown to provide valence information in static [61] and acted [58] contexts. In [74], [60] the authors discuss the different contributions made by dynamic and configurational features to the recognition of emotions from body expressions.
Fig. 5. Real-time detection of player’s affective space. The x-axis represents the 10-frame windows, the y-axis the predicted emotional state for each 10-frame window. A game-point window is represented by a set of continuous overlapping 10-frame windows shown with the same symbol and colour. Four game-points are illustrated here. For visualization purposes, only a subset of the 10-frame windows per game-point is shown, i.e., 1 in every 10. The continuous line represents the overall predicted affective state (based on majority voting) for each game-point.
A more refined labelling of the data could also provide further improvement to the recognition system. Figure 5 shows the continuous detection of the emotional states of a player for 4 game-point windows. The x-axis represents the 10-frame windows. For visualization purposes, only a subset of the 10-frame windows per game-point is shown (i.e., 1 in every 10). The y-axis represents the predicted emotional state for each 10-frame window. A symbol in the graph represents the emotional state predicted by the RNN for the corresponding 10-frame window. A game-point window is represented by a set of continuous 10-frame windows shown with the same symbol and colour.
Fig. 6. The differences in angular frequency between the participants who portray anger. Participant 9 is female.
Whilst the RNN detects a variety of emotional states within each game-point window, the system predicts, by a majority rule, an underlying overall emotional state for each game-point window, represented by the blue line. The majority rule was proposed here to reflect the fact that the overall affective label assigned by the observers represented the main valence and level of arousal observed in an animation. Since each game-point window lasted between 2 and 33 seconds, it may have included movements expressing different valences and intensity levels. Hence, higher performance could be obtained by asking the observers to label each game-point window continuously, as proposed in [15], rather than assigning it a single overall label. However, continuous labelling is very time-consuming and tiring for the observers and therefore has its own limitations. Whilst we consider it important to test this latter approach, we leave it for a future study and refinement of our approach.
TABLE VIII
Confusion matrix for the female participant for 10-frame window animations. The predicted label is associated with a 10-frame window animation, whereas the actual label is the overall label assigned by the human observers to the game-point window to which the animation belongs.
| Actual \ Predicted | High Negat. | Happy | Conc. | Low Negat. | Undecided |
|---|---|---|---|---|---|
| High Negat. | 407 (36%) | 204 | 306 | 68 | 153 |
| Happy | 119 | 696 (44%) | 408 | 102 | 255 |
| Conc. | 51 | 238 | 1818 (51%) | 798 | 697 |
| Low Negat. | 34 | 119 | 391 | 1546 (56%) | 662 |
To further test the recognition system, we measured its ability to generalize to other players. This test was motivated by the large diversity of playing styles observed in our data set and highlighted in previous studies [37], [38]. Fig. 6 shows that the players can be categorized into two groups: those who move their arm little during the game (P2, P6, P7 and P9, the female participant) and those who move their arm a lot (P1, P3, P4, P5 and P8). This difference is mainly due to the fact that some players tend to play the game using only their hand or wrist instead of proper tennis arm movements.
A cross-validation approach was used here. For each fold of the cross-validation, all data of one player were left out of the training set and used for testing only. On average, the algorithm reached a 54.8% correct recognition rate. Although this result is lower than the 61% correct recognition obtained by the person-dependent model, it is still above chance level. The results also showed that the recognition rate for one of the participants was considerably lower (49.23%). The other 8 participants had similar results, varying between 53% and 59%. One possible explanation is that this participant was the only female in our data set. Table VIII shows the confusion matrix for the female participant; we can see that our system was unable to recognize most of the high-intensity negative emotions and happiness. It is possible that this participant not only had a very different playing style (body dynamics) from that of the male participants but also presented a different emotional pattern (sequence of 10-frame windows) underlying the same overall emotional state (a game-point window). Including other female participants in the training set may solve this problem.
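The fold construction used in this test, where each player is held out in turn, can be sketched as below. This is only an outline of the data split under the assumption that every sample carries a player identifier; the function name `leave_one_player_out` and the placeholder identifiers are hypothetical, and training and scoring of the RNN are deliberately left out.

```python
def leave_one_player_out(player_ids):
    """Yield (held_out_player, train_indices, test_indices), one fold per player.

    All samples of the held-out player go to the test set, so the model is
    never trained on data from the player it is evaluated on.
    """
    for held_out in sorted(set(player_ids)):
        train = [i for i, p in enumerate(player_ids) if p != held_out]
        test = [i for i, p in enumerate(player_ids) if p == held_out]
        yield held_out, train, test


# Placeholder player identifier for each sample (illustrative data only).
player_ids = ["P1", "P1", "P2", "P3", "P3", "P3"]
for held_out, train, test in leave_one_player_out(player_ids):
    print(held_out, "train:", train, "test:", test)
```

With nine players this yields nine folds. Libraries such as scikit-learn provide an equivalent splitter (`LeaveOneGroupOut`), which may be preferable in practice.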