Continuous recognition of player’s affective body expression as dynamic quality of aesthetic experience




III. Data Collection


The Grand Slam Tennis game for the Nintendo Wii was used for this experiment. An Animazoo IGS-190 motion-capture system was used to record the participants' movements during full-body game play. The system has 17 sensors placed on the head, neck, spine, shoulders, arms, forearms, wrists, upper legs, knees and feet. The acquisition rate was set to 60 frames/sec.

Nine players (8 males and 1 female), aged 20 to 30 years, were recruited for the experiment. All were university students and all were familiar with the Wii Tennis game. Five players played it at least once a month, three played it once a week and one played it more than once a week; this last player was the only one who regularly played the Wii Grand Slam Tennis game used in our experiment. Four of them considered themselves very skilled at playing the Wii tennis game and five considered themselves moderately skilled. Only one considered himself not very good at it. Three rated themselves as not competitive, whereas the others rated themselves as moderately competitive.

To make the players feel more comfortable and to reduce the sense of being in a laboratory setting, we asked each participant to bring a friend to compete against. The participants were asked to play the Wii Grand Slam Tennis game for 15 minutes while being recorded by the motion-capture system and a video camera.

Fig. 1. Sample frames from one of the avatar animations.




Fig. 2. Top: Boxplots representing the frequency of use of each label when selected as ground truth for the animations. The frequency values are all well above chance level. Bottom: Distribution of the 175 avatar animations according to the 8 emotion labels.


After collecting the motion-captured data, we segmented them into ‘playing’ and ‘non-playing’ frame windows. We collected 423 significant playing windows, each containing either a winning or a losing point. Window lengths varied between 2 and 33 seconds (i.e., between 120 and 1980 frames per window; mean = 545.78 frames). On examining the motion-captured data, we found that 248 of the 423 windows were very noisy (due to the gimbal lock problem [63]) and decided to exclude them, as sufficient data remained. As a result, our final dataset consisted of 175 windows (an average of 19.4 game-point windows per participant, σ = 1.4).
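As a rough illustration of this windowing step, the sketch below extracts contiguous ‘playing’ windows from a recording and keeps only those within the observed length range; the per-frame playing flag, the function name and the length bounds are our own assumptions, and the actual segmentation (and the screening for gimbal-lock corruption) may have been carried out differently.

```python
FPS = 60  # acquisition rate of the motion-capture system (frames/sec)

def extract_playing_windows(playing_flags, min_len_s=2, max_len_s=33):
    """Return (start, end) frame indices of contiguous 'playing' stretches.

    playing_flags: sequence of booleans, one entry per captured frame,
    True while a game point is being played. Windows outside the 2-33 s
    range reported in the text are dropped.
    """
    windows, start = [], None
    for t, playing in enumerate(playing_flags):
        if playing and start is None:
            start = t                      # a playing window begins
        elif not playing and start is not None:
            windows.append((start, t))     # the window ends at frame t
            start = None
    if start is not None:                  # recording ended mid-window
        windows.append((start, len(playing_flags)))
    return [(s, e) for s, e in windows
            if min_len_s * FPS <= e - s <= max_len_s * FPS]
```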

To identify the set of affective states to focus on, we first asked the participants to freely list the emotions they had felt during the game. We also reviewed the collected videos. On this basis, eight emotion labels were selected: Frustration, Anger, Happiness, Concentration, Surprise, Sadness, Boredom and Relief.

To assign each animation its affective ground truth (i.e., the affective state it conveys to an average observer [61]), an online evaluation survey was conducted using computer-animated avatars (see Fig. 1). The animations were built from the motion-captured data corresponding to the 175 selected windows. Clips of computer-animated avatars were used, instead of videos of the actual human participants, to create faceless, gender-neutral, culturally non-specific ‘humanoids’, in an attempt to eliminate bias in the evaluation of the body expressions. External observers, rather than the players themselves, were used to build the ground truth because post-task self-reports of feelings are unreliable.

The ground truth assigned to each animation was obtained in two steps. First, a forced-choice survey was created and nine observers were recruited for the labelling task. The survey required the observers to assign one of the eight labels to each animated avatar according to the affective expression conveyed by the avatar’s body movement. Then, the label most frequently assigned by the observers to an animation was selected as the ground truth for that animation. The boxplots in Fig. 2 (top) show the frequency of use of each label as ground truth. For all the selected ground truths (x-axis), the frequency of use for each animation (y-axis) is well above chance level (11%), showing high agreement among the nine observers.
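The majority-label step can be expressed compactly; the sketch below (function name ours) returns the most frequent label assigned to one animation together with its frequency of use, the quantity plotted on the y-axis of Fig. 2 (top).

```python
from collections import Counter

def majority_ground_truth(observer_labels):
    """observer_labels: the labels assigned to one animation by the observers.

    Returns the most frequent label (taken as the animation's ground truth)
    and the fraction of observers who chose it (its frequency of use).
    """
    label, count = Counter(observer_labels).most_common(1)[0]
    return label, count / len(observer_labels)

# Example: 6 of 9 observers chose 'Frustration'
label, freq = majority_ground_truth(['Frustration'] * 6 + ['Anger'] * 2 + ['Sadness'])
# label == 'Frustration', freq ~ 0.67, well above chance level
```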

Fig. 3. Distribution of the 161 avatar animations according to the 4 labels.
We then analysed the distribution of labelled animations according to their assigned ground truth. Fig. 2 (bottom) shows the distribution of the 175 animated avatars grouped according to the most frequent label associated with them. ‘Relief’ and ‘surprise’ are not well represented in the dataset; these two groups of animations were therefore discarded, as it would not have been feasible to build a recognition model with so few instances. To obtain a larger number of instances per emotion class, we also decided to merge instances labelled with similar emotions. We deemed this acceptable, as the agreement between observers proved to be well above chance level (Fig. 2, top); a similar approach was used in [61]. According to Storm et al. [64], frustration and anger are both negative, high-intensity emotions, with anger generally more intense than frustration. Hence, frustration and anger were combined into a category called ‘high intensity negative emotion’. Sadness and boredom are negative emotions characterized by low energy/intensity, so they were grouped into a category called ‘low intensity negative emotion’. The distribution of the remaining 161 animated avatars (µ = 17.9 game-point windows per participant, σ = 1.3) according to the 4 affective states is illustrated in Fig. 3. These 4 classes cover four quadrants of the valence-arousal space generally used to describe emotional states, with ‘concentration’ being a neutral-positive, low-arousal state and ‘happiness’, in this case, representing the high-intensity positive emotions.
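A minimal sketch of this label-merging step, assuming a simple lookup table (the class names follow the text; ‘Relief’ and ‘Surprise’ map to nothing because those animations were discarded):

```python
# Mapping from the 8 survey labels to the 4 affective-state classes.
LABEL_TO_CLASS = {
    'Frustration':   'high intensity negative emotion',
    'Anger':         'high intensity negative emotion',
    'Sadness':       'low intensity negative emotion',
    'Boredom':       'low intensity negative emotion',
    'Happiness':     'high intensity positive emotion',
    'Concentration': 'concentration',
}

def merge_label(ground_truth_label):
    """Map an 8-label ground truth to one of the 4 classes (None = discarded)."""
    return LABEL_TO_CLASS.get(ground_truth_label)
```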
TABLE I
Movement Feature Computation for an n-Frame Window

Feature                  Formula
Body Segment Rotation    θ_i(t) = rotational angle of the i-th sensor at frame t
Angular Velocity
Angular Frequency
Orientation
Angular Acceleration
Body Directionality      computed only for …
Amount of Movement

where:
n = size of the n-frame window over which the feature is computed (n = 10 in our case);
m = length of the game window to which the n-frame window belongs;
i = sensor index, i = 1 … 51, i.e. 17 joints × 3 rotational axes;
t = frame number, t = 1 … m − n + 1 (feature computation stops at the last 10-frame window).
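Since the formulas of Table I are not reproduced here, the sketch below illustrates only one plausible way of computing a subset of these features (angular velocity, angular acceleration and amount of movement) over 10-frame sliding windows of the 51 rotation channels; the finite-difference definitions, scaling and function name are our own assumptions and may differ from the exact formulas used in the original table.

```python
import numpy as np

FPS = 60      # acquisition rate (frames/sec)
N = 10        # n-frame window over which each feature is computed

def sliding_window_features(theta):
    """theta: array of shape (m, 51), rotational angles of 17 joints x 3 axes
    for one m-frame game window. Returns one feature vector per n-frame
    window, for t = 1 ... m - n + 1.
    """
    m = theta.shape[0]
    rows = []
    for t in range(m - N + 1):
        win = theta[t:t + N]                          # one 10-frame window
        diffs = np.diff(win, axis=0)                  # frame-to-frame angle change
        velocity = diffs.mean(axis=0) * FPS           # mean angular velocity per channel
        acceleration = np.diff(diffs, axis=0).mean(axis=0) * FPS ** 2
        amount_of_movement = np.abs(diffs).sum()      # total angular displacement
        rows.append(np.concatenate([velocity, acceleration, [amount_of_movement]]))
    return np.asarray(rows)
```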


To set a baseline target for the performance of the recognition system, a second group of 7 observers was recruited and asked to label the same 161 animations. Following the same process described above, a new ground truth was computed for each animation. The agreement between the first and second groups of observers was then computed on the basis of the ground truth assigned by each group to each animation: 61.49% of the animations obtained the same ground truth from both groups. Hereafter, we refer to this value as the baseline.
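This baseline amounts to a simple comparison between the two groups’ ground-truth assignments; in the illustrative sketch below (names ours) each argument maps an animation identifier to the ground-truth label assigned by one observer group.

```python
def agreement_rate(group1_gt, group2_gt):
    """Percentage of animations receiving the same ground truth from both groups."""
    same = sum(group1_gt[a] == group2_gt[a] for a in group1_gt)
    return 100.0 * same / len(group1_gt)

# e.g. agreement_rate(gt_first_group, gt_second_group) -> 61.49 reported in the text
```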




