IV. Automatic Recognition of the Player's Affective States
A. Feature Extraction and Modelling
In order to build our recognition system, the following dynamic features were selected: Body Segment Rotation, Angular Velocity, Angular Frequency, Orientation, Angular Acceleration, Body Directionality and Amount of Movement. This initial selection was based on previous studies modelling affective body movements (e.g., [53], [54], [58], [65]). Table I provides the formulas used to compute each feature. As the motion capture system provides the Euler angles for the three rotational axes Z, X and Y, each feature (except the Amount of Movement) is computed separately for each rotational axis. The Body Segment Rotations are the Euler angles for each joint and each axis, provided directly by the motion capture system. Angular Velocity differs from Angular Frequency in that it is a vector quantity: it specifies the angular frequency of an object together with the axis around which the object rotates. The Orientation indicates the direction of the Angular Velocity for each individual joint and axis. The Body Directionality is computed only for the Head and the Spine; it captures the overall orientation of the body with respect to the game. The Amount of Movement gives an overall rotational measure of how much the player has moved between consecutive captured frames.
TABLE II
Body Movement Features. Each movement feature is individually computed for each rotation axis and for each body part listed here.

Feature Type | Features | Body Parts
Motion Features | Body Segment Rotation (X, Y, Z rotations) | Right Forearm, Arm, Hand
Motion Features | Angular Velocity (X, Y, Z rotations) | Right Forearm, Arm, Hand
Motion Features | Angular Acceleration (X, Y, Z rotations) | Right Forearm, Arm, Hand
Motion Features | Angular Frequency (X, Y, Z rotations) | Right Forearm, Arm, Hand
Motion Features | Orientation (X, Y, Z rotations) | Right Forearm, Arm, Hand
Motion Features | Body Directionality (X, Y, Z rotations) | Spine, Head
Frame Interval Features | Amount of Movement | With respect to each sensor and rotation axis
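To make the feature definitions concrete, the sketch below illustrates how such per-axis features could be derived from the captured Euler angles. It is a minimal Python sketch: the array layout, the capture rate and the discrete-derivative choices are illustrative assumptions, not the actual formulas of Table I.

import numpy as np

# Illustrative assumptions: euler has shape (n_frames, n_joints, 3) holding the
# Z, X, Y Euler angles per joint; fps is the capture rate.
def motion_features(euler, fps=60.0):
    dt = 1.0 / fps
    rotation = euler                                  # Body Segment Rotation (raw Euler angles)
    velocity = np.gradient(euler, dt, axis=0)         # Angular Velocity, per joint and axis
    acceleration = np.gradient(velocity, dt, axis=0)  # Angular Acceleration
    frequency = np.abs(velocity)                      # Angular Frequency (magnitude of the velocity)
    orientation = np.sign(velocity)                   # Orientation (direction of the velocity)
    # Amount of Movement: total rotational change between consecutive frames,
    # summed over all joints and axes.
    amount = np.abs(np.diff(euler, axis=0)).sum(axis=(1, 2))
    return rotation, velocity, acceleration, frequency, orientation, amount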
Since we are dealing with time-related features, a dynamic learning algorithm was better suited to building our system. A Recurrent Neural Network (RNN) algorithm [67]-[69] was therefore selected. The inputs to the network are the features listed in Table II. The number of output nodes corresponds to the number of selected affective labels. The RNN classifies each new incoming input by taking into consideration a number r of previously classified instances; this parameter is called the depth of the RNN.
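The sketch below illustrates one possible reading of this output-feedback scheme, using the layer sizes of Table III. The feedback formulation and the weight initialization are assumptions made for illustration, and training (with the learning rate and momentum of Table III) is omitted.

import numpy as np

class OutputFeedbackRNN:
    """Sketch of an RNN whose classification of the current input also depends
    on the r previously classified instances (the 'depth' r of the network)."""

    def __init__(self, n_in=19, n_hidden=90, n_out=4, depth=10, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.W_fb = rng.normal(scale=0.1, size=(n_hidden, n_out * depth))  # feedback from past outputs
        self.W_out = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.depth, self.n_out = depth, n_out

    def forward(self, x_seq):
        """x_seq: (T, n_in) feature vectors of consecutive n-frame windows."""
        outputs, history = [], np.zeros(self.n_out * self.depth)
        for x in x_seq:
            h = np.tanh(self.W_in @ x + self.W_fb @ history)  # hidden layer
            y = np.exp(self.W_out @ h)
            y /= y.sum()                                       # softmax over the affective labels
            outputs.append(y)
            # shift the newest classification into the feedback buffer of length r
            history = np.concatenate([y, history])[: self.n_out * self.depth]
        return np.array(outputs)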
Each of the 161 game-point windows was split into a set of consecutive overlapping intervals of n frames each, where n was determined experimentally. We call these intervals n-frame windows. Each n-frame window overlaps the subsequent n-frame window by n-1 frames. The features listed in Table II were computed for each n-frame window, rather than for each game-point window, providing a better measure of the dynamics of the movement. Each feature is computed for each body part and each rotational axis indicated in Table II. To take individual differences into account, the feature values of each participant were normalized according to the maximum and minimum values observed for each body part over that participant's whole data set.
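For illustration, the windowing and per-participant normalization steps could be implemented along the following lines (the array shapes and the small epsilon guard are assumptions of this sketch):

import numpy as np

def n_frame_windows(game_point, n=10):
    """Split one game-point window (shape: n_frames x n_features) into consecutive
    overlapping n-frame windows; adjacent windows overlap by n-1 frames (stride 1)."""
    return np.stack([game_point[i:i + n] for i in range(len(game_point) - n + 1)])

def normalize_per_participant(values, v_min, v_max):
    """Min-max normalization using the extrema observed for each body part over
    the participant's whole data set (v_min and v_max precomputed per feature)."""
    return (values - v_min) / (v_max - v_min + 1e-8)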
TABLE III
Initial Network Parameters

Parameter | Final Value | Tested Range
Input nodes | 19 | All features listed in Table I
Hidden layer nodes | 90 | 10-150
Output nodes | 4 | 4, 8
Learning rate | 0.7 | 0-1
Momentum | 0.3 | 0-1
Recurrency parameter | 0.5 | 0-1
Size of n-frame window | 10 | 5-200
Depth of the RNN | 10 | 5-100
TABLE IV
A sample of trial results to identify the best set of features. Each row indicates the type of features used in that trial. The last column indicates the recognition performance (%). For each type of feature, all rotational axes listed in Table II were included.
ID | Angular Velocity | Angular Frequency | Angular Acceleration | Orientation | Body Direction | Amount of Movement | Body Rotation | Accuracy (%)
1 | × | × | × | × | × | × | × | 37
2 | × |   |   |   |   |   |   | 40
3 |   | × |   |   |   |   |   | 41
4 |   |   | × |   |   |   |   | 36
5 |   |   |   | × |   |   |   | 13
6 |   |   |   |   | × |   |   | 21
7 |   |   |   |   |   | × |   | 42
8 |   |   |   |   |   |   | × | 32
9 | × | × |   |   |   |   |   | 47
10 |   | × | × |   |   |   |   | 44
11 |   |   | × | × |   |   |   | 29
12 |   |   |   | × | × |   |   | 17
13 |   |   |   |   | × | × |   | 40
14 |   |   |   |   |   | × | × | 45
15 |   | × |   |   |   | × |   | 53
16 |   | × |   |   |   |   | × | 38
17 |   | × |   |   | × |   |   | 38
18 | × |   |   |   |   | × |   | 55
19 |   |   | × |   |   | × |   | 49
20 | × |   | × |   |   |   |   | 47
21 | × | × | × |   |   |   |   | 45
22 | × | × |   |   |   | × |   | 57
23 | × |   | × |   |   | × |   | 52
24 |   | × | × |   |   | × |   | 55
25 |   | × |   |   |   | × | × | 51
The classification of each n-frame window into one of the 4 emotion classes takes into account the classification of the previous r n-frame windows, allowing the network to exhibit dynamic temporal behaviour. The results obtained for the consecutive n-frame windows forming a game-point window are then pooled together, and an overall affective state is assigned to the game-point window using a majority rule. A majority rule is used because the labelling of the data was performed at the game-point window level. As such, we aim to test the system both at the n-frame level and at the game-point level.
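A minimal sketch of this majority-rule pooling follows (how ties are resolved is not specified in the text, so the tie handling here is an assumption):

from collections import Counter

def game_point_label(window_labels):
    """Assign one affective label to a game-point window by majority vote over
    the labels predicted for its consecutive n-frame windows."""
    # most_common(1) returns the label with the highest count; ties are broken
    # arbitrarily here, which is an assumption of this sketch.
    return Counter(window_labels).most_common(1)[0][0]

# e.g. predictions (class indices 0-3) for the n-frame windows of one game point:
print(game_point_label([2, 2, 0, 2, 1, 2]))  # -> 2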
To build and test the system, the data set (i.e., 161 game-point windows) was split into a training set (2/3) and a testing set (1/3). The training set was used to tune the RNN parameters by applying 5-fold cross-validation. The remaining 1/3 was used to test the overall performance of the RNN (see Results section) after all the parameters had been tuned. Each parameter (Table III) was tuned individually by keeping all the other parameters constant. The range of values tested for each parameter is indicated in the table. The identified values were then tested again with all the parameters, slightly changing them for fine-tuning. This last step was repeated until the system's performance stabilized, i.e., the change in overall performance was less than 0.1% between consecutive runs. The optimal n-frame window size was 10. This means that the 161 game-point windows produced 65030 10-frame windows used as the training set (107 game-point windows) and 22680 10-frame windows used as the testing set (54 game-point windows).
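This one-parameter-at-a-time procedure can be sketched as a simple coordinate search (illustrative only; cv_score stands for a placeholder 5-fold cross-validation routine, and the stopping tolerance assumes accuracy expressed in the 0-1 range):

def tune_parameters(params, tested_ranges, cv_score, tol=0.001):
    """Coordinate-wise tuning sketch: vary one parameter at a time over its tested
    range while keeping the others constant, keep the best value, and repeat the
    sweep until the 5-fold cross-validation score changes by less than 0.1%."""
    best = cv_score(params)
    improved = True
    while improved:
        improved = False
        for name, values in tested_ranges.items():
            for value in values:
                candidate = {**params, name: value}
                score = cv_score(candidate)   # 5-fold CV accuracy on the training set
                if score > best + tol:
                    params, best, improved = candidate, score, True
    return params, best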
TABLE V
Highest correlation values between pairs of features from Table II. Vel = Angular Velocity, Freq = Angular Frequency, R = Right, Z = Z-rotation axis, X = X-rotation axis, Y = Y-rotation axis.
Feature 1 | Feature 2 | Pearson correlation | p-value
Vel. R. Arm (Z) | Vel. R. Arm (Y) | .487 | .000
Freq. R. Forearm (Z) | Vel. R. Forearm (Y) | .441 | .000
Vel. R. Forearm (Z) | Vel. R. Forearm (Y) | .425 | .000
Freq. R. Arm (Z) | Freq. R. Arm (Y) | .363 | .000
Freq. R. Arm (X) | Freq. R. Arm (Y) | .349 | .000
Freq. R. Hand (Z) | Freq. R. Hand (Y) | .328 | .000
Freq. R. Hand (X) | Freq. R. Hand (Y) | .298 | .000
Vel. R. Arm (X) | Vel. R. Arm (Y) | .29 | .000
Freq. R. Hand (Z) | Freq. R. Hand (X) | .262 | .000
Vel. R. Forearm (Z) | Vel. R. Forearm (X) | .26 | .000
Freq. R. Arm (Z) | Freq. R. Arm (X) | .246 | .000
The tuning of the network also involved identifying the most discriminative features. As the motion-captured data provided the 3D rotational information for each segment of the body, the RNN was tested separately with different combinations of these features and joints. Table IV shows the performance results for some of the feature combinations. As shown in Table IV, the best RNN performance was obtained using the overall Amount of Movement together with the Angular Velocity and Angular Frequency of the right arm, forearm and hand along the three rotational axes (i.e., a total of 19 features). The Pearson correlation was computed for each pair of these 19 features (171 pairs) to identify possible redundancies. 93.5% of the pairs showed very low correlation (< .1). The remaining pairs showed higher but still quite low correlation values (< .5), as shown in Table V. The correlation between the Amount of Movement and any of the other 18 features was lower than .1.
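This redundancy check over the 171 feature pairs can be sketched as follows (the data layout is an assumption, and SciPy's pearsonr is used here in place of whatever statistics package was originally employed):

from itertools import combinations
from scipy.stats import pearsonr

def correlated_pairs(features, names, threshold=0.1):
    """features: (n_samples, 19) matrix of the selected features.
    Returns the pairs whose absolute Pearson correlation reaches the threshold."""
    pairs = []
    for i, j in combinations(range(len(names)), 2):   # C(19, 2) = 171 pairs
        r, p = pearsonr(features[:, i], features[:, j])
        if abs(r) >= threshold:
            pairs.append((names[i], names[j], r, p))
    return sorted(pairs, key=lambda t: -abs(t[2]))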
Fig. 4. Angular velocity for the X-rotations of the right forearm during one window period.
Fig. 4 shows examples of the Angular Velocity feature with respect to the associated affective labels. As can be seen from the figure, the angular velocity profiles of the right forearm in animations representing the eight emotions show periods of similar patterns but also quite characteristic sections. The RNN classifies one 10-frame window at a time by taking into account the results of the previous 10 10-frame windows. Therefore, even if 'surprise' and 'happy' animations may share short patterns, the classification of each short pattern is also affected by the previous patterns, facilitating the discrimination.