to identify factors that might explain differences between expertise groups. (As discussed in
Section 3.5.1, above, the expertise level for each player was calculated by taking the mean of the final level of gameplay for their top-four games.)
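To make this measure concrete, the following is a minimal sketch of the computation (in Python; the `games` DataFrame and its `player` and `final_level` columns are hypothetical names, as the paper does not specify an implementation):

```python
import pandas as pd

# games: hypothetical DataFrame with one row per completed game and
# columns "player" and "final_level" (level reached when the game ended).
expertise = (
    games.sort_values("final_level", ascending=False)
         .groupby("player")
         .head(4)                        # each player's top-four games
         .groupby("player")["final_level"]
         .mean()                         # mean final level = expertise score
)
```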
The best linear models were derived using a bidirectional stepwise model selector based on the Akaike information criterion (AIC). This procedure entails an iterative process of variable selection in which input variables are added or dropped according to how much information each contributes to model fit, as measured by AIC. Finally, further analyses were performed to determine the influence of random seeds on gameplay (these analyses are discussed in Section 7).
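The paper does not show its model-selection code (R's step() function is a common tool for this procedure); the following is a minimal Python sketch of the idea, assuming a feature DataFrame `X` and a response Series `y`:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_aic(X, y, cols):
    """AIC of an OLS model with an intercept plus the given columns."""
    design = (sm.add_constant(X[cols]) if cols
              else pd.DataFrame({"const": np.ones(len(y))}, index=y.index))
    return sm.OLS(y, design).fit().aic

def stepwise_aic(X, y):
    """Bidirectional (forward/backward) stepwise selection on AIC."""
    selected, best = [], fit_aic(X, y, [])
    improved = True
    while improved:
        improved = False
        # Candidate moves: add one unused feature or drop one selected feature.
        candidates = [selected + [c] for c in X.columns if c not in selected]
        candidates += [[c for c in selected if c != d] for d in selected]
        for cols in candidates:
            aic = fit_aic(X, y, cols)
            if aic < best:               # keep the move with the lowest AIC
                best, selected, improved = aic, cols, True
    return selected, best
```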
4.1. Exploratory factor analysis

Fig. 10 shows the correlation matrix constructed from the level-averaged feature values in the data. The heat map for the correlation matrix is shown on the left side of the figure, with the numbers for each entry shown on the right side. These values provide the input to our EFA. Appendix A lists all 35 level-averaged features along with their descriptions and information about how they were calculated.
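As an illustration, a correlation matrix and heat map like that of Fig. 10 could be produced as follows (a sketch assuming the 35 level-averaged features sit in a hypothetical DataFrame `df`, one row per player):

```python
import matplotlib.pyplot as plt

corr = df.corr()                         # pairwise Pearson correlations

fig, ax = plt.subplots(figsize=(10, 10))
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label="Pearson r")
plt.tight_layout()
plt.show()
```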
Factor analysis finds sets of correlated features and uses these sets to form individual factors. The method used here for identifying latent factors (Costello and Osborne, 2005) is principal component analysis (PCA). PCA finds linear combinations of features in the original data, called components. The weight, or contribution, of each feature to a component is its loading value (see Fig. 11). The first component captures the highest amount of variance in the distribution of the data, the second component captures the second-highest variance in the data, and so on (Wold, Esbensen, & Geladi, 1987).
By default, these components are orthogonal to each other, which means that there is no collinearity present among the components.
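A minimal sketch of this step, using scikit-learn and the hypothetical `df` from the sketch above (the paper does not name its PCA implementation):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the features so PCA operates on the correlation structure
# rather than on the raw feature scales.
Z = StandardScaler().fit_transform(df)
pca = PCA().fit(Z)

# Loadings: each feature's weight on each component, scaled by the
# component's standard deviation (shape: features x components).
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Components come out ordered by variance explained and are orthogonal.
print(pca.explained_variance_ratio_.round(3))
```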
In general, it can be difficult to determine clearly the type of information each component carries. Rotation of the components solves this problem: the components become factors that represent linear combinations of subsets of the original features, while the loadings of the other, less important features are assigned near-zero values and can be ignored. By examining the features that constitute each rotated factor, it is possible to specify the kind of information the factor carries. For our rotations, we used varimax rotation, one of the most commonly used forms of orthogonal rotation (Jackson).

Our PCA used level-averaged features (explained in Section 3.5.3). One of the commonly recommended methods for selecting the number of to-be-retained factors is the Kaiser rule; that is, select all factors whose eigenvalue is greater than 1 (Kaiser, 1960). However,
Costello and Osborne (2005) warn that this method often leads to suboptimal results (because analysts end up retaining too many factors) and suggest other methods for the selection process. Interestingly, the human eye is generally considered at least as accurate as an algorithm for this task, so the most common method entails plotting the eigenvalues and then looking for the inflection point (as per the Fig. 12 plot for our current data set).
In such plots, the horizontal line represents an eigenvalue of 1 (a reference line; factors falling below it should not be selected for analysis), and the vertical line marks the point at which the slope of the curve changes abruptly, that is, the inflection point.
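Neither the rotation nor the factor-count selection is tied to a specific implementation in the text; the sketch below illustrates both steps, reusing the hypothetical `corr` and `loadings` from the earlier sketches (`n_factors` is a placeholder for the number read off the scree plot):

```python
import numpy as np
import matplotlib.pyplot as plt

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a (features x factors) loading matrix L."""
    p, k = L.shape
    R = np.eye(k)                        # orthogonal rotation matrix
    crit = 0.0
    for _ in range(max_iter):
        LR = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (LR**3 - (gamma / p) * LR @ np.diag((LR**2).sum(axis=0))))
        R = u @ vt
        if s.sum() < crit * (1.0 + tol): # stop when the criterion plateaus
            break
        crit = s.sum()
    return L @ R

# Scree plot: eigenvalues of the correlation matrix, largest first.
eigvals = np.linalg.eigvalsh(corr.values)[::-1]
plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
plt.axhline(1.0, linestyle="--")         # Kaiser reference line (eigenvalue = 1)
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.show()

# Rotate the loadings of the factors retained from the scree plot.
n_factors = 5                            # placeholder; read off the scree plot
rotated = varimax(loadings[:, :n_factors])
```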