5. Two at a time: Using logistic regression to compare player groups

Logistic regression models (a type of generalized linear model, also called logit models) are often used for classifying binary categorical data. As we are trying to compare two groups of players with each model, logistic regression is well suited to our use case. These models first form a linear combination of the input variables and then feed the result into an activation function (the logistic function), which maps the values to probabilities between 0 and 1 that can then be converted into binary outputs. The models are first trained to fit the data, and then the goodness-of-fit is determined by calculating the mean squared error.

5.1. Overview of analyses

We fed our regression models the six factor values (defined in Section 4) as input and trained each model to classify players as belonging to one of two classes. (For example, for model 1, factor 1, the best fit for planning-efficiency is the value listed in Table 4. Also, note that the sign, positive or negative, is not important; the best model is the one with the greatest absolute size.) The best version of each model was determined through bidirectional, stepwise model selection based on AIC. Four logistic regression models were trained:

• Model 1: Trained to distinguish between beginner and intermediate players, from factor values corresponding to gameplay at level 0.
• Model 2: Trained to distinguish between beginner and intermediate players, from factor values corresponding to gameplay at level 2 (last level of stable gameplay for beginner players, on average).
• Model 3: Trained to distinguish between intermediate and expert players, from factor values corresponding to gameplay at level 0.
• Model 4: Trained to distinguish between intermediate and expert players, from factor values corresponding to gameplay at level 5 (last level of stable gameplay for intermediate players, on average).
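The modeling steps described in this section (a linear combination of the factor values, a logistic activation, a mean squared error for goodness-of-fit, and an AIC score for comparing candidate models) can be illustrated with a minimal sketch. It is not the code used for the analyses: the toy data, the function names, the learning rate, and the number of epochs are all assumptions made for illustration, and the stepwise search over factor subsets is not shown.

```python
import math

def logistic(z):
    """Map a real-valued linear predictor to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Linear combination of the inputs followed by the logistic function."""
    return logistic(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient ascent on the log-likelihood (illustrative settings)."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = target - predict_proba(weights, bias, x)
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        bias += lr * grad_b / n
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def mean_squared_error(weights, bias, X, y):
    """Mean squared difference between the 0/1 labels and the predicted probabilities."""
    return sum((t - predict_proba(weights, bias, x)) ** 2 for x, t in zip(X, y)) / len(y)

def aic(weights, bias, X, y):
    """AIC = 2k - 2 ln L, where k counts the fitted parameters (weights plus bias)."""
    eps = 1e-12  # clip probabilities so the logarithms stay finite
    log_lik = 0.0
    for x, t in zip(X, y):
        p = min(max(predict_proba(weights, bias, x), eps), 1 - eps)
        log_lik += t * math.log(p) + (1 - t) * math.log(1 - p)
    return 2 * (len(weights) + 1) - 2 * log_lik

# Made-up toy data (NOT from the study): two factor values per game,
# with label 0 or 1 standing for the two player groups being compared.
X = [[-1.2, 0.3], [-0.8, -0.1], [-0.5, 0.4], [-0.9, 0.9],
     [0.6, -0.2], [1.1, 0.1], [0.9, -0.5], [0.4, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
labels = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
mse = mean_squared_error(weights, bias, X, y)
model_aic = aic(weights, bias, X, y)
```

In a stepwise search, a score such as `model_aic` would be recomputed after adding or dropping one factor at a time, and the change that lowers AIC the most would be kept until no change improves it.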
It should be noted that as we move from model 1 to 2 and from model 3 to 4 (models that are fit to the same population but at different levels of gameplay), we retain the same sample of players, but the number of games per player decreases. This is because beginners do not survive level 2 in all of their games, and intermediate players likewise do not always get as far as the level used for model 4. In general, only a subset of the games considered by models 1 and 3 (at level 0) is also considered for models 2 and 4 (levels 2 and 5).

5.2. Results

The model fits were evaluated using k-fold cross-validation for predictive performance (mean squared error). Table 4 lists the results for the model fits. Model 1 uses data collected from beginner and intermediate players at level 0. Even at level 0, the pattern of performance of these two levels of players differs on planning efficiency (Factor 1), zoid control (Factor 3), pile uniformity (Factor 4), and minimum lines cleared (Factor 5).
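The cross-validated mean squared error used to evaluate the fits can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code behind Table 4: the number of folds (`k=5`), the random seed, the toy data, and the base-rate stand-in model are all hypothetical, and the sketch splits individual observations into folds without claiming that the study did the same.

```python
import random

def k_fold_mse(X, y, fit, predict, k=5, seed=0):
    """Average held-out mean squared error over k folds.

    `fit(X_train, y_train)` returns a model; `predict(model, x)` returns a
    predicted probability for one observation.
    """
    if k < 2 or k > len(X):
        raise ValueError("k must be between 2 and the number of observations")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k disjoint, near-equal folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        squared = [(y[i] - predict(model, X[i])) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k

# Hypothetical stand-in model: always predicts the training-set rate of class 1.
# Any classifier exposing the same fit/predict pair could be plugged in instead.
def fit_base_rate(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_base_rate(model, x):
    return model

# Made-up toy data (NOT from the study): one input value per observation.
X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10

cv_mse = k_fold_mse(X, y, fit_base_rate, predict_base_rate, k=5)
```

Because every observation is held out exactly once, the returned value estimates predictive error on unseen data; a model that ignores its inputs, like the stand-in here, gives a baseline against which a fitted logistic regression can be compared.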
636 W. D. Gray, S. Banerjee / Topics in Cognitive Science 13 (2021)

Table 4
Beg versus intermediate: At Game Level 0 and, again, at Game Level 2. Intermediate versus expert: At Game Level 0 and, again, at Game Level 5

Model Information                          Factor Information
Model    Population  Game Level  MSE    Factor 1     Factor 2     Factor 3     Factor 4    Factor 5      Factor 6
                                        plann-effic  pile-mmt     zoid-cntrl   pile-unif   min-l-clears  rot-crrctns
Model 1  B versus I  0           0.194  −0.225 ***   –            −0.233 ***   0.076 *     0.124 **      –
Model 2  B versus I  2           0.148  −0.277 ***   −0.270 ***   −0.162 ***   0.107 *     –             –
Model 3  I versus E  0           0.096  −0.240 ***   –            −0.308 ***   0.256 ***   0.373 ***     0.330 ***
Model 4  I versus E  5           0.149  –            –            –            0.151 **    0.103         0.093

Significance Codes: