2.2 Situational Context for Novel "On-the-Go" Learning Experiences
One study found a testing effect for individuals who took tests displayed on a video screen with different environments as a backdrop, supporting (the extended) ECA, though that study was published before Karpicke et al. proposed their theory (Smith & Manzano, 2010). Indeed, simulating environments on a screen may be cheaper than varying live testing locations, but it may also limit the implications of the theory. My new theory proposes that a composite episodic context, comprising temporal, visuospatial, emotional, procedural, and other contexts of the learning and testing experiences, eventually becomes filtered out once a strong retrieval of the information itself becomes possible (likely through myelination of the associated neural areas). This implies that to show a strong testing effect, individuals should have a unique and novel testing experience.
3 Methodology
In this experiment, I will attempt both to demonstrate a testing effect in general and to show a differential testing effect for high context-dependence over low context-dependence learning experiences. In this section, I will first discuss the research design, including the research philosophy, type, timeline, strategy, sampling techniques, and data collection methods that will be used. Then I will outline the procedures, after which I will describe the methodological limitations of this study. Finally, I will conclude with a brief summary of my methodology.
3.1 Research Design (NEEDS HEAVY REVISION)
This experiment will use a positivist, epistemological approach because I am proposing a theory and attempting to find evidence to either support or refute it. However, because this topic is complex and has many factors to control for, only limited conclusions can be drawn from this study (see section 3.2). The purpose of this research, then, is to provide a basis for other researchers to do more specific, careful, and varied experiments to support (or refute) the SEC theory. Teasing out the underlying mechanisms and conditions necessary for the RPE is a great undertaking and realistically requires extensive reviews and meta-analyses. Rowland (2014) is one such analysis, but it is relatively dated. The biggest issue with using pattern and parsimony here, as mentioned earlier, is that the methodologies in the published studies on the RPE vary in so many ways. In that same vein, it is important to identify the intended applications of a study on the RPE, so that analyses and reviews can have greater explanatory power. This is why I have chosen to test a new theory (really a modification of an existing theory); my aim is to take a theory that has already been proposed (the ECA) and to support and extend it using factors that are consistent with findings from other studies on the testing effect. Ideally, the SEC theory will give a new perspective on the ECA, using it to explain additional observations.
The most definitive type of testing effect research is a quantitative experiment. This is because the very idea behind the RPE is that those who use practice tests between an initial study session and a final test have better scores (a quantitative variable) on the final test than those who use a restudy intervention. Therefore, it is logical to use a design that has two experimental groups (test vs. restudy) and to measure and compare their final test scores. Qualitative research on this topic is possible; researchers could have participants learn a concept and then later teach the same concept, but the outcome of this methodology is difficult to measure and has poor construct validity. Testing effect research can be exploratory, deductive, or inductive, but this particular experiment is deductive: I am starting with a theory and then creating carefully controlled conditions to test that theory.
The experiment will take place over the course of 1 week. There will be an initial study session, followed by two intervention sessions, and then a final test session, with sessions occurring every other day. This aligns with the current consensus on spaced repetition (see section 1.2.1). Test scores will be measured for each testing group and for everyone on the final test. Participants will be undergraduate and graduate students recruited by instructors from randomly selected classes from the University of Tennessee, Knoxville.
This experiment uses a 2 × 2 design with four cells: testing on a bus, restudy on a bus, testing in a classroom, and restudy in a classroom. All groups will first study the content on a computer screen in a classroom. All groups will then perform a brief distractor task with basic mathematical operations (multiplication and long division) and then review the content. On day 3, the participants will reenter the intervention area for their first intervention. The two bus groups will wait for a bus at a bus station and board the next bus that arrives. All information and testing will be delivered using Qualtrics. Each participant will encounter their material separately. For the two bus groups, an observer in disguise will accompany each participant to ensure that distractions do not significantly alter the interventions.
The bus testing group will carry a tablet provided by the researcher onto the bus. They will then respond on the tablet to short-answer questions about the content they read on day 1. Questions will be randomized.
The bus restudy group will re-read the story they read on day 1 on their tablet as they travel on the bus.
The classroom testing group will be given the same test as the first group on a stationary tablet.
The classroom restudy group will re-read the story they read on day 1 on a stationary tablet.
All four groups will repeat this same procedure on day 5, with the bus groups taking a different bus route. On day 7, all participants will take a final test on the computer using Qualtrics. The test will be similar to each of the other tests. Questions will also be randomized.
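The allocation of participants to the four cells can be sketched as follows. This is only an illustrative sketch, not the study's actual allocation procedure: the round-robin dealing, the function name `assign_to_cells`, the seed, and the 40 placeholder participant codes are all assumptions introduced for the example.

```python
import random
from collections import Counter
from itertools import product

# The two factors of the 2 x 2 design described above.
ENVIRONMENTS = ("bus", "classroom")
INTERVENTIONS = ("test", "restudy")


def assign_to_cells(participant_ids, seed=None):
    """Randomly assign participants to the four cells in near-equal numbers."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    cells = list(product(ENVIRONMENTS, INTERVENTIONS))
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin into the four cells.
    return {pid: cells[i % len(cells)] for i, pid in enumerate(shuffled)}


# Hypothetical participant codes, for illustration only.
assignment = assign_to_cells(range(1, 41), seed=2024)
cell_sizes = Counter(assignment.values())
```

Dealing a shuffled list round-robin keeps the cell sizes within one participant of each other, which is convenient for the two-way ANOVA planned later.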
Data Preparation and Analysis
Once all data is collected, each participant’s data will be coded with a number and their personally identifiable information (e.g., name, email, etc.) will be removed from the dataset. Once that is complete, an independent coder (not the researcher) will manually score all tests, which will be reordered with questions in the same order for each test, via an answer key. Either a response will be marked correct (minor spelling errors and typos will be ignored) or it will be marked incorrect; no partial points will be given. Incomplete entries (e.g., a participant came to only one intervention or didn’t show up for the final test) will be removed from the main dataset, as a partial dataset is mostly useless for our purposes.
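The preparation steps above could look roughly like the following sketch. The field names (`name`, `email`, `intervention_1`, `intervention_2`, `final_test`), the helper names, and the two placeholder rows are hypothetical and are not taken from the actual Qualtrics export; the correct/incorrect marks themselves are assumed to come from the independent coder.

```python
# Sessions a participant must have completed to stay in the main dataset.
REQUIRED_SESSIONS = ("intervention_1", "intervention_2", "final_test")
# Hypothetical names of the fields that hold personally identifiable information.
PII_FIELDS = ("name", "email")


def prepare_records(raw_records):
    """Code each participant with a number, strip PII, drop incomplete entries."""
    prepared = []
    for code, record in enumerate(raw_records, start=1):
        clean = {k: v for k, v in record.items() if k not in PII_FIELDS}
        clean["participant_code"] = code
        if all(clean.get(session) is not None for session in REQUIRED_SESSIONS):
            prepared.append(clean)
    return prepared


def total_score(marks):
    """Count correct responses; each mark is True or False, with no partial credit."""
    return sum(1 for mark in marks if mark)


# Fabricated placeholder rows, for illustration only.
raw = [
    {"name": "Participant A", "email": "a@example.edu", "cell": "bus/test",
     "intervention_1": "done", "intervention_2": "done", "final_test": "done"},
    {"name": "Participant B", "email": "b@example.edu", "cell": "classroom/restudy",
     "intervention_1": "done", "intervention_2": None, "final_test": "done"},
]
dataset = prepare_records(raw)
```

Here the second placeholder participant missed an intervention session and is therefore excluded from `dataset`, while the first is retained without the name and email fields.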
All data will be analyzed using SPSS. First, the scores for participants in each group will be totaled separately, and any outliers will be detected using the Tietjen-Moore test and removed. Second, the normality of the data will be assessed using the Shapiro-Wilk test of normality. Homogeneity of variances will be assessed using Levene’s test. A two-way ANOVA will be performed using SPSS to detect any potential interactions between intervention environment and intervention type. A Tukey post-hoc test will be performed to see where the interactions, if any, are and to calculate the main effects. Lastly, the raw data will be more closely analyzed to detect any possible systematic variance in the responses within groups.
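To make the logic of the two-way ANOVA concrete, the following sketch computes the F ratios for the two main effects and the interaction by hand. It is not a substitute for the SPSS analysis: it assumes equal cell sizes, it does not compute p-values, and the function name and the scores are made up for illustration.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F ratios for a balanced two-way design.

    `cells` maps (environment, intervention) to a list of final test scores;
    every cell must hold the same number of scores.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("this sketch only handles equal cell sizes")

    grand = mean(s for scores in cells.values() for s in scores)
    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    mean_a = {a: mean(cell_mean[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: mean(cell_mean[(a, b)] for a in levels_a) for b in levels_b}

    # Sums of squares for each main effect, the interaction, and the error term.
    ss_a = len(levels_b) * n * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = len(levels_a) * n * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a for b in levels_b
    )
    ss_within = sum(
        (s - cell_mean[key]) ** 2 for key, scores in cells.items() for s in scores
    )

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_ab = df_a * df_b
    df_within = len(cells) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "environment": (ss_a / df_a) / ms_within,
        "intervention": (ss_b / df_b) / ms_within,
        "interaction": (ss_ab / df_ab) / ms_within,
        "df_within": df_within,
    }


# Made-up scores, two per cell, for illustration only.
scores = {
    ("bus", "test"): [8.0, 10.0],
    ("bus", "restudy"): [5.0, 7.0],
    ("classroom", "test"): [6.0, 8.0],
    ("classroom", "restudy"): [3.0, 5.0],
}
f_ratios = two_way_anova_balanced(scores)
```

With these made-up scores the cell means are perfectly additive, so the interaction F ratio is zero while both main effects are nonzero. If removing outliers or incomplete entries leaves the cells unequal in size, this simple decomposition no longer applies and the statistical package's handling of unbalanced designs is needed.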
Participants will be awarded extra credit in their course commensurate with their participation in the experiment. As an alternative to participation, students will be offered the opportunity to earn the same amount of extra credit by completing some assignment of similar effort of the instructor's choosing, such as a paper or a presentation.
This experiment obviously has several limitations.
Due to time constraints, differential effects of feedback could not be explored. However, as previously mentioned, feedback is not necessary for an RPE; it only enhances it.
Probably the most important limitation in this experiment is due to the nature of the study content. Short story testing effects are (probably) less generalizable than word-lists because each story has a different level of detail, depth, and complexity. Thus, some caution is advised when interpreting the results of this experiment. Due to time and budget constraints, I was not able to compare the applicability of the SEC theory between word lists and stories. However, that topic is recommended for further research.
Because participants were selected from the University of Tennessee, Knoxville, the generalizability of the results is even further limited. However, it can reasonably be assumed that the sample should still be representative of university students in the United States, though this assumption should be made carefully, of course.
It is possible that scores between groups may have varied because of either unsystematic or systematic variance. At the end of data analysis, questions and demographics were analyzed more closely to detect both systematic and unsystematic variance and (FINISH) was found.
Because the SEC theory, while based on previous theories, is new, there is no existing literature on it. Therefore, this experiment should be considered a starting point for discussing and investigating the theory.
3.3 Summary and Predictions
To summarize, this study can be classified as positivist, epistemological, deductive, quantitative, experimental, multivariate, non-randomly sampled, and randomly assigned. The goal of this study is to demonstrate a testing effect and to provide evidence in favor of the newly proposed SEC theory of the testing effect. The study employs a 2 × 2 design (test vs. restudy and bus vs. classroom) with one dependent variable (final test scores). The experiment will consist of four sessions (initial study, test/restudy, test/restudy, and final test), with the intermediate sessions occurring either on the bus (with tablets connected to Qualtrics) or in a classroom (with identical tablets). All responses will be short-answer and will be scored manually by an independent coder, and final scores between groups will then be compared and analyzed. Outliers will be removed, normality and homogeneity of variances will be assessed, a two-way ANOVA and Tukey post-hoc test will be performed using SPSS, and responses will be further analyzed to detect unsystematic and systematic variance. Finally, the results of this study should be seen in light of several limitations including, but not limited to, insufficient sample size, reduced comparability with other (word-list) studies, no prior SEC theory literature, and possible lack of generalizability.
I predict that both testing groups will score higher than the restudy groups, and that the bus groups will perform significantly better than their classroom counterparts. The first prediction would support the testing effect for short stories and the second would support the situational episodic context theory of the testing effect. Therefore, I predict a significant main effect for both independent variables, but no significant interactions. In the event that the experiment results in a null effect for either prediction, it would not necessarily preclude the accuracy of the RPE or the SEC theory, but it would not support them either. In any case, more research on the topics is definitely warranted.