Situational Episodic Context Theory: An Extension of the Episodic Context Account of the Testing Effect

Factors Affecting and Conditions Necessary for the Testing Effect

Thesis paper Violet Jun 27-1
1.2 Factors Affecting and Conditions Necessary for the Testing Effect
1.2.1 Spacing
Much research shows that spacing repeated retrieval practice over an extended period (anywhere from a few minutes to several months) improves recall more than massed retrieval (one practice session followed by a final test) (Karpicke, 2017 [014]; Carpenter & DeLosh, 2005 [075]; Karpicke & Roediger, 2007 [096]; Roediger & Butler, 2010 [051]; Roediger & Karpicke, 2010 [097]; Soderstrom et al., 2016 [098]). This finding is intuitive and consistent with both earlier and current research on retrieval practice. Although there is substantial evidence for the benefits of spacing retrieval, studies differ on what constitutes “spaced retrieval.” All accounts agree that some length of time should separate practice sessions, but there is no consensus on how long that time should be. This matters for memory research because memories are often consolidated over time, especially during sleep; methodologies that do not allow enough time for consolidation into long-term memory may therefore produce results that are inaccurate, incomplete, or not comparable with those of longer-term studies. I postulate that memory consolidation is what makes spaced retrieval practice effective. Sufficient spacing, then, should ensure that retrieval practice accesses long-term memory rather than short-term or working memory, while too much spacing should reduce accuracy because of forgetting and memory distortions over time. There also seems to be a “sweet spot” for the number of repetitions (practice sessions) that improve recall, as retrieval eventually becomes familiar or automatic (Pickering et al., 2021 [036]). Of course, in the context of learning and memory, familiarity and automaticity can be beneficial, but they would likely obscure the beneficial effects of repeated testing.
The theoretical basis for the efficacy of spaced retrieval practice comes from research about the effects of recall on forgetting. As noted in an earlier section, Ebbinghaus’ famous “forgetting curve” (Ebbinghaus, 1885 [076]; see also Murre & Dros, 2015 [077]) explains how memory fades, but many studies have shown that repeated retrieval practice significantly reduces the effects of forgetting and lowers the “steepness” of the forgetting curve (Chun & Heo, 2018 [100]; Mizumoto, 2014, p. 30 [099]).
Some authors (CITATIONS) have reported that an expanded spacing design (where each interval between practice sessions is longer than the last) leads to higher recall than a uniform spacing design (where all intervals are equal in length), though Carpenter and DeLosh (2005 [075]) found expanded repetition to be no more beneficial than uniform repetition in a study on recall of face-name pairs. Karpicke and Roediger (2007 [096]) found similar results for word pairs, though Wang and Zhao (2019 [073]) wisely point out that both of these studies, especially the latter, used relatively short time intervals. A possible explanation for these results is that, even though the increasing interval length arguably corresponds to a systematic increase in difficulty (see section 1.2.4 on the DDF), relative memory strength likely increases with each repetition as well, so the increases in difficulty “level out.” The basis for using expanded spacing probably stems from the logarithmic nature of the forgetting curve and how it is affected by retrieval practice (see Ebbinghaus, 1885).
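The difference between the two spacing designs can be made concrete with a minimal sketch. It assumes that time is measured from the initial study session (time 0), that both schedules end at the same total time, and that the expanded schedule grows each interval by a constant ratio. The function names, the ratio of 2, and the 21-day example are illustrative assumptions, not parameters taken from any of the cited studies.

```python
def uniform_schedule(n_sessions, total_time):
    """Session times with equal intervals between practice sessions."""
    interval = total_time / n_sessions
    return [interval * (i + 1) for i in range(n_sessions)]


def expanding_schedule(n_sessions, total_time, ratio=2.0):
    """Session times whose intervals grow by a constant ratio (assumed)."""
    # Relative interval lengths: 1, ratio, ratio**2, ...
    weights = [ratio ** i for i in range(n_sessions)]
    first_interval = total_time / sum(weights)
    times, elapsed = [], 0.0
    for weight in weights:
        elapsed += first_interval * weight
        times.append(elapsed)
    return times


def intervals(times):
    """Gaps between consecutive sessions, counting from study at time 0."""
    starts = [0.0] + times[:-1]
    return [end - start for start, end in zip(starts, times)]


# Hypothetical example: three practice sessions spread over 21 days.
uniform = uniform_schedule(3, 21)
expanded = expanding_schedule(3, 21)
print("uniform intervals: ", intervals(uniform))
print("expanded intervals:", intervals(expanded))
```

Under these assumptions, both schedules place the final practice session on day 21, so any difference in recall could not be attributed to the total retention period, only to how the intervals are distributed within it.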
1.2.2 Feedback
The effects of feedback on the RPE and on item retention are well researched, and the consensus seems to be that corrective feedback further increases retention (Roediger & Butler, 2010 [051]; Agarwal et al., 2016 [021]; Rowland, 2014 [052]). Some studies have examined how effectiveness differs within two pairs of feedback types: correct/incorrect and immediate/delayed.
Correct vs. Incorrect
Pashler et al. (2005 [087]) found that providing feedback after incorrect answers (corrective feedback) for Luganda-English word pairs led to a 494% increase in recall on the final test. They also found that feedback for correct answers did not affect performance. They mention (in Pashler et al., 2007 [088]) that the 2005 study only found feedback effects when the correct answer was given during feedback (i.e., when participants were not simply told whether their answer was correct). Similarly, Kang et al. (2007 [074]) found a strong positive effect for corrective feedback on a short-answer (SA) test (though not on a multiple-choice (MC) test) relative to tests with no feedback provided.
Immediate vs. Delayed (and Desirably Difficult vs. Undesirably Difficult)
Although most studies agree that corrective feedback is beneficial, one study found that feedback when items were not successfully retrieved (particularly when they were not challenging VERIFY) actually reversed the testing effect (Racsmány et al., 2020 [039]). The testing effect was also reversed in another study when feedback was given immediately, suggesting that delayed feedback is more beneficial than immediate feedback (Kliegl et al., 2019 [031]). One possible explanation is that immediate corrective feedback serves as a restudy attempt, whereas feedback after a delay requires retrieval of the information from long-term memory storage rather than from working memory.
Rickard and Pan’s (2018) dual memory theory relies, in part, on corrective feedback; according to that theory, providing the correct answer during feedback strengthens both study and test memory in much the same way as producing the correct answer initially. The benefit of corrective feedback also provides evidence for the bifurcation model, as information that is incorrectly retrieved is strengthened along with the successfully retrieved information, instead of remaining the same or getting worse (Kornell et al., 2011 [028]).
Ultimately, feedback may not be necessary for the RPE, but under certain circumstances, it can definitely benefit or strengthen the effect.
1.2.3 Test Format
While not the most influential factor affecting the RPE, test format is still very important. There are many different types of testing formats and designs, though here I will divide them into two main categories: experimental design and question format.
Experimental Design
Because of the nature of the spacing effect, a study on the testing effect can take many different experimental designs. For example, one can vary the procedural setup (e.g., study-test-test (STT), STST, SSTT, etc.) or choose a within- or between-subjects design. This makes it somewhat challenging to compare results across studies. By far the most common RPE study design, because it is simple and efficient, is a between-subjects STT format (versus SST for the control condition) (de Lima et al., 2020 [116]; Halamish & Bjork, 2011 [055]; Kornell et al., 2011 [028]; Pyc & Rawson, 2011 [057]). Of course, longer and more rigorous research designs are less efficient and more likely to suffer from under-recruiting and attrition, but they can reveal more information about the RPE and provide evidence for different theories to explain the effect. Interestingly, Abel and Roediger (2017 [050]) found that mixing the test and restudy conditions did not reduce the strength of the testing effect. It also matters whether test questions change or stay similar over multiple retrieval practice sessions, with similar tests producing greater RPEs (Carpenter & DeLosh, 2006 [032]). It may be the case that when confronted with the same question over multiple sessions, individuals will automatically “spit out” the correct answer rather than retrieving it from long-term storage (Craik & Watkins, 1973 [123]; Horzyk et al., 2017 [007]). Therefore, it might be advantageous to rephrase the questions slightly in later sessions, just enough that they do not repeat the original wording or order, while keeping them similar enough to evoke the same associated mental representations as the original question.
Question Format
The type of question on the tests is also important. For example, research has shown that recognition questions (i.e., multiple-choice) sometimes create an RPE (Hulme & Rodd, 2021 [023]). Little et al. (2012 [122]), for instance, found that multiple-choice questions actually benefited participants more than cued recall by bringing to mind information about incorrect alternative answers. More often, though, RPEs for MC questions are small and nonsignificant, as cued and then free recall (both as short-answer questions) tend to have the strongest effect (Butler & Roediger, 2007 [121]). This fits with the ECA mentioned earlier; according to that theory, recognition would show the weakest effect of the three types because it requires much less reconstruction of prior temporal contexts. Of course, temporal contexts can occasionally be reconstructed during recognition, but this happens much less often because the previously encountered information is again in front of the individual. Likewise, cued recall would show a weaker effect than free recall because free recall requires a full reconstruction of the prior temporal context(s) (see Karpicke et al., 2014). It is well established across experimental research, reviews, and meta-analyses that free recall usually leads to the strongest RPEs, followed by cued recall, and finally recognition (Karpicke, 2017 [014]). Still, I feel that future studies should examine the quality of questions as well as their format.
Regardless of the type of question, though, there are some criteria that must be met to show a significant RPE.
1.2.4 (Intervening) Retrieval Accuracy and Item Difficulty
The bifurcation model of the testing effect holds that successfully retrieved items are strengthened while unsuccessfully retrieved items are weakened or left alone (Kornell et al., 2011 [028]; Kubic et al., 2018 [104]). However, some studies suggest, and indeed reason would have it, that incorrectly retrieved items are also strengthened by the testing effect, reinforcing a false answer (Hopper & Huber, 2019 [017]). Corrective feedback, as mentioned earlier, has been shown to mitigate this problem (Karpicke, 2014), sometimes when given immediately (Pashler et al., 2005 [087]) and sometimes after a delay (Pashler et al., 2007 [088]; Karpicke & Roediger, 2007 [096]). Findings seem to be consistent that successfully retrieved items are strengthened by the testing effect and that items retrieved incorrectly or not at all are usually not strengthened, though they can be with corrective feedback (Karpicke, 2017; Karpicke et al., 2014; Kornell et al., 2011).
Studies assessing differential testing effects for item difficulty have produced mixed results. For example, de Lima et al. (2020 [116]) found an RPE for easy over difficult items in one experiment but then found an RPE for difficult over easy items in another. Minear et al. (2020 [002]) reported that differential effects of item difficulty were mediated by individual differences in fluid intelligence (gF), with individuals higher in gF scoring better on difficult questions and vice versa (see next section). More could be said of item difficulty, though it is not relevant to the question this thesis addresses.
1.2.5 Individual Differences and Other Factors
Testing seems to be beneficial for most (perhaps all) individuals regardless of individual differences, including working memory capacity and personality traits (Bertilsson et al., 2021 [013]), though there are exceptions (as in Minear et al., 2020 [002]). While research on this specific topic is not plentiful, preliminary findings seem to indicate that the RPE is not limited to specific groups of people. For example, Guran et al. (2020) found a robust testing effect over a seven-day period regardless of age, though the effect was slightly weaker (yet still significant) for older individuals. Interestingly, though, Bouwmeester and Verkoeijen (2011) found differential magnitudes for the RPE in children depending on gist processing abilities, which provides evidence for the fuzzy-trace theory. The theoretical implications of this finding, however, are outside the scope of this paper. Wong and Lim (2021) suggested that mind-wandering mediates the RPE, with greater mind-wandering present in individuals with smaller RPEs. Interestingly, they also found that mind-wandering was reduced in the study condition only when subsequent study sessions used a “new” mode. This suggests that attention is important for successful retrieval practice, consistent with the REH.
Now that I have discussed the existing testing effect literature, the theories to explain it, and the factors and conditions involved, I will begin developing my hypothesis on the varied contextual nature of the episodic context account and my methodology to test it.
