Situational Episodic Context Theory: An Extension of the Episodic Context Account of the Testing Effect

Theories to Explain the Testing Effect

Date	30.07.2022

Thesis paper Violet Jun 27-1

1.1 Theories to Explain the Testing Effect
1.1.1 Bifurcation Model
One important model to explain the testing effect is the bifurcation model. The bifurcation model states that information that is successfully retrieved is strengthened relative to information that is not successfully retrieved (Kornell et al., 2011 [028]; Rowland & DeLosh, 2015 [106]). There is plenty of support for this model (Rowland, 2014 [052]), though Karpicke et al. (2014 [001]) argue that the model does not propose a mechanism for the testing effect and point out several limitations for the effect (p. 250). They classify this model as a variant of the strength hypothesis, which proposes that for items to be successfully retrieved, they must exceed a particular retrieval strength threshold, and that due to forgetting (see Ebbinghaus, 1885 [076] and Murre & Dros, 2015 [077]), restudied items that are initially above this threshold will eventually fall below it, while tested items do not fall as far, assuming a single re-exposure event (Karpicke et al., 2014, p. 250 [001]). They rightly point out what Kornell et al. (2011 [028]) also mentioned, that the bifurcation model merely describes the results of the testing effect and does not describe a mechanism to explain the effect. It does not, in their opinion, specify “how or why such [testing effects] would occur” (Karpicke et al., 2014, p. 250 [001]).
1.1.2 Transfer Appropriate Processing
In 1972, Craik and Lockhart proposed a multi-level processing model as an alternative to the then-current multistore model of memory (p. 671 [070]). This alternative model would later form the basis of the Transfer Appropriate Processing (TAP) theory, introduced by Craik and Tulving in 1975. They hypothesized that memory encoded at one level (depth) of processing would be strengthened when recalled later at the same level of processing. They also proposed that the deeper the level of processing while encoding, the stronger the memory trace would be (Craik & Tulving, 1975 [071]). The TAP theory has been used as a primary explanation for the mechanism behind the RPE (Marsh & Butler, 2013 [069]; Endres & Renkl, 2015, p. 2 [034]; Karpicke et al., 2014 [001]; Barenberg et al., 2020 [026]; Halamish & Bjork, 2011, p. 802 [055]).
TAP seems like a natural fit for the testing effect, but it has not gone unchallenged. In testing the TAP model, Pickering et al. (2021 [036]) found no testing effect at all, though they did find that both the testing and restudy groups outperformed single-exposure groups that had no intervention between initial study and the final test. Carpenter and DeLosh (2006 [032]) found results that contradicted the TAP model and more closely matched the elaborative retrieval hypothesis (see section 1.1.5). Karpicke (2017 [014]) claims that while the TAP model makes intuitive sense, it does not constitute a mechanism for the testing effect per se; furthermore, he states that final test performance for groups with recall interventions tends to be better than for groups with recognition interventions, regardless of the format of the final test, which stands in direct contrast to the TAP model (p. 491). He provides solid evidence to back up his claims (pp. 503-505; see also Karpicke et al., 2014, p. 251).
1.1.3 Retrieval Effort Hypothesis
The retrieval effort hypothesis (REH) states that the mental effort involved in retrieving a memory during recall mediates the strength of the encoded memory and the accuracy of subsequent recall (Wang & Zhao, 2019 [073]; Karpicke, 2017 [014]; Karpicke et al., 2014 [001]). Probably the first evidence supporting this theory was presented by Bjork (1975 [029]). His data suggested, inter alia, that recall tests requiring greater mental effort enhance recall to a greater extent than more “superficial” recall tests, which result in ceiling effects (p. 143). Support for the REH is mixed, but it has been substantiated to some extent by empirical data (Pyc & Rawson, 2009 [030]; Wang & Zhao, 2019 [073]; Kang et al., 2007 [074]; Carpenter & DeLosh, 2005 [075]). Interestingly, the results of Kliegl et al. (2019 [031]) also support the REH but show a reversal of the effect when feedback was given immediately after intervening testing (see also section 1.2.2). The REH is related to the elaborative retrieval hypothesis (see section 1.1.5).
De Lima et al. (2020 [116]) found evidence against the REH in two experiments involving word-pairs. Karpicke et al. (2014) also argued against it, but mostly on theoretical, rather than empirical, grounds. They claimed that the REH merely describes phenomena that occur during retrieval practice and does not constitute a mechanism for the effect.
1.1.4 Theory of Disuse and Desirable Difficulties Framework
Based on his earlier work and that of others, Bjork (1992 [033]) proposed the theory of disuse, which, as Karpicke et al. (2014, p. 249 [001]) point out, “[differentiates between] retrieval [and] storage strength.” It is a variation of the strength hypothesis. Storage strength refers to the quality of the memory as it is encoded, and retrieval strength refers to the memory’s retrievability. The theory relates to the RPE because both restudy and testing increase both retrieval and storage strengths, but testing more so (Karpicke et al., 2014 [001]).
Bjork (1994 [112]) coined the term “desirable difficulty” (also known as the desirable difficulties framework or DDF). The DDF is based on findings that when items on intervening tests are neither too easy (resulting in ceiling effects, stemming, in part, from automaticity or familiarity) nor too difficult (resulting in floor effects), retrieval processes are engaged which strengthen memory (Chen et al., 2018 [011]). The DDF agrees well with reports such as those by Carpenter and DeLosh (2006 [032]) that recalling list items with more impoverished cues (e.g., “b________” instead of “bu_ld___” for the word “building”) enhanced recall more than recalling them with richer cues, probably due to increased mental effort and elaboration during retrieval (see section 1.1.5).
I agree with Karpicke et al. (2014 [001]) that both the theory of disuse and the DDF do not propose a mechanism for the RPE so much as describe its results. The former, in this context, essentially says that testing enhances memory encoding and retrieval, which seems obvious and is the basic premise of what happens as a result of the testing effect. The latter describes some important conditions that seem to be necessary for an RPE, but does not explain how the effect occurs (see section 1.2).
1.1.5 Elaborative Retrieval Hypothesis
Likely the most popular explanation for the testing effect, especially in the last decade, is the elaborative retrieval hypothesis (ERH). This prevailing theory has garnered a lot of support (Carpenter & DeLosh, 2006 [032]; Endres & Renkl, 2015 [034]; Carpenter, 2009 [035]; Rowland, 2014 [052]), though it has been challenged by some on both logical and empirical grounds (Karpicke et al., 2014 [001]). As mentioned above, the ERH relates to the REH in its reliance on mental effort. The basic premise is that cues which prompt greater mental effort and elaborative processing, when answered correctly, lead to more frequent successful recall of the associated target items (Endres & Renkl, 2015 [034]). Carpenter (2009 [035]) explains that semantic elaboration may occur during retrieval, encouraging spreading activation of associated semantic cues (p. 1564).
Objections to the Elaborative Retrieval Hypothesis
Karpicke et al. (2014 [001]) presented some limitations and contradictions of the ERH, some of which have strong logical grounds while others have weak bases. I will discuss each of these in turn.
The first limitation they present is the observation that much of the data supporting the semantic elaboration account of RPEs (at the time of the article) was correlational in nature and thus not definitive evidence that elaboration specifically causes the RPE (p. 254). Owing to that limitation, we cannot rule out other factors as causing or influencing the RPE. While this is true, I would point out that this assertion does not negate the possibility that semantic elaboration accounts for the testing effect; it merely fails to affirm it. More experimental research is needed to tease apart the factors that may cloud the associations between elaboration and retrieval success.
The second objection they raise concerns the principle of cue overload (Karpicke et al., 2014, p. 254 [001]). Cue overload is when too many associations with a particular cue cause interference with the true or original association (Watkins & Watkins, 1975 [113]). Karpicke and colleagues (2014 [001]) claim that the very idea of semantic elaboration, of generating more associations for a given cue, directly opposes the data from the cue overload principle. That elaboration produces more cue associations implies the existence of increasing confounds, leading to a cue overload effect; this cue overload would then hamper the testing effect, which has not happened in empirical research (p. 254). This objection is one with a strong basis, though most of the data cited was for word pair research, instead of information with higher-level organization. It is also possible that this objection might be resolved in the spirit of the desirable difficulties framework. More specifically, it might be explained by examining whether the relationship between the number of cue-associations and testing effect results is curvilinear, peaking around some middle point. As with the above possibility, more research is needed to understand and explore this possible relationship.
Third, they object that the ERH contradicts predictions from the concept of retrieval-induced forgetting (RIF), a well-established theory involving competition between associated cues (think back to the bifurcation model; see also Perfect et al., 2004 [078]). The premise is that if elaboration for associated cues were required for an RPE, then RIF should not occur (Karpicke et al., 2014, p. 255), though I believe this objection warrants further investigation.
Finally, they present empirical evidence from studies directly contradicting the ERH prediction that semantic elaboration activities should induce RPEs, noting the absence of RPEs during several different elaboration tasks (pp. 255-256, 271-273).
One counterargument I have against Karpicke et al.’s objections is that many of them are based on evidence using cued recall. It is likely that the studies they presented lacked RPEs because the conditions were not optimal (i.e., they used cued instead of free recall) for producing an RPE (see section 1.3.3). Hopefully, meta-analyses that include more recent data will be performed to shed some light on this concern and find a more parsimonious explanation.
1.1.6 Episodic Context Account
As an alternative to all the theories that Karpicke et al. (2014) argued against in their article, they proposed a new theory, one brought over from a massive body of research on episodic context. The word “context” in this case initially referred to the temporal context of the learning experience (i.e., the temporal source memory of learning the information). According to Karpicke et al. (2014), their theory, called the episodic context account (ECA), requires four main assumptions:

1. During initial encoding, a slowly changing temporal context for the learning experience is established.
2. When the information is retrieved, the temporal context, if it has changed appreciably (as in spaced recall), is reinstated and used to “guide a search process” (p. 258) of lexical or semantic information that (I assume) is related to the initial episodic context.
3. Using temporal context during retrieval, the prior temporal context representation is updated to form a new composite temporal context.
4. This updated context leads to more efficient search processes in future recall attempts.

During restudy, on the other hand, the temporal context is not assumed to be reinstated, or at least not as strongly, because neither the temporal context nor the semantic information is necessarily reconstructed by the individual (because there is no need to mentally reconstruct what is already before oneself). While not explicitly stated by Karpicke and colleagues, it seems reasonable that this connection between multiple temporal contexts and the related target lexical/semantic information should also extend to spatial and procedural contexts, and this idea will be explored further in this paper.
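The four assumptions and the contrast with restudy can be made concrete with a toy sketch in Python. It is an illustration under strong simplifying assumptions and not the formal model of Karpicke et al. (2014): temporal context is represented as a sliding window of integer features, restudy is assumed to leave the stored context entirely unchanged, and the size of the overlap between an item's stored context and the context at the final test stands in for how well that context can guide the search. All names (`context_at`, `retrieve`, `cue_overlap`, and so on) and the window size are hypothetical.

```python
# Toy illustration only: names, window size, and time points are assumptions.
WINDOW = 10  # assumed number of context features active at any moment


def context_at(t):
    """Slowly changing temporal context: one feature is replaced per time step."""
    return set(range(t, t + WINDOW))


def study(t):
    """Assumption 1: the item is stored with the context present at encoding."""
    return context_at(t)


def retrieve(stored, t):
    """Assumptions 2 and 3: retrieval reinstates the stored context and merges
    it with the current context, forming a composite context for the item."""
    return stored | context_at(t)


def restudy(stored, t):
    """Simplifying assumption: restudy reinstates nothing, so the stored
    context is left as it was (t is unused)."""
    return set(stored)


def cue_overlap(stored, t):
    """Assumption 4 (proxy): more features shared with the context at test
    means the context restricts the memory search more effectively."""
    return len(stored & context_at(t))


encoded = study(0)                       # initial study at time 0
tested_item = retrieve(encoded, 5)       # retrieval practice at time 5
restudied_item = restudy(encoded, 5)     # restudy at time 5

FINAL_TEST_TIME = 8
print("tested item overlap:", cue_overlap(tested_item, FINAL_TEST_TIME))
print("restudied item overlap:", cue_overlap(restudied_item, FINAL_TEST_TIME))
```

In this sketch the tested item shares more features with the final-test context than the restudied item does, because its composite context spans both the study and the practice episodes. Extending `context_at` to include spatial or procedural features would be one way to express the extension proposed in this paper, though that too would be an assumption of the sketch.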
Given that this theory (as applied to the testing effect) is relatively new (within the last 10 years), not much evidence exists for or against it. Most recently, Schwoebel et al. (2021 [062]) presented lists of words to participants along with either imagined or environmental scenes in three separate experiments and found strong evidence that a robust RPE occurred not only on the basis of temporal episodic contexts, but also environmental episodic contexts. Whiffen and Karpicke (2017 [009]) also had participants study several lists of words in three separate experiments and then either restudy the words or judge which list each word came from. The results showed that participants who performed the temporal judgment (discrimination) task performed much better than the restudy group on the final test, consistent with the ECA. Most interestingly, Ma et al. (2020 [056]) found evidence for an episodic context-dependent RPE for third graders and college students but not for first graders, consistent with neurodevelopmental studies indicating that children’s episodic memory abilities are not well developed in first grade but are much more so by third grade.
Hong et al. (2019 [072]) tested individuals on word lists while priming them with different font colors and did not find a testing effect for contextual details (i.e., font color), contradicting an assumption of the ECA. However, the initial model of the ECA relied mainly on temporal context, and font color is only one detail of context that might not be as crucial or significant for memory retrieval. Another recent study found a testing effect for “low-constrained” word pairs for item recognition but, interestingly, not for contextual details, including font color and list discrimination (i.e., which items came from which list) (Giannakopoulos et al., 2021 [085]). It may very well be that only specific kinds of contextual information will facilitate a stronger testing effect. This very extension of the ECA (testing effects for different contextual details) will be explored and tested further in the present study.
1.1.7 Other Theories
Mediator Shift Hypothesis
Several other theories for the RPE are worth discussing, and one of these is the mediator shift hypothesis (MSH). The MSH favors the method of restudying after testing (test-restudy procedure) with feedback, such that the feedback from the test directs the focus of the subsequent study to the misremembered items on the test (Pyc & Rawson, 2011 [057]; Sotola & Crede, 2020 [053]). The test-restudy-retest format and feedback after initial testing seem to be necessary for this explanation of the testing effect to remain plausible. Pyc and Rawson (2011) taught participants word pairs and had them either test then restudy or restudy only. Their results confirmed the predictions of the MSH: participants in the test-restudy group improved on the final test for keywords they had failed on the initial test before restudy, and they showed greater improvements and higher final scores than the restudy-only group.
The MSH relies on differential encoding during restudy for those who tested previously (with feedback) and on the assumption that individuals (consciously or not) will shift their focus from the things they answered correctly to the things they did not. In this way, the MSH is similar in concept to the Attenuated Error Correction Theory (see subsection “Attenuated Error Correction Theory”).
Unfortunately, not many studies on the mediator shift hypothesis exist beyond the two cited here. Some prior research casts doubt on the sufficiency of this theory to explain the RPE, as there have been instances recorded where testing effects were shown even in the absence of feedback (CITATIONS). More research is necessary to further analyze and assess the viability of this theory to explain the testing effect.
Mediator Effectiveness Hypothesis
Closely related to (but distinct from) the MSH is the mediator effectiveness hypothesis (MEH). The MEH requires cued mediators to induce a testing effect. For example, if participants studied the word pair “water” and “volleyball,” they would also study a mediator, “beach.” In that example, the mediator is supposedly used during retrieval to recall the target (e.g., both volleyball and water are things commonly associated with beaches, so “beach” serves to bridge the gap between “water” and “volleyball”).
There is some empirical support for this theory. Pyc and Rawson (2010 [113]) found high recall rates for both mediators and targets when participants were presented with cues on the final test, a result replicated by Camerer et al. (n.d. [083]). Carpenter (2011 [058]) found strong support for the MEH when studying cue-target pairs with the mediators also shown during initial testing. Coppens et al. (2016 [081]) replicated Carpenter’s results both online and in person, but to a lesser extent.
Several studies on this theory have found null results. For example, in a master’s thesis, DiMarco (2021 [082]) found no testing effect for mediated word pairs in one experiment and even a reverse testing effect in another. The thesis suggests some limitations to the MEH. Leggett and Burt (2020 [079]) also found evidence against the hypothesis from three experiments, though it should be noted that they did not distinguish between the mediator shift hypothesis and the mediator effectiveness hypothesis, simply opting for the phrase “mediation hypothesis.”
Dual Memory Theory
In 2018, Rickard and Pan performed 10 different experiments supporting a new theory to explain the testing effect, the dual memory theory (DMT). This theory states that the testing effect results from processes involving two different types of memory, namely study and test memory. According to this theory, during initial study, a study memory is created. Then, during testing, the study memory is strengthened and a new test memory is created and associated with the study memory. The theory features a shift from study to test memory with repeated retrieval practice. Furthermore, Rickard and Pan, in the same study, described the model not only conceptually but also quantitatively, using mixed power functions for response times (2018 [046]), something not shown for any of the theories described thus far.
Rickard later expanded the theory to explain assumptions arising from individual differences (2020 [084]), and then clarified further that individual differences do not affect benefits from RPEs in another study with Pan and Gupta (Gupta et al., 2021 [068]). While other researchers have mentioned the theory, none have, at the time of this writing, sought to test it.
Lastly, it is important to note the distinction between the dual memory theory of the testing effect, the dual process theory of memory, and the dual theory of memory. The first relates only to the testing effect and differentiates between study and test memory. The second is related (but not limited in scope) to the fuzzy-trace theory of memory or “gist-trace” processing theory, which states that individuals first encode the details of an event in memory, which then slowly shifts to a gist memory with broader, overarching features of the memory (Bouwmeester & Verkoeijen, 2011 [115]). The last theory, the dual theory of memory, is the idea that memory is divided into two types, long- and short-term memory, and is outside the scope of this paper. Suffice it to say that it stands distinct from the dual memory theory of the testing effect described here.
Attenuated Error Correction Theory
The last theory discussed in this paper, the Attenuated Error Correction Theory (AECT), is the only other model with a quantitative explanation (Rickard & Pan, 2018 [046]). Proposed by Mozer et al. (2004 [086]), it is a mathematically based theory within the neural networks framework (Anirudh, 2019) arising from work by Carrier and Pashler (1992 [124]). There is mixed evidence for this model. Kang et al. (2011 [092]), across three experiments, found that prior incorrect guesses had no effect on later successful recall. Kornell and Metcalfe (2014 REFERENCE) found no negative effects of errors on later testing but did not examine whether errors produced benefits either. It is unclear how the AECT differs from the MSH, and there do not seem to be any other articles about the theory.
Now that the testing effect theories relevant to SEC theory have been summarized, the factors that may or may not affect the retrieval practice effect and the conditions that seem to be necessary for it to occur will be discussed. A handful of additional theories that are not relevant to SEC theory were not discussed here. For a list of these, see appendix B.
