Y11W05RC Testing as learning, not measurement

This week’s reading explains why testing is more effective than re-studying for long-term learning.


Stage 1 of 4

Prior knowledge activation

  • How do you currently prepare for exams? Do you re-read notes or test yourself?
  • When you want to remember something, why might testing yourself feel harder than re-reading?
  • What’s the difference between feeling confident during study and actually being able to recall later?

Stage 2 of 4

Purpose-setting statement

This article explains why testing is more effective than re-studying for long-term learning. The 2008 study by Karpicke and Roediger found that, a week later, students who had tested themselves retained more than twice as much as students who had re-read the material. You’ll learn about the retrieval practice effect, why it surprises people, and how common study habits fail to match what the research shows works.


Stage 3 of 4

Prediction or discussion prompt

Tension

If testing helps learning, why do most students still prepare by re-reading?

Revisit

Compare students’ actual study strategies with what the research recommends.


Stage 4 of 4

A question to carry into the reading

Notice how this article contrasts intuitive study methods (re-reading feels productive) with effective ones (retrieval practice). The gap between what feels right and what works is central to the article.


Now read

Testing as learning, not measurement

~12 min read · ~1,900 words

Ask a typical student how they prepare for an exam, and you’ll hear something like this: “I re-read my notes. I highlight the important parts. I go over the textbook chapters. Maybe I make a summary sheet.”

Ask a top student the same question, and the answer is often surprisingly different: “I close the book and try to remember what was in it. I explain the concept out loud as if teaching someone else. I do practice problems without looking at the solutions first. I write out everything I can recall about the topic from memory, then check what I missed.”

These two approaches look, on the surface, like different styles. They’re not. They’re different categories of activity, and one of them produces reliable learning while the other largely doesn’t. The research showing this is as robust as anything in cognitive psychology, and yet it’s still missing from how most people actually study — including people who teach for a living.

The 2008 study that made the point

The most cited demonstration of this effect came from a 2008 paper by two cognitive psychologists, Jeffrey Karpicke and Henry Roediger, at Washington University in St Louis. Their design was elegantly simple.

Students were asked to learn passages of text — for example, a passage about the sea otter. One group was asked to study the passage, then re-study it, then re-study it again, then re-study it a fourth time. Another group was asked to study the passage once, then take a test on it (writing out what they remembered), then take another test, then another. Both groups ended up spending the same total time on the material. The difference was whether the second, third, and fourth encounters were re-readings or retrievals from memory.

When tested immediately afterwards, both groups performed roughly equally. No surprises there.

When tested a week later, however, the results were striking. The re-reading group had forgotten most of what they’d studied. The testing group retained more than twice as much. The students who had spent their study time trying to remember the material — often awkwardly, often getting things wrong, often feeling they weren’t learning — had built significantly more durable knowledge than the students who had spent their time calmly re-reading.

Karpicke and Roediger called this the testing effect, or sometimes retrieval practice. The name is slightly misleading, because the key operation isn’t testing in any formal sense — it’s the act of trying to bring information back from memory, however you structure that act. Writing out what you remember. Teaching the material to someone else. Answering questions without looking. Doing practice problems without first reading the solution. All of these are forms of retrieval.

What the 2008 paper established, and what hundreds of follow-up studies have confirmed, is that retrieval doesn’t just measure learning. It produces learning. The act of reaching for a memory strengthens that memory in a way that simply re-exposing yourself to the same information does not. A test isn’t just an assessment that happens at the end; it’s an activity that happens during learning, and it’s one of the most reliable learning activities we know about.

Why it works

The mechanism, as best as current cognitive science can tell, has to do with what happens in memory when you try to retrieve something.

Re-reading a passage is pleasant, because the material becomes increasingly fluent. It feels less and less effortful each time. This fluency, as Robert Bjork has pointed out, is easily mistaken for understanding. But fluency reflects your ability to recognise the material when it’s in front of you, not your ability to recall it when it isn’t. The two are different, and recall is what you usually need in the exam, in the work presentation, or in the argument you want to win years later.

Retrieval, by contrast, is effortful. Trying to remember something you can’t quite reach is uncomfortable. But that discomfort is exactly what produces the learning. When you successfully retrieve a memory after effort, the memory trace is strengthened in a specific way. When you fail to retrieve it — which is productive too, if you then look up the answer — the gap in your knowledge becomes salient, and the subsequent encoding is more durable. Either way, you leave the session with knowledge that has been tested, not merely displayed.

The cognitive psychologist Robert Bjork, whose broader work on what he calls desirable difficulties has been influential here, uses a striking analogy. Learning, he suggests, is like building a muscle. Passive exposure to material is like watching someone else work out: you can do it for hours and see no change in your own strength. Retrieval is like actually lifting the weight yourself. It’s harder, and that’s why it works.

Beyond the laboratory

The testing effect has been extended in many directions, and the findings have held up with unusual consistency — which is rare in education research, where many findings fail to replicate when moved out of their original setting.

Among the extensions worth knowing about:

Retrieval works even without feedback. You don’t need a teacher or a marker telling you what you got right and wrong for the retrieval itself to produce learning. Simply trying to remember, on your own, is most of the benefit. Feedback adds to it, particularly for correcting errors, but the retrieval itself is the core mechanism.

Short, frequent retrieval beats long, infrequent retrieval. A five-minute recall exercise at the start of each study session, repeated across many sessions, produces better retention than a single long testing session at the end. The reason has to do with the interaction between retrieval and spacing, which is itself a separate research finding.

Retrieval works across subjects. The effect was first demonstrated with text passages, but it has been replicated for foreign-language vocabulary, historical facts, scientific concepts, motor skills, map reading, surgical techniques, and aeroplane pilot training. It appears to be a general property of how human memory works, not a narrow feature of verbal learning.

The feeling of learning and actual learning are often inverted. This is one of the most unsettling findings. Students, when asked which study technique feels more effective, overwhelmingly choose re-reading. The re-reading feels productive — it goes smoothly, the material seems to be sinking in, the studying feels successful. Retrieval, by contrast, feels unproductive. You keep getting things wrong; the material feels harder than it did a minute ago; you feel like you haven’t made progress. And yet retrieval is what’s producing the actual learning. Your subjective sense of what’s working is, in this domain, almost the opposite of what’s actually working.

The anxiety caveat

One important caveat: the research on the testing effect is about low-stakes retrieval — you alone, or in a small group, trying to remember what you’ve studied. It is not an endorsement of high-stakes testing as a learning tool. When students feel that their grades, their future, or their self-image are on the line, the experience of being tested can produce anxiety that disrupts memory retrieval and interferes with learning.

The fact that retrieval practice helps learning doesn’t mean that more exams are the answer. It means that the activity of retrieval — done in a low-pressure setting, with a forgiving attitude toward the inevitable mistakes — is the activity that teaches. The ideal practice looks less like formal testing and more like intellectual play: trying to remember something, getting most of it, missing parts, looking those parts up, trying again tomorrow.

This has practical implications for how schools and universities are organised. The research suggests that frequent low-stakes quizzing is a powerful tool when the stakes are genuinely low. It suggests that the traditional pattern of a single high-stakes exam at the end of a course is roughly the opposite of what the evidence supports. It also suggests that some of the alternatives currently popular — more group projects, more open-book assessments, more continuous-assessment portfolios — may miss the specific benefit that regular, brief, low-pressure retrieval offers.

The meta-analyses

For most educational interventions, the underlying research is thin. Someone publishes an interesting finding, a few studies replicate it in favourable conditions, and then it either fades or survives depending more on marketing than on evidence.

Retrieval practice is unusual in that the finding has been extensively meta-analysed (summarised across many independent studies), and the effect has consistently held up. The educational psychologists John Dunlosky, Katherine Rawson, Elizabeth Marsh, and colleagues, in a major 2013 review, ranked retrieval practice as one of the two most effective study techniques (along with spaced practice, covered in a separate article). Several of the techniques students and teachers rely on most, including highlighting, summarising, and re-reading, fared significantly worse in their evaluation.

So the evidence isn’t based on a single paper or a single research group. It’s based on a broad, varied, replicated literature produced over decades by researchers with no particular axe to grind. This is one of the best-supported findings in educational psychology.

What to actually do

The practical implication is simple, and most students can start applying it immediately.

When you want to learn something, don’t just re-read your notes. Close them. Try to remember what was in them. Write out everything you can recall. Then check what you missed. Fill in the gaps. Tomorrow, do it again.

When you’re preparing for an exam, reduce the time you spend re-reading the material, and increase the time you spend trying to remember it. Practice problems are better than re-reading the textbook. Explaining the concept out loud, as if to a stranger, is better than reviewing the diagram. Making flashcards and actually using them is better than making summary sheets you never look at.

When you’re reading a book you want to remember, pause at the end of each chapter and try to reconstruct its key points from memory before turning the page. This single habit can substantially improve what you retain.

When you’re teaching someone something, don’t just explain it to them — have them try to explain it back to you without looking at their notes. Their struggling to remember is where their learning is actually happening.

None of this is glamorous. None of it feels particularly intellectual. It feels, most of the time, like slightly awkward remembering-out-loud with a pen nearby. But this, the research makes clear, is what produces knowledge that lasts.

The question that remains

Perhaps the deepest thing the retrieval-practice research teaches is a reframe about what reading, studying and learning actually are. Most people, most of the time, treat learning as an act of reception — material comes in, and if it comes in often enough, it lodges. The research suggests learning is actually an act of production — material only lodges when the mind has actively produced it from its own resources, tested itself against it, and repaired what it got wrong. You don’t store a passage by reading it. You store it by trying to recover it.

The question worth holding, for the next time you think you’re studying something:

When you close the book, can you tell someone what was in it? And if not, are you actually learning, or just recognising that you’ve been there?

Key research referenced: Jeffrey Karpicke and Henry Roediger’s 2008 Science paper on the testing effect; John Dunlosky, Katherine Rawson, and colleagues’ 2013 review of study techniques; Robert Bjork’s research on desirable difficulties.