You may have heard the story already. A child is seated at a table in front of a single marshmallow. An adult tells them: “You can eat it now, or, if you can wait fifteen minutes until I come back, you can have two.” The adult leaves the room. A hidden camera watches what happens next.
Some children grab the marshmallow almost before the door closes. Some stare at it, sniff it, touch it with one finger, put it back. A few wait the full fifteen minutes and get their two marshmallows. And then, according to the story, the researcher follows the children for decades and discovers something startling: the waiters end up with higher academic scores, better careers, healthier relationships, lower rates of addiction. The ability to delay gratification at four, the story says, predicts most of what matters in adult life.
It’s a beautiful story. It has launched a thousand parenting articles, a dozen bestselling books, and a whole industry of “willpower training” for children. The only problem is that, in the form most people have heard it, it isn’t quite true.
What the original experiments actually showed
The marshmallow test was designed in the late 1960s and early 1970s by a Stanford psychologist named Walter Mischel. He ran a series of experiments at the Bing Nursery School, a preschool for children of Stanford faculty and graduate students. Mischel was genuinely interested in self-regulation — how children manage impulses, cope with temptation, direct their own attention.
The original findings were real. Children who waited longer did, on average, show some better outcomes years later: modestly higher SAT scores, slightly better social skills. But the sample was small, the effect sizes were moderate, and the population studied — children of Stanford academics in the 1970s — was about as unrepresentative of humanity as you can get. Mischel himself, to his credit, was careful about these caveats throughout his career. The simplification into wait-longer-equals-win-at-life was not really his doing; it came from the popular retellings that turned a nuanced finding into a parable.
The replication that changed the picture
In 2018, a team led by Tyler Watts at New York University, with Greg Duncan and Haonan Quan, did something the original research hadn’t: they analysed a much larger and more diverse sample of more than 900 children, drawn from an existing long-term study, and they controlled for family background.
Their finding was quiet but important. The simple correlation between waiting time and later outcomes was still there, but it shrank by about two-thirds once they accounted for family background, early cognitive ability and home environment. What looked like a story about willpower predicting success was substantially a story about family circumstances predicting both waiting time and later success. The marshmallow test wasn’t really measuring willpower. It was, in significant part, measuring what kind of home a child came from.
This is not nothing. A test that detects early the consequences of family environment is still informative. But it’s a very different test from the one parents were being sold.
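For readers who like to see the logic run, here is a minimal sketch of how a shared background factor can create a link between waiting and later outcomes that fades once the background is accounted for. It is a toy simulation, not the replication's data or method: the variable names, the sample size and the equal weighting of background and noise are all assumptions made for illustration. In this toy world waiting time has no effect of its own, which is a more extreme case than the real finding.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Return (raw, adjusted) correlations between waiting and outcome.

    Assumption: family background drives BOTH waiting time and the later
    outcome, and waiting has no direct effect on the outcome at all.
    """
    rng = random.Random(seed)
    background = [rng.gauss(0, 1) for _ in range(n)]
    wait = [b + rng.gauss(0, 1) for b in background]
    outcome = [b + rng.gauss(0, 1) for b in background]

    raw = pearson(wait, outcome)
    # "Controlling for" background: correlate what remains of each
    # variable once the part explained by background is removed.
    adjusted = pearson(residuals(wait, background),
                       residuals(outcome, background))
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"raw correlation:      {raw:.2f}")
    print(f"adjusted correlation: {adjusted:.2f}")
```

With these made-up settings the raw correlation should land near 0.5 and the adjusted one near zero. The point is only that a sizeable raw correlation is compatible with waiting itself contributing little.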
Why some children don’t wait
There’s a second piece of research that deepens the picture further, and it comes from a psychologist named Celeste Kidd at the University of Rochester.
Kidd had a hunch. She wondered whether some children refused to wait not because they lacked willpower, but because they had learned that adults don’t always keep their promises. So she designed a clever experiment. Before offering the marshmallow, she had children work on an art project with one of two researchers. In one condition, the researcher promised to bring back better supplies, and did. In the other, they promised the same thing, and didn’t.
Then came the marshmallow.
Children who had met the unreliable adult waited, on average, about three minutes. Children who had met the reliable one waited, on average, twelve minutes, roughly four times as long. Comparable children, same marshmallow, same rules. The only thing that had changed was whether the child had reason to believe the adult would keep their word.
Put differently: the children who “failed” the marshmallow test may not have been failing at self-control at all. They may have been acting quite rationally, given the information they had. If adults don’t always come back with the second marshmallow, the sensible move is to eat the one in front of you.
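The reasoning above can be written down as a small expected-value calculation. This is a sketch under stated assumptions, not a model taken from Kidd's study: the function names, the idea of a "waiting cost" measured in marshmallows, and its default value of 0.5 are all hypothetical, chosen only to make the trade-off concrete.

```python
def value_of_waiting(p_adult_returns, reward_now=1.0, reward_later=2.0,
                     waiting_cost=0.5):
    """Expected payoff of waiting, in marshmallows.

    Assumptions (illustrative only): if the adult returns, the child gets
    reward_later; if not, the child still ends up with reward_now. Either
    way the child pays waiting_cost for the effort of holding out.
    """
    if not 0.0 <= p_adult_returns <= 1.0:
        raise ValueError("p_adult_returns must be between 0 and 1")
    expected_reward = (p_adult_returns * reward_later
                       + (1.0 - p_adult_returns) * reward_now)
    return expected_reward - waiting_cost


def should_wait(p_adult_returns, reward_now=1.0, reward_later=2.0,
                waiting_cost=0.5):
    """True when waiting beats eating the marshmallow straight away."""
    return value_of_waiting(p_adult_returns, reward_now, reward_later,
                            waiting_cost) > reward_now


if __name__ == "__main__":
    for p in (0.9, 0.3):
        choice = "wait" if should_wait(p) else "eat now"
        print(f"trust in adult = {p:.1f}: {choice}")
```

Under these assumptions a child who thinks the adult is 90% likely to return does better by waiting, while a child who puts it at 30% does better by eating now. The same child, with the same self-control, switches choice purely because the estimate of the adult's reliability changed.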
What the research actually suggests
If you put these strands together — the original experiments, the large-scale replication, and Kidd’s trust study — a much more interesting picture emerges than the pop-psychology parable.
Delayed gratification is not primarily a trait children either have or lack. It’s a skill that develops under certain conditions: a stable home, a reliable environment, caregivers who keep their promises, enough predictability that the long-term future feels like something worth planning for. Children who grow up in less predictable circumstances may look, on a fifteen-minute marshmallow test, like they lack self-control. In fact, they may simply have learned — accurately — that the present is more trustworthy than the future.
This reframing changes what we do with the research. The old version implied that if we could just teach children willpower, their lives would improve. The new version suggests that if we want children to develop the capacity to plan for the future, we need to give them a present they can trust. The intervention is upstream — in stability, in consistency, in the quiet architecture of a childhood where adults mean what they say.
This has sharper implications still. Adult behaviours we often call failures of self-control — impulse spending, addiction, reluctance to save for retirement — may be partly driven by similar logic. Humans who have learned that the future is unreliable are being rational when they prioritise the present. The policy and personal-finance response isn’t necessarily “develop more willpower.” It may be, at least in part, “build a future that feels worth saving for.”
The honest caveats
Mischel himself, in his later work, was more careful than the popular versions. He developed what he called the hot and cool systems model: two different modes of processing that interact under temptation. He emphasised situational factors. He acknowledged that his original population was narrow. By the time he died in 2018, shortly after the Watts replication was published, he had been trying for years to walk the popular story back toward something more accurate.
And the newer research doesn’t say that self-regulation doesn’t exist, or doesn’t matter. Children do vary. Self-control is a real capacity. The question is how much of later-life success it independently predicts, once you strip away everything else the marshmallow test was accidentally measuring. The answer seems to be: some, but considerably less than the parable suggested.
What to take away
There’s a version of this story that’s easy to tell and slightly bitter: the research was oversold, another finding collapses, trust nothing. That version is wrong, too, and it misses what’s actually interesting.
The original marshmallow studies asked a real question and found real patterns. The replication asked the same question more carefully and found a more honest answer. Kidd’s research explained part of the mystery. This is how science is supposed to work — not failure, but progress. The result is that we now understand delayed gratification better than when the test was invented.
What’s worth holding on to isn’t “willpower predicts everything” or “willpower predicts nothing.” It’s something more useful: the capacity to wait well is real, but it’s built out of the materials a life provides. Trust in the environment. Stability of expectations. Adults who keep their word. For the child in front of the marshmallow, and for the adult weighing a long-term financial decision, the same question lies underneath.
Is the future you’re being asked to save for one you have reason to believe in? That question doesn’t have a single answer. But it’s a better one to hold than the question the original parable taught us to ask.
Key research referenced: Walter Mischel’s original Bing Nursery School studies (Mischel, Shoda and Rodriguez, 1989); Watts, Duncan and Quan’s 2018 replication in Psychological Science; Celeste Kidd’s reliability experiment (Kidd, Palmeri and Aslin, 2013).