Imagine you walk into a restaurant you’ve been to a hundred times. You know exactly what to order. The pad thai is excellent. It has never let you down. You’ve made your decision before you’ve even opened the menu.
But you glance at the specials. There’s something you haven’t had before. Some kind of noodle dish you don’t recognise. The description is intriguing.
It’s a small, familiar dilemma, and the same dilemma sits underneath many of life’s much larger choices. Do you order the thing you know is good? Or do you try something new? The known option has a known payoff. The unknown option might be worse or might be better, and there’s no way to find out without taking the risk.
Computer scientists and statisticians have a name for this problem. They call it the explore-exploit tradeoff, and it turns out to describe not just restaurant choices but some of the biggest questions of any human life.
The mathematical version
The formal version of the problem is known as the multi-armed bandit. Imagine standing in a casino in front of a row of slot machines (historically nicknamed “one-armed bandits”). Each machine has a different, unknown payout rate. You have a limited number of pulls. How do you maximise your winnings?
Clearly, you should try several machines first to get a sense of which is better — this is exploration. Then, once you have enough information, you should commit to the best-known machine and pull its lever many times — this is exploitation. The problem is knowing when to stop exploring and start exploiting. Explore too little, and you might miss a great machine. Explore too much, and you waste pulls on machines you already know are worse than the best one you’ve tried.
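The strategy just described has a name in the bandit literature: explore-then-commit. Try each machine a fixed number of times, then spend every remaining pull on the one that did best. Here is a minimal simulation in Python. It assumes each machine pays 1 with a fixed hidden probability and 0 otherwise. The function name `explore_then_commit`, its parameters and the example payout rates are hypothetical, chosen for illustration only.

```python
import random

def explore_then_commit(payout_rates, total_pulls, pulls_per_arm, seed=0):
    """Hypothetical sketch: explore-then-commit on Bernoulli slot machines.

    payout_rates: hidden win probability of each machine (unknown to the player).
    total_pulls: the player's fixed budget of lever pulls.
    pulls_per_arm: how many times each machine is tried before committing.
    Returns (total winnings, index of the machine committed to).
    """
    rng = random.Random(seed)
    n_arms = len(payout_rates)
    if pulls_per_arm * n_arms > total_pulls:
        raise ValueError("exploration phase exceeds the pull budget")

    def pull(arm):
        # One lever pull: win 1 with the machine's hidden probability.
        return 1 if rng.random() < payout_rates[arm] else 0

    # Exploration: try every machine the same number of times.
    wins = [0] * n_arms
    for arm in range(n_arms):
        for _ in range(pulls_per_arm):
            wins[arm] += pull(arm)

    # Commit to the machine with the most wins (every machine got the
    # same number of pulls, so raw counts compare like averages).
    best = max(range(n_arms), key=lambda a: wins[a])

    # Exploitation: spend all remaining pulls on the chosen machine.
    remaining = total_pulls - pulls_per_arm * n_arms
    total = sum(wins) + sum(pull(best) for _ in range(remaining))
    return total, best

total, chosen = explore_then_commit([0.2, 0.5, 0.6], total_pulls=1000, pulls_per_arm=50)
```

The single knob, `pulls_per_arm`, is the whole dilemma in miniature. Set it too low and a lucky streak can lock you onto a worse machine. Set it too high and you burn pulls on machines you could already tell were worse.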
The mathematics of this problem has generated decades of research in statistics, computer science, and operations research. Solutions have names like epsilon-greedy, upper confidence bound, and Thompson sampling, and each strikes the balance differently. Epsilon-greedy explores a random option a fixed fraction of the time. Upper confidence bound methods favour options whose payoff is still uncertain. Thompson sampling picks each option in proportion to how likely it currently looks to be the best one. Versions of these algorithms are used in recommendation systems, online ad targeting, and adaptive clinical trials of new drugs. When a streaming service suggests something you “might also like”, some form of explore-exploit logic is often helping decide which unfamiliar option to show you.
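Epsilon-greedy is the simplest of the named methods to write down. On each pull, with a small probability epsilon, it explores a machine chosen at random. Otherwise it exploits the machine with the best average payout so far. The sketch below again assumes machines that pay 1 or 0. The name `epsilon_greedy` is hypothetical, and the default `epsilon=0.1` is an illustrative choice, not a recommended setting.

```python
import random

def epsilon_greedy(payout_rates, total_pulls, epsilon=0.1, seed=0):
    """Hypothetical sketch: play Bernoulli slot machines with epsilon-greedy.

    Returns (total winnings, number of pulls given to each machine).
    """
    rng = random.Random(seed)
    n_arms = len(payout_rates)
    pulls = [0] * n_arms  # times each machine has been played
    wins = [0] * n_arms   # total payout collected from each machine

    def estimate(arm):
        # Observed win rate. An untried machine counts as infinitely
        # promising, so greedy play tries every machine at least once.
        return wins[arm] / pulls[arm] if pulls[arm] else float("inf")

    for _ in range(total_pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)             # explore: random machine
        else:
            arm = max(range(n_arms), key=estimate)  # exploit: best so far
        reward = 1 if rng.random() < payout_rates[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return sum(wins), pulls

total, pulls = epsilon_greedy([0.2, 0.5, 0.6], total_pulls=1000)
```

Because epsilon is fixed, this version keeps exploring at the same rate forever, even after the best machine is obvious. Upper confidence bound methods and Thompson sampling instead explore less as evidence accumulates. That is one reason they tend to waste fewer pulls over long runs.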
Applied to life
In 2016, the writer Brian Christian and the cognitive scientist Tom Griffiths published a book called Algorithms to Live By, which brought the explore-exploit framework out of academic papers and into ordinary decision-making. One of their arguments was that much of adult life consists of implicit bandit problems: careers, relationships, friendships, neighbourhoods, hobbies. In each, there are options you’ve already tried, whose payoffs you roughly know. And every week, new options present themselves: new jobs you could apply to, new people to meet, new ways to spend your time.
Christian and Griffiths’s practical insight: the right balance of exploration and exploitation depends heavily on how much time you have left.
Early in a career, in a new city, in a life stage where many years stretch ahead, you should lean heavily toward exploration. The value of information gathered now compounds over decades. A restaurant you discover in your twenties might become a favourite for forty years. A skill you try and find you love might shape the rest of your working life. A person you meet almost accidentally might become a lifelong partner. The cost of an exploratory “miss” in your twenties is low; the value of an exploratory “hit” is enormous.
Late in life, in a stable situation, with fewer remaining opportunities to deploy new information, you should lean toward exploitation. The hundred-year-old who tries a new restaurant is unlikely to have many future visits to take advantage of the discovery. Better, on average, to return to the places known to be good. This isn’t a sad conclusion; it’s what people naturally do as life trajectories narrow, and it reflects something mathematically sensible.
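The claim that the right amount of exploration depends on the time remaining can be checked with a small calculation. The sketch below is an illustrative example, not one taken from the book. It assumes two machines with payout rates 0.6 and 0.4 and the explore-then-commit strategy. It computes exact expected winnings from binomial probabilities rather than simulating, then finds the best number of exploratory pulls per machine for several horizons. All function names are hypothetical.

```python
from math import comb

def binom_pmf(n, p):
    """Probabilities of 0..n wins in n independent pulls with win rate p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def expected_winnings(p_good, p_bad, horizon, m):
    """Exact expected winnings of explore-then-commit on two machines.

    Each machine is pulled m times; the one with more wins is then played
    for the remaining horizon - 2*m pulls (ties broken by a fair coin).
    """
    good, bad = binom_pmf(m, p_good), binom_pmf(m, p_bad)
    # Probability of committing to the better machine. bad_below holds
    # P(worse machine won fewer than i times) at the top of each step.
    p_right, bad_below = 0.0, 0.0
    for i in range(m + 1):
        p_right += good[i] * (bad_below + 0.5 * bad[i])
        bad_below += bad[i]
    explore = m * (p_good + p_bad)
    exploit = (horizon - 2 * m) * (p_right * p_good + (1 - p_right) * p_bad)
    return explore + exploit

def best_exploration(p_good, p_bad, horizon):
    """Pulls per machine that maximise expected winnings for this horizon."""
    return max(range(horizon // 2 + 1),
               key=lambda m: expected_winnings(p_good, p_bad, horizon, m))

for horizon in (20, 100, 500):
    print(horizon, best_exploration(0.6, 0.4, horizon))
```

Under these assumptions, the best amount of exploration is noticeably larger for the 500-pull horizon than for the 20-pull one. With little time left, information gathered now has few future pulls in which to pay off.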
The developmental angle
There’s a beautiful convergence between the explore-exploit framework and research from developmental psychology. The psychologist Jeffrey Arnett has been studying what he calls emerging adulthood: the period from roughly 18 to 29 that, in many industrialised societies, has become distinct from both adolescence and settled adulthood. Its defining feature, Arnett argues, is exploration. Emerging adults try on identities, relationships, careers, ideologies, and places to live with a flexibility that later life does not preserve.
This isn’t immaturity; it’s a feature. In the societies where emerging adulthood exists, the life course now includes a protected window of exploration before the commitments of full adult life close around the person. Emerging adulthood is when much of the most valuable information about oneself and the world gets gathered. It’s when the dataset about what you like, what you’re good at, what matters to you, and who you can stand to live with gets built.
From an explore-exploit perspective, this inverts the usual framing. Society often treats the late teens and twenties as a time to start climbing the early rungs of a single career ladder. The explore-exploit logic suggests instead that, for long-term life outcomes, some deliberate exploration during this window tends to produce better decisions later, because those later decisions are made with more information.
When exploration becomes avoidance
But the framework has a shadow side, and any honest treatment of it has to acknowledge this.
The psychologists Robert Kegan and Lisa Lahey, in their decades of work on adult development, describe a pattern they call immunity to change: hidden competing commitments that quietly protect people from the very changes they say they want. Applied to decisions, one form it can take is generating new reasons, again and again, to keep one’s options open. A person who has been “exploring career options” for fifteen years is, at some point, no longer exploring. They’re avoiding the vulnerability of committing to one.
The same pattern shows up in relationships. A person who dates continuously without ever settling, each relationship ending just as it might become serious, is not exploring the space of possible partners. They’ve committed, in practice, to permanent exploration. That is its own kind of commitment, and often a less satisfying one than many of the options they keep rejecting.
Read through Kegan and Lahey’s lens, indefinite exploration can be a defence: against the risk of choosing wrong, against the loss of imagined other lives, against the irrevocability of actually deciding. The mathematics of explore-exploit assumes the explorer is genuinely trying to find the best option and will commit once they’ve found it. The immune-to-change version uses the explore phase to avoid ever having to exploit.
A useful diagnostic, in the spirit of their work: ask yourself what you would commit to if you stopped exploring today. If the answer is clear and just a little frightening, you’re probably still genuinely exploring. If the answer is unclear, or if you find yourself generating reasons why none of the current options are acceptable, you may have slipped from exploration into avoidance. The shift is often invisible from the inside.
Striking the balance
So how should you actually decide?
A few rough heuristics, drawn from the research:
Early in any domain (a new city, a new field, a new life stage), explore deliberately. Set yourself a window, somewhere between six months and two years depending on the domain, during which you deliberately try many options without committing. You’re gathering information about the space. Don’t be troubled by the lack of commitment; the commitment comes later.
Once the window closes, shift decisively toward exploitation. Pick the best option you’ve found, commit, and stop comparing. Continued comparison after the window ends is what produces the unhappy maximiser, always half-wondering about the option not taken. Close the search.
When circumstances change dramatically — a major move, a health event, a new life phase — it’s reasonable to reopen exploration for a period. You have new information about yourself or the world that your previous commitments didn’t account for. This isn’t flaky; it’s responsive.
And throughout, notice whether your exploration is producing learning or merely prolonging itself. Learning has a signature: at any moment, you can articulate what you’ve figured out that you didn’t know before. Mere prolonging has none. You’ve been exploring for two years, and you’re still saying the same things about what you’re looking for.
The question that remains
The deepest insight of the explore-exploit tradeoff is probably this: life is finite. The time you spend exploring is time you don’t spend exploiting what you’ve already found. The time you spend exploiting is time you don’t spend discovering what you haven’t found yet. Both are real. Both cost something. Neither is the universal answer.
The skill, then, is knowing which phase of which domain of your life you’re currently in. Are you in the exploring stage of a new career, or have you passed into the stage where commitment matters more than continued browsing? Are you still properly exploring relationships, or have you quietly crossed the line where the exploration has become its own kind of avoidance?
The question to sit with, before the next time you’re tempted to add a new option rather than commit to one you already have:
In this specific domain of my life, am I still genuinely learning from new experiences — or have I started using exploration as a way to keep the real decision postponed?
Key research referenced: the multi-armed bandit problem in statistics and computer science; Brian Christian and Tom Griffiths, Algorithms to Live By (2016); Jeffrey Arnett’s research on emerging adulthood (Arnett, 2000); Robert Kegan and Lisa Lahey, Immunity to Change (2009).