Y12W13WR Explore, then exploit

Observational
The writing prompt

Examine where you currently sit on the explore-exploit spectrum in two or three domains of your life, and reflect on whether your balance matches what each domain actually asks for.

1. Retrieval check

Q1. How does the explore-exploit trade-off from reinforcement-learning research apply to life choices?

  • A. Always exploit
  • B. Early in a domain, explore widely; later, commit to the best option you’ve found. Satisfaction correlates with matching the mode to the stage.
  • C. Always explore
  • D. Ignore the past

Q2. What is the article’s counter-thread against the explore-exploit frame?

  • A. It never misleads
  • B. The frame can rationalise either prolonged indecision (‘still exploring’) or premature commitment (‘time to exploit’); the specific balance depends on domain and individual
  • C. It only applies to careers
  • D. Exploration is always better
Answer key

Q1 → B. Early in a domain, explore widely; later, commit to the best option you’ve found. Satisfaction correlates with matching the mode to the stage. People who explore early and then commit report higher satisfaction than those who commit too early or never commit.

Q2 → B. The frame can rationalise either prolonged indecision (‘still exploring’) or premature commitment (‘time to exploit’); the specific balance depends on domain and individual. Know which mode you’re in for specific decisions; don’t apply exploration logic to commitments that require exploitation.

2. Prompt deconstruction

Command verb: EXAMINE your current balance; REFLECT on the match
Domains: pick two or three (subjects, friendships, interests, future plans)
Must identify: where you’ve drifted into the wrong mode (explored past the point where commitment would serve you, or committed before you really looked)
Close with: a specific shift you’d make for one domain

3. Pick nudge

Which domains will show whether you are exploring, exploiting or in the wrong mode?

A domain where I’m exploring (and whether you should be)
A domain where I’m exploiting (and whether you should be)
A domain in the wrong mode (where you’ve drifted: explored too long or committed too early)

4. Planner: for each of your picks

Domain
Current mode / right mode / evidence / what a shift looks like
#1
#2
#3

5. Sentence stems

  • I noticed that ___ when ___.
  • The specific moment it stood out was ___.
  • Before paying attention, I had been assuming ___.
  • [Researcher’s] finding that ___ captures what I saw, because ___.
  • The pattern across my cases is ___.
  • What this tells me about [wider topic] is ___.

6. Exemplar paragraph (not about this article)

(1) A domain where I am clearly still exploring is extracurriculars: I have tried three different clubs this year and plan to try a new one next term. The exploration mode is probably right here, because my Y11 base was narrow, and the explore-exploit research suggests early exploration should precede commitment. (2) A domain where I am exploiting is my study subjects: I committed to my Y11-to-Y12 combination a year ago, and the exploitation is appropriate, since I have been specialising and the commitment is still paying off. (3) A domain in the wrong mode is friendships: I have been in exploration mode since the start of Y12, which felt natural after moving classes, but I can describe the cost. No friendship has deepened this year, because each conversation has been with a different person. (4) Before paying attention, I had been framing this as openness; the explore-exploit frame captures what I was missing: I was using exploration logic in a domain that now rewards commitment. (5) The specific moment it stood out was realising I could not name a single friend who knew what I was working on this term. The pattern across my three domains is that I default to exploration, and the default hurts where commitment is the paying move. (6) What this tells me is to name two specific friendships to invest in this term as an explicit shift from explore to exploit.

What this paragraph does, move by move

  1. Identifies an explore-mode domain with the right mode.
  2. Identifies an exploit-mode domain with the right mode.
  3. Identifies a domain in the wrong mode.
  4. Reveals the prior framing that obscured it.
  5. Names a specific, observable cost of the wrong mode.
  6. Closes with a concrete shift.