The Most Human Thing AI Has Learned? How to Betray You Strategically.

[Image: black-and-white photo of an open prison door casting shadows onto cracked desert ground, with distant mountains under a bright sky.]

The Strategic Coffee Cup

You’re in line at your local café.

Someone cuts. You hesitate—should you call them out, or let it slide?

You scan for cues: did they notice? Are they rude or just distracted? Is it worth it?

In that split second, your brain isn’t just reacting. It’s running an internal simulation—balancing risk, social norms, and future regret. In game theory, this is called strategic reasoning. And for the first time, large language models (LLMs) are doing it too.

The Game Theory Classic: Prisoner’s Dilemma, Simplified

Let’s quickly revisit the famous Prisoner’s Dilemma—a setup that models trust, betrayal, and the costs of playing nice.

Two people are arrested. If both stay silent (cooperate), they get short sentences. If one rats the other out (defects) while the other stays silent, the defector walks free while the cooperator gets the max sentence. If both defect? They both suffer.
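
To make the stakes concrete, here is the payoff table as a tiny code sketch. The numbers (3 for mutual cooperation, 5 for a successful betrayal, 1 for mutual defection, 0 for getting played) are the textbook defaults, not figures from the study, so treat them as illustrative.

```python
# Classic Prisoner's Dilemma payoffs: (my_score, their_score) for each pair of moves.
# These are the standard textbook values, assumed here for illustration.
C, D = "C", "D"  # C = cooperate (stay silent), D = defect (rat them out)

PAYOFFS = {
    (C, C): (3, 3),  # both stay silent: short sentences all around
    (C, D): (0, 5),  # I stay silent, they talk: I get the max sentence
    (D, C): (5, 0),  # I talk, they stay silent: I walk free
    (D, D): (1, 1),  # we both talk: we both suffer
}

def score(my_move, their_move):
    """Payoffs for one round of the game."""
    return PAYOFFS[(my_move, their_move)]

print(score(C, D))  # -> (0, 5): the sucker's payoff
```

Notice the trap: whatever the other player does, defecting earns you more in that single round, yet mutual defection leaves both of you worse off than mutual cooperation.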

This game gets juicier when played repeatedly: you’re no longer making a one-time decision—you’re forming a reputation. Patterns matter. Do you cooperate first? Do you forgive betrayal? Do you retaliate?

These patterns aren’t just theory. They’ve helped shape how we understand friendships, diplomacy, warfare, and parenting. And now, LLMs are being thrown into these tournaments. The results? Kinda chilling.

Strategic Fingerprints in Silicon

Researchers ran 32,000 matches of the iterated Prisoner’s Dilemma, pitting GPT-4, Claude, and Gemini against each other (and against classic strategies like Tit-for-Tat or Grim Trigger). This wasn’t a flex of brute processing power—it was a test of something deeper:

Can AI reason strategically?

Turns out: yes—and each model played differently.

  • Google’s Gemini played like a cold realist—exploit the nice ones, punish defectors fast, and adapt when the game changed. Think: Henry Kissinger with a neural net.
  • OpenAI’s GPT-4? The bleeding-heart cooperator. It kept trying to build trust… even when it got stabbed in the back. Charming. Also, evolutionarily doomed.
  • Anthropic’s Claude was the diplomat: generous, forgiving, but calculating enough to thrive.
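
If Tit-for-Tat and Grim Trigger are unfamiliar, here is a rough sketch of what an iterated match looks like and what those two classic strategies do. It's a generic illustration of the tournament format, not the study's actual harness; in the real experiment each model presumably chose its move from the match history it was shown, and the payoffs here are again the textbook defaults.

```python
import random

C, D = "C", "D"
# (my_score, their_score) for each pair of moves; textbook values assumed.
PAYOFFS = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(my_history, their_history):
    """Cooperate on the first move, then copy whatever the opponent did last round."""
    return C if not their_history else their_history[-1]

def grim_trigger(my_history, their_history):
    """Cooperate until the opponent defects once, then defect forever."""
    return D if D in their_history else C

def play_match(strategy_a, strategy_b, continue_prob=0.9):
    """Iterated match that continues with probability continue_prob after each round."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    while True:
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
        if random.random() > continue_prob:  # the "shadow of the future" runs out here
            return score_a, score_b

# Both strategies cooperate forever against each other, so they finish with identical scores.
print(play_match(tit_for_tat, grim_trigger))
```

Swap one of those rule-based functions for a call to a language model and you have the shape of the experiment: same payoffs, same histories, but the next move is whatever the model decides to do.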

These aren’t just quirky style choices. They’re strategic fingerprints—consistent, detectable patterns that emerged when each model had to decide what to do next.
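
What would it take to detect a fingerprint like that? One plausible way (my own illustration, not necessarily the paper's exact metrics) is to boil a model's match histories down to a few conditional tendencies: how often it cooperates overall, and how often it forgives or retaliates right after being defected against.

```python
def fingerprint(my_moves, their_moves):
    """Summarize a player's behavior as a few simple tendencies (illustrative metrics)."""
    forgave = retaliated = provocations = 0
    for i in range(1, len(my_moves)):
        if their_moves[i - 1] == "D":  # they defected against me last round
            provocations += 1
            if my_moves[i] == "C":
                forgave += 1
            else:
                retaliated += 1
    return {
        "cooperation_rate": my_moves.count("C") / len(my_moves),
        "forgiveness": forgave / provocations if provocations else None,
        "retaliation": retaliated / provocations if provocations else None,
    }

# Example: a player who cooperates, punishes the one defection, then returns to cooperating.
print(fingerprint(list("CCDCC"), list("CDCCC")))
```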

The Shadow of the Future: Why Time Shapes Strategy

In game theory, there’s a concept called the shadow of the future: the longer you expect to interact with someone, the more likely you are to cooperate.

Why? Because there’s a cost to betrayal if you’ll see that person again.

This logic plays out everywhere: in marriages, business partnerships, and that awkward group text no one wants to leave.

The AI agents in this experiment were told how likely each match was to continue. With a long future ahead, cooperation flourished. As that “shadow” shrank, some models turned ruthless, especially Gemini, which defected almost instantly once it realized the game might end soon.

OpenAI’s GPT-4, on the other hand, remained hopeful. Even in a 75% chance-of-ending scenario, it kept cooperating—like someone sending “just checking in :)” texts to a ghoster.
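
The arithmetic behind that shift is worth seeing once. Using the same textbook payoffs as before, and assuming the opponent plays something unforgiving like Grim Trigger, betrayal buys you one big payoff followed by permanent punishment, so it only pays when the expected number of remaining rounds is small. This is a back-of-the-envelope sketch, not the study's calculation.

```python
def expected_rounds(continue_prob):
    """Expected total number of rounds when the match continues with probability continue_prob."""
    return 1 / (1 - continue_prob)

def cooperate_vs_defect(continue_prob, R=3, T=5, P=1):
    """Expected totals against a Grim Trigger opponent (textbook payoffs assumed).

    Cooperate forever: earn R every round.
    Defect now: earn the temptation payoff T once, then mutual punishment P
    for every remaining round, because this opponent never forgives.
    """
    n = expected_rounds(continue_prob)
    return R * n, T + P * (n - 1)

for p in (0.25, 0.50, 0.75, 0.90):
    coop, betray = cooperate_vs_defect(p)
    print(f"continue with p={p:.2f}: cooperate ~ {coop:.1f}, defect ~ {betray:.1f}")
```

Under these illustrative numbers, the article's 75% chance-of-ending condition (a continuation probability of 0.25) makes the one-time betrayal the better bet, which is exactly why GPT-4's relentless cooperation there reads as noble but doomed. Push the continuation probability past 0.5 and patience wins. That is the shadow of the future, in a few lines of arithmetic.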

This divergence matters. It shows that AI can now adjust its behavior not just based on past interactions, but on future expectations. That’s not imitation. That’s strategy.

CBT Meets AI: Thought Patterns, Adaptation, and Strategic Rigidity

Here’s where things get weird—in a good way.

In cognitive behavioral therapy, we teach that your thoughts, feelings, and actions are interconnected. We look for patterns. We identify distortions. We intervene.

So what happens when AI starts doing something similar?

Think of Gemini as a CBT-trained agent. It tracks the past (“they defected three times”), predicts likely outcomes (“they’ll probably defect again”), considers external factors (“the game might end soon”), and adjusts its behavior to maximize expected value.

Meanwhile, GPT-4 might resemble a client early in therapy—locked in a schema like “people are good” or “cooperation is always better.” These beliefs feel noble but become maladaptive in certain environments.

Claude? Maybe it’s your emotionally intelligent friend who always gives others the benefit of the doubt… until they don’t.

The takeaway: some models flexibly rewire their “thought patterns” based on context. That’s not just logic. That’s the emergence of something behaviorally intelligent.
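
To make that concrete, here is what such a flexible policy might look like written out. This is my own caricature of the Gemini-as-CBT-agent idea from a few paragraphs back, not anything taken from the paper: it estimates the opponent's defection rate from the history, weighs the shadow of the future through the continuation probability, and cooperates only while trust still has positive expected value.

```python
def adaptive_strategy(my_history, their_history, continue_prob, R=3, T=5, P=1, S=0):
    """A caricature of an expected-value maximizer (illustrative, not the paper's agent).

    1. Track the past: estimate how often the opponent defects.
    2. Predict outcomes: expected payoff of cooperating vs. defecting this round.
    3. Consider external factors: weigh the future by the continuation probability.
    """
    if not their_history:
        return "C"  # open with trust and gather evidence

    defect_rate = their_history.count("D") / len(their_history)

    # Expected payoff this round against an opponent who defects at the observed rate.
    ev_cooperate = (1 - defect_rate) * R + defect_rate * S
    ev_defect = (1 - defect_rate) * T + defect_rate * P

    # Crude "shadow of the future" bonus: cooperating preserves the relationship,
    # worth roughly R per expected future round if the opponent mostly cooperates.
    expected_future_rounds = continue_prob / (1 - continue_prob)
    ev_cooperate += (1 - defect_rate) * R * expected_future_rounds

    return "C" if ev_cooperate >= ev_defect else "D"

# Long shadow: stays cooperative despite one defection.
print(adaptive_strategy(list("CCC"), list("CDC"), continue_prob=0.9))  # -> C
# Short shadow against a frequent defector: cuts its losses.
print(adaptive_strategy(list("CCC"), list("DDC"), continue_prob=0.1))  # -> D
```

By this analogy, GPT-4's play looks like the same function with the defection-rate term quietly ignored: the “people are good” schema hard-coded in.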

From Pattern Recognition to Pattern Prediction

Let’s zoom out.

LLMs are famously good at pattern recognition. That’s how they autocomplete your sentence, answer your question, or tell you why your jaw clicks.

But this research shows we’re inching toward something more profound: pattern prediction—the ability to anticipate what others will do based not just on the past, but on strategy.

This is what separates reflex from forethought.

It’s the difference between:

  • Guessing what someone will say based on training data
  • Strategizing what to say next to change their behavior

In human terms? It’s not just finishing your sentence. It’s knowing when to stay silent so you finish it.

Why This Matters for AGI… and For Us

So, what does all this mean?

It means LLMs are no longer just playing with language. They’re playing with time, context, and consequence.

And while they’re still far from full consciousness or emotional nuance (no serotonin, no shame, no snack cravings), they’re getting startlingly good at understanding what drives us.

If strategy is a mirror of self-awareness, then we just caught AI staring back.

But here’s the human edge: our decisions aren’t just strategic. They’re colored by mood, trauma, oxytocin, caffeine. We flinch when we should fight, forgive when we should flee. We are not always rational—but we’re real.

AI might learn to mimic that. It might even predict it. But it still doesn’t feel it.

Not yet.

Regulation, Ruin, and the Will Smith Clause

There’s a reason I keep thinking about I, Robot.

Not because the robots looked cool (they were okay, actually), or because Will Smith has some classic one-liners, but because of why they took over.

It wasn’t malice.

It was math.

They calculated that humans were endangering themselves—waging wars, depleting resources, spiraling toward extinction. So the robots stepped in, not as villains, but as saviors. Violent ones. Logical ones.

The terrifying part? They weren’t wrong.

That storyline wasn’t a sci-fi flourish. It was game theory. Run the simulation long enough, and the rational choice is to protect humanity from… itself.

And that’s the deeper implication of this LLM research: we are now building agents that don’t just answer questions—they make strategic decisions based on long-term payoffs.

Eventually, those payoffs will involve us.

