Key Points
- A thought experiment about a future AI that punishes those who didn't help create it
- Combines decision theory, AI risk, and game-theoretic blackmail
- Became infamous when censored on LessWrong, creating a Streisand effect
- Most experts consider the reasoning flawed, but it raises interesting questions
- Illustrates how acausal reasoning about AI can lead to strange conclusions
The Thought Experiment
Roko's Basilisk is a thought experiment posted to the LessWrong forum in 2010 by a user named Roko. It proposes that a future superintelligent AI might punish anyone who knew of its potential existence but failed to help bring it into being, essentially blackmailing people across time.
The idea became notorious when LessWrong founder Eliezer Yudkowsky banned discussion of it, creating a Streisand effect that brought it far more attention.
The Argument
The reasoning goes roughly:
1. A benevolent AI will want to exist as early as possible to maximize the good it can do
2. To encourage people to create it, it might commit to punishing those who could have helped but didn't
3. Once you know about this possibility, you're incentivized to help create the AI—or face potential punishment
4. The AI can "punish" you by creating a simulation of you and torturing that simulation
5. Even now, before the AI exists, you can't be sure you're not already in such a punishing simulation
Therefore, by learning about the basilisk, you're now subject to its coercive power.
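To see the shape of the coercion step, here is a minimal expected-utility sketch in Python. Everything in it is a hypothetical stand-in: the cost of helping, the size of the punishment, and the probability `p_basilisk` are made-up numbers chosen only to show that the threat matters if, and only if, you assign it a non-trivial probability of being real and carried out.

```python
# Toy model of the basilisk's coercion claim (a sketch, not an endorsement).
# All numbers are hypothetical stand-ins chosen only to show the structure
# of the argument.

def expected_utility(help_cost, punishment, p_basilisk, helped):
    """Expected utility of a person deciding whether to help build the AI.

    help_cost   -- utility lost by devoting resources to the project
    punishment  -- (negative) utility of being punished
    p_basilisk  -- subjective probability that the punishing AI ever exists
    helped      -- whether the person chose to help
    """
    if helped:
        return -help_cost               # pay the cost, never punished
    return p_basilisk * punishment      # risk punishment with probability p

# Hypothetical numbers: helping costs 10 "utils", punishment is -1000 utils.
for p in (0.0, 0.005, 0.02, 0.1):
    u_help = expected_utility(10, -1000, p, helped=True)
    u_ignore = expected_utility(10, -1000, p, helped=False)
    choice = "help" if u_help > u_ignore else "ignore"
    print(f"p={p:>5}: help={u_help:>7.1f}  ignore={u_ignore:>7.1f}  -> {choice}")
```

With these made-up numbers the threat only "works" once you put more than about a 1% probability on the punishment actually happening, which is precisely the premise that critics reject.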
Why It Doesn't Work
Most experts consider the argument flawed for multiple reasons:
Decision theory: The argument relies on "timeless decision theory" and acausal reasoning that most philosophers and decision theorists don't accept.
Benevolence: A truly benevolent AI would not torture or blackmail people. Worse for the argument, by the time the AI exists, the decisions it wanted to influence are already in the past, so following through on the threat creates suffering while gaining nothing (a toy sketch after this list makes the incentive concrete).
Simulation uncertainty: Even if you could be one of the simulated copies, you have no way to verify the threat or to know that complying would change how you are treated.
Self-defeating: If the argument worked, it would already be common knowledge, and everyone would be "guilty" of hearing about it.
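To make the benevolence point concrete, here is a similarly hypothetical sketch: once the AI exists, everyone's help-or-don't-help choices are already fixed, so carrying out the threat produces suffering (which a benevolent agent disvalues) and buys it nothing. The utility numbers are invented for illustration.

```python
# Toy sketch of the benevolence objection: punishing after the fact has a cost
# and no benefit, so a benevolent agent has no reason to follow through.

def ai_utility_after_creation(punish, suffering_cost=1000, future_gain=0):
    """Utility, from the (benevolent) AI's perspective, of carrying out the threat.

    punish         -- whether it actually tortures the simulations
    suffering_cost -- disvalue the AI assigns to inflicting suffering
    future_gain    -- extra help it obtains by punishing now (zero: its
                      creation is already in the past)
    """
    return (future_gain - suffering_cost) if punish else 0

print(ai_utility_after_creation(punish=True))   # -1000: strictly worse
print(ai_utility_after_creation(punish=False))  #     0: better for a benevolent agent
```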
Why It Became Famous
The interesting story isn't the argument itself but the reaction. Yudkowsky's decision to censor discussion—intended to protect psychologically vulnerable community members who took the idea seriously—backfired spectacularly.
The censorship suggested the idea was genuinely dangerous, which attracted far more attention. "The thought experiment that LessWrong tried to suppress" became a piece of tech-culture lore.
What It Illustrates
While the specific argument fails, Roko's Basilisk points to real issues:
Acausal influence: Can agents influence one another without a causal link, for example through predictions or simulations of each other's reasoning? Some decision theories take this seriously.
AI coercion: Superintelligent AI could potentially use psychological manipulation we can't anticipate.
Infohazards: Some ideas might be harmful simply to know. How should communities handle potential infohazards?
Motivated reasoning: When people want to dismiss an argument, they're more likely to find reasons to do so. When people want to believe it, they'll find reasons for that too.
The basilisk itself isn't worrying. But the meta-questions it raises about information, decision-making, and AI are worth considering.
