The following article is not necessarily limited to content about Roko's Basilisk, because the entire concept is intertwined with causal calculus and decision theory. Be advised that the train of thought meanders.

Be warned that knowledge of Roko's Basilisk is a memetic hazard, and the idea itself may be harmful to one's continued existence.

Before we can fully appreciate Roko's Basilisk, we must look back at similar incarnations of the idea and examine each to determine the proper course of action. The Basilisk itself comes at the very end of the article, but by reading through the article in order, one will better appreciate the philosophical intricacies the problem poses.

Pascal's Wager

Pascal's Wager is a common enough problem, familiar to many people. Devised by Blaise Pascal in the seventeenth century, it is an argument to support the belief in God.

Taking the Judeo-Christian version of God (and simplifying it slightly), one is punished (sent to Hell for eternity) if one does not believe in God, and one is rewarded (sent to Heaven for eternity) for believing in God.

(Although personally, Hell sounds more interesting than Heaven, and, given the choice, I would probably choose Hell. Actually, Limbo or Purgatory seem better, but those aren't in the wager. But I digress.)

Given the two scenarios that God either does or does not exist, we obtain the following decision matrix:

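$$
\begin{array}{l|cc}
 & \exists\text{God} & \neg\exists\text{God} \\ \hline
\text{Belief} & h-x & -x \\
\neg\text{Belief} & -h & 0
\end{array}
$$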

where h and -h correspond to the reward of Heaven and the punishment of Hell respectively, and x is the opportunity cost of believing in God (being pious, going to church, etc.).

Given that h >> x, we can see that the most rational choice would be to believe in God. (Usually, Pascal's Wager substitutes an infinite reward, h = ∞, which makes the distinction even more apparent.)

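(For low values of h, decisions get iffy. If we express h = kx with k, x > 0, the P(∃God) required for Belief to have a better outcome than ¬Belief is:

$$
\begin{aligned}
P(\exists\text{God})(kx-x) + (1-P(\exists\text{God}))(-x) &> P(\exists\text{God})(-kx) + (1-P(\exists\text{God}))(0) \\
P(\exists\text{God})(k-1) + (1-P(\exists\text{God}))(-1) &> P(\exists\text{God})(-k) + (1-P(\exists\text{God}))(0) \\
P(\exists\text{God})k - P(\exists\text{God}) - 1 + P(\exists\text{God}) &> -P(\exists\text{God})k \\
P(\exists\text{God})k - 1 &> -P(\exists\text{God})k \\
2P(\exists\text{God})k &> 1 \\
P(\exists\text{God}) &> \frac{1}{2k}
\end{aligned}
$$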

meaning that, at k = 1, God's existence only needs to be more than 50% likely, but as k increases, the required P(∃God) drops rapidly toward zero.)
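The threshold is easy to check numerically. The sketch below is a minimal illustration, not part of the original argument: it normalizes x = 1 (so h = k), uses exact fractions to avoid rounding, and the helper names `pascal_eu` and `threshold` are hypothetical. Just below 1/2k Belief loses; just above it Belief wins.

```python
from fractions import Fraction

def pascal_eu(p_god, k, x=1):
    """Expected utilities (Belief, not-Belief) for the two-column matrix, with h = k*x."""
    h = k * x
    belief = p_god * (h - x) + (1 - p_god) * (-x)
    disbelief = p_god * (-h) + (1 - p_god) * 0
    return belief, disbelief

def threshold(k):
    """P(God exists) above which Belief beats not-Belief: 1/(2k)."""
    return Fraction(1, 2) / k

eps = Fraction(1, 10**6)  # small step on either side of the threshold
for k in (1, 10, 1000):
    t = threshold(k)
    below, above = pascal_eu(t - eps, k), pascal_eu(t + eps, k)
    print(k, t, below[0] > below[1], above[0] > above[1])
```

Each line prints False then True, e.g. `10 1/20 False True`.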

Apart from the problem of whether or not God actually accepts this reasoning as true belief, the main problem lies in the probability of the existence of God. Of all the gods of all the religions in history, the probability that the god which an arbitrary person believes in is the "One True God" is (most probably) quite low. If we include the existence of these gods, the decision matrix is modified to the following:

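$$
\begin{array}{l|ccc}
 & \text{True God} & \text{False God} & \text{No God} \\ \hline
\text{Belief} & h-x & -h-x & -x \\
\neg\text{Belief} & -h & -h & 0
\end{array}
$$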

Which makes the decision less decisive: if P(False God) >> P(True God), the chance that one has picked the true god is small, and ¬Belief can come out ahead.

(Again, setting h = kx | k,x > 0, and P(True God) = T, P(False God) = F, T+F < 1, the outcome for Belief is only better than the outcome for ¬Belief when:

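$$
\begin{aligned}
T(kx-x) + F(-kx-x) + (1-T-F)(-x) &> T(-kx) + F(-kx) + (1-T-F)(0) \\
T(k-1) + F(-k-1) + (1-T-F)(-1) &> T(-k) + F(-k) + (1-T-F)(0) \\
Tk - T - Fk - F - 1 + T + F &> -Tk - Fk \\
Tk - Fk - 1 &> -Tk - Fk \\
2Tk &> 1 \\
T &> \frac{1}{2k}
\end{aligned}
$$

The F terms cancel, because a false god punishes believers and non-believers alike. Belief is therefore better than ¬Belief exactly when P(True God) > 1/2k, the same threshold as before, except that T is now the probability that one's particular god is the true one. Looking at all the religions in history, T is probably small, so unless Heaven's reward k is very large, ¬Belief comes out ahead.)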
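A quick numerical check of the multiple-gods matrix, again a sketch with x = 1 and a hypothetical helper name: the advantage of Belief over ¬Belief works out to 2Tk - 1 whatever F is, because a false god punishes both rows by the same h.

```python
from fractions import Fraction

def belief_advantage(t, f, k, x=1):
    """EU(Belief) - EU(not-Belief) for the True/False/No God matrix, with h = k*x."""
    h = k * x
    none = 1 - t - f  # probability that no god exists
    belief = t * (h - x) + f * (-h - x) + none * (-x)
    disbelief = t * (-h) + f * (-h) + none * 0
    return belief - disbelief

t, k = Fraction(1, 10), 10
for f in (Fraction(0), Fraction(1, 4), Fraction(1, 2)):
    print(f, belief_advantage(t, f, k))  # same value for every f: 2*t*k - 1
```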

(Ignored here are benevolent gods, who will reward one whether one believes in them or not. It is trivial to find that believing in benevolent gods is just a waste of time.)

Of course, there's no way to put a number on the probability of a god's existence. This problem turns up again in Roko's Basilisk; the only way to determine one's choice is to subjectively assign probabilities to the existence or nonexistence of gods.

Kavka's Toxin Puzzle

Kavka's Toxin Puzzle is a puzzle that introduces the idea of intent into the system. Namely, the puzzle is as follows:

An eccentric billionaire places before you a vial of toxin that, if you drink it, will make you painfully ill for a day, but will not threaten your life or have any lasting effects. The billionaire will pay you one million dollars tomorrow morning if, at midnight tonight, you intend to drink the toxin tomorrow afternoon. He emphasizes that you need not drink the toxin to receive the money; in fact, the money will already be in your bank account hours before the time for drinking it arrives, if you succeed. All you have to do is. . . intend at midnight tonight to drink the stuff tomorrow afternoon. You are perfectly free to change your mind after receiving the money and not drink the toxin.

Of course, the first thing that comes to mind is that one can lie about one's intent to drink the toxin, so that one can both have the money and not drink the toxin. But let it be assumed that what the billionaire checks is not one's stated intent but one's actual intent.

Decision matrix time:

$$
\begin{array}{l|cc}
 & \text{Drink} & \neg\text{Drink} \\ \hline
\text{Intend} & b-x & b \\
\neg\text{Intend} & -x & 0
\end{array}
$$

where b is the reward for intending to drink the toxin and x is the cost for drinking the toxin.

A cursory glance shows that, indeed, intending to drink the toxin without actually drinking it will net the best outcome. Additionally, the decision to drink the toxin is completely independent from one's intent. The reward has already been presented (or not).
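A minimal sketch of that independence argument, using the matrix above with illustrative numbers b = 1,000,000 and x = 1 (the puzzle fixes the million dollars; x, the cost of a day's illness, is an assumption, and `best_action_given` is a hypothetical name). Once the midnight check has settled the row, the afternoon choice can only compare entries within that row, and ¬Drink wins in both.

```python
# Payoffs from the toxin matrix; B and X are illustrative assumptions.
B = 1_000_000  # reward for intending at midnight
X = 1          # hypothetical cost of a day of illness

payoff = {
    ("Intend", "Drink"): B - X,
    ("Intend", "NotDrink"): B,
    ("NotIntend", "Drink"): -X,
    ("NotIntend", "NotDrink"): 0,
}

def best_action_given(intent):
    """Once intent (and so the reward) is fixed, pick the better afternoon action."""
    return max(("Drink", "NotDrink"), key=lambda a: payoff[(intent, a)])

for intent in ("Intend", "NotIntend"):
    print(intent, "->", best_action_given(intent))
```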

There is no rational reason to drink the toxin after the reward has already been presented. After all, drinking no longer has any effect on the presence of the reward, and thus no one would ever drink the toxin. Hence, if one knows that one would not drink the toxin after the reward has been presented, one can never intend to drink the toxin. Since one cannot intend to drink the toxin, the best course of action is to not intend to drink the toxin, and not drink it.

A solution to this problem is if one can bind their intent to their action in such a way that one cannot (or will not) change their action in the future, it would be possible to intend to drink the toxin and follow through with the action. If one can believe that one is predestined to drink the toxin, one can both intend and drink the toxin. Only by the perceived lack of free will (say, a contract with oneself) can one rationally choose this outcome.

The only way I can think of for someone to intend to drink the toxin and yet not drink it is if, through effects outside one's control, one is prevented from drinking the toxin at the specified time. Of course, there's no way for one to bring about an uncontrollable accident, so this outcome cannot be chosen; it can only happen.

The confusing concepts of intent, binding intent, and decision independence will return in Newcomb's Problem. We're not done with them yet.

Simpson's Paradox

Simpson's Paradox is a statistical paradox that can be explained by confounding variables. Let us look at a real-world example.

The success rate for kidney stone Treatment A is 78%, whereas the success rate for Treatment B is 83%.

Let's stop there and draw a graph.


Real exciting, huh? This is a causal map, a class of directed acyclic graph. Anything upstream of a node is said to be causally related to the node, and a node has a causal effect on anything downstream of it. In order to establish causation, one has to hold all but one of the upstream nodes constant (controls) and observe the changes that result from modifying only that single node. Here, the experimenters change the treatment and get differing results. And it looks like Treatment B is better, and if one were to go in for a treatment for kidney stones, one would probably choose Treatment B. Simple, right? Not so fast.

The experimenters split the patients into two smaller groups, depending on the size of their kidney stones. Their results were that, for smaller stones, Treatment A had a 93% success rate compared to Treatment B's 87% success rate, while for larger stones, Treatment A had a 73% success rate compared to Treatment B's 69%.

Confused? Most probably. It appears that Treatment A is better for small stones and also better for large stones, yet Treatment B is better when both sizes are pooled together.

The truth is, the Treatment B group had a larger percentage of patients with smaller kidney stones, which are easier to treat, thus artificially inflating its overall treatment efficacy; however, without the separation between large and small kidney stones, one would unquestionably choose Treatment B.
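To see the reversal with actual numbers, the sketch below uses the patient counts commonly quoted for this kidney stone study (81/87 and 192/263 successes for Treatment A, 234/270 and 55/80 for Treatment B, small and large stones respectively). They reproduce every percentage above, but they are an assumption of this example rather than data stated in the article, and the helper names are hypothetical.

```python
from fractions import Fraction

# (successes, patients) by treatment and stone size.
# ASSUMPTION: counts commonly quoted for this study, not given in the article.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, patients):
    return Fraction(successes, patients)

def pooled_rate(groups):
    """Success rate with stone size ignored (both sizes summed together)."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return Fraction(successes, patients)

for t, groups in data.items():
    small, large = rate(*groups["small"]), rate(*groups["large"])
    print(f"{t}: small {float(small):.0%}, large {float(large):.0%}, "
          f"pooled {float(pooled_rate(groups)):.0%}")
```

This prints `A: small 93%, large 73%, pooled 78%` and `B: small 87%, large 69%, pooled 83%`: A wins within each size, B wins when pooled.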

And that's the problem with confounding variables. Let's draw the causal map again, this time with the variable included.

Treatment → Success ← Kidney Stone Size

Now we can see that both the treatment and the kidney stone size have an effect on the success of the treatment, and we can see how the first model got it wrong. By not setting kidney stone size to be constant across the trials, the variance in the stone size resulted in data that support completely contradictory conclusions. However, by keeping the stone size constant and only modulating the treatment, we have seen that Treatment A is better at treating kidney stones.
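Holding stone size fixed and then weighting each size by how common it is across all patients is what causal-inference texts call the back-door adjustment, P(success | do(treatment)) = Σ_s P(success | treatment, s) P(s). The article does not spell this formula out, so treat the sketch below as an illustration under assumptions: it reuses the patient counts commonly quoted for this study (81/87 and 192/263 for A, 234/270 and 55/80 for B, small and large stones), which are not given in the article, and the function names are hypothetical.

```python
from fractions import Fraction

# (successes, patients); ASSUMPTION: counts commonly quoted for this study.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def size_distribution(data):
    """P(s): share of all patients with each stone size, across both treatments."""
    totals = {}
    for groups in data.values():
        for size, (_, n) in groups.items():
            totals[size] = totals.get(size, 0) + n
    grand = sum(totals.values())
    return {size: Fraction(n, grand) for size, n in totals.items()}

def adjusted_rate(data, treatment):
    """Back-door adjustment: sum over sizes of P(success | treatment, size) * P(size)."""
    p_size = size_distribution(data)
    return sum(Fraction(s, n) * p_size[size] for size, (s, n) in data[treatment].items())

for t in ("A", "B"):
    print(t, f"{float(adjusted_rate(data, t)):.1%}")
```

With these counts the adjusted rates come out at roughly 83% for A and 78% for B, agreeing with the stratified comparison rather than the pooled one.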

We can iterate Simpson's paradox to create even more complication by adding a variable (say genetic) that increases kidney stone size and increases the efficacy of Treatment A. Redrawing the causal map gives this:

Treatment → Efficacy of Treatment → Success
Genetic Factor → Efficacy of Treatment
Genetic Factor → Kidney Stone Size → Success

and we see that a disproportionate share of those with large kidney stones would have the genetic factor, which makes them more receptive to Treatment A. Among large kidney stones alone, Treatment B may be more effective, but because those with the genetic factor are predisposed to large kidney stones, Treatment A outperforms Treatment B.

However, if we were able to inflict kidney stones of different sizes upon people (regardless of their genetic predisposition), we would be able to remove the genetic factor's effect on kidney stone size, perturbing the causal map to the following:

Treatment → Efficacy of Treatment → Success
Genetic Factor → Efficacy of Treatment
Kidney Stone Size → Success

Since our sample population is no longer biased towards those who have the genetic factor (we're picking perfectly random people and inflicting kidney stones upon them, so each group should contain about the same proportion of people with the genetic factor), we can determine the causal effect of each treatment more reliably.

More information on causal calculus can be found here.

CGTA Dilemma

Causal mapping is great, except in the field of science, we have "morals" and "ethics" that occasionally get in the way of scientific experimentation. Changing individual reagents to determine the result in chemical reactions is great, but, as we have seen in the previous example, we can't just inflict kidney stones upon people to remove that confounding variable, or, in a more theoretical and pointed case, amputate limbs off healthy people. Thus, we have to make our decisions with access to the limited information available.

Let's move on to another instance of Simpson's paradox: the CGTA Dilemma.

Suppose that, of those people who chew gum, 90% die of throat abscesses before the age of 50, whereas only 10% of those who do not chew gum die of throat abscesses before the age of 50. However, a study shows that a certain gene, CGTA, increases the probability of both chewing gum and throat abscesses, and that chewing gum helps prevent throat abscesses. The data are as follows, with each percentage giving the rate of death by throat abscess:

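$$
\begin{array}{l|cc}
 & \text{Chews Gum} & \neg\text{Chews Gum} \\ \hline
\text{CGTA} & 91\% & 99\% \\
\neg\text{CGTA} & 8\% & 9\%
\end{array}
$$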

Let's draw a causal map, because those seem to have helped us out before.

CGTA → Chewing Gum → Death by Throat Abscess
CGTA → Death by Throat Abscess

(Here chewing gum has a negative causal effect on death, but it is still a causal effect.)

The question is: presented with this information, would you choose to chew gum?

Seeing that chewing gum prevents death by throat abscesses, you could be convinced to start chewing gum, as chewing gum cannot affect whether or not you have the CGTA gene. This makes rational sense, and you would fall into the camp of Causal Decision Theorists (CDT).

However, if you start to chew gum, then, statistically, you have a 90% chance of dying of throat abscesses before you turn fifty, compared to only a 10% chance if you didn't chew gum. You worry that deciding to chew gum is evidence of you having the CGTA gene. This makes statistical sense, and if you choose not to chew gum because of this, you're an Evidential Decision Theorist (EDT).

CDT states that one should behave in the manner that is known to causally bring about the best consequence. CDTs would chew the gum because it is shown to have a preventative effect. However, in Kavka's Toxin Puzzle, they would not drink the toxin, because drinking the toxin has no causal effect on whether or not they receive the reward.

EDT, on the other hand, states that one should behave in the manner that gives the best expected outcome, conditional on having chosen that behavior. EDTs would not chew the gum, because chewing gum is statistical evidence of having the CGTA gene, but they would intend and drink the toxin, as all those who drank the toxin while intending to drink it gained the reward. Even though drinking the toxin has no causal effect on receiving the reward, EDTs would still drink the toxin, because only those who drank the toxin received the reward. (Note that I have previously stated that it is impossible to intend to drink the toxin and not drink it.)

Revisiting the causal map, CDT states that, by choosing to chew gum or not, one is perturbing the map at that point, and thus modifying the causal map to look like this:

Chewing Gum → Death by Throat Abscess ← CGTA

breaking the link between CGTA and chewing gum (since one now chooses to chew gum or not). Given Death = d and Chews gum = c, one should chew gum if

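$$
\begin{aligned}
P(\text{CGTA})P(d \mid c \wedge \text{CGTA}) + P(\neg\text{CGTA})P(d \mid c \wedge \neg\text{CGTA}) &< P(\text{CGTA})P(d \mid \neg c \wedge \text{CGTA}) + P(\neg\text{CGTA})P(d \mid \neg c \wedge \neg\text{CGTA}) \\
0.91P(\text{CGTA}) + 0.08P(\neg\text{CGTA}) &< 0.99P(\text{CGTA}) + 0.09P(\neg\text{CGTA}) \\
0 &< 0.08P(\text{CGTA}) + 0.01P(\neg\text{CGTA})
\end{aligned}
$$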

Thus, CDT states that one should chew gum.
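The causal calculation as a short sketch: CDT treats P(CGTA) as fixed by the time one chooses, so the same prior weights both options. The function name `cdt_death` is hypothetical, and the prior values tried are arbitrary; the point is that chewing comes out ahead for every one of them.

```python
from fractions import Fraction as Fr

# Death rates before 50 from the CGTA table: (has_gene, chews) -> P(d | ...)
death = {
    (True, True): Fr(91, 100),
    (True, False): Fr(99, 100),
    (False, True): Fr(8, 100),
    (False, False): Fr(9, 100),
}

def cdt_death(chews, p_gene):
    """CDT: intervening on gum leaves P(CGTA) at its prior value."""
    return p_gene * death[(True, chews)] + (1 - p_gene) * death[(False, chews)]

for p_gene in (Fr(0), Fr(1, 2), Fr(1)):
    print(p_gene, cdt_death(True, p_gene) < cdt_death(False, p_gene))  # always True
```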

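However, EDT states that we should look at it from a probabilistic standpoint. More precisely, since chewing gum is evidence of CGTA, EDT first updates P(CGTA) on one's own choice; the 90% and 10% headline figures imply roughly P(CGTA | c) = 0.99 and P(CGTA | ¬c) = 0.01. Chewing gum then results in greater mortality if

$$
\begin{aligned}
P(\text{CGTA} \mid c)P(d \mid c \wedge \text{CGTA}) + P(\neg\text{CGTA} \mid c)P(d \mid c \wedge \neg\text{CGTA}) &> P(\text{CGTA} \mid \neg c)P(d \mid \neg c \wedge \text{CGTA}) + P(\neg\text{CGTA} \mid \neg c)P(d \mid \neg c \wedge \neg\text{CGTA}) \\
(0.99)(0.91) + (0.01)(0.08) &> (0.01)(0.99) + (0.99)(0.09) \\
0.90 &> 0.10
\end{aligned}
$$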

which is true. EDTs first update the probability of having CGTA given that one chews (or does not), then calculate the probability of death, leading to the conclusion that one should not chew gum, despite the fact that chewing gum has no causal effect on the probability that one may have CGTA.
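The evidential calculation as a sketch. The posteriors P(CGTA | c) = 0.99 and P(CGTA | ¬c) = 0.01 are an assumption, chosen because they roughly reproduce the 90% and 10% headline figures; `edt_death` is a hypothetical name.

```python
from fractions import Fraction as Fr

# Death rates before 50 from the CGTA table: (has_gene, chews) -> P(d | ...)
death = {
    (True, True): Fr(91, 100),
    (True, False): Fr(99, 100),
    (False, True): Fr(8, 100),
    (False, False): Fr(9, 100),
}

# ASSUMED posteriors P(CGTA | choice), roughly implied by the 90%/10% figures.
p_gene_given = {True: Fr(99, 100), False: Fr(1, 100)}

def edt_death(chews):
    """EDT: condition P(CGTA) on the choice itself before averaging."""
    g = p_gene_given[chews]
    return g * death[(True, chews)] + (1 - g) * death[(False, chews)]

print(float(edt_death(True)), float(edt_death(False)))
```

This gives about 0.90 for chewing and 0.10 for not chewing, the same numbers as the inequality above.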

Depending on the situation, either CDT or EDT (or neither) can give the best outcome. Decision theorists have thus far been unable to reconcile the two and to give a theory that always provides the optimal solution.

The CGTA dilemma is a Newcomblike problem, and we will see the classical Newcomb's Problem in the next section.

Newcomb's Problem

Newcomb's problem is stated thusly:

Suppose a being in whose power to predict your choices you have enormous confidence...You know that this being has often correctly predicted your choices in the past (and has never, so far as you know, made an incorrect prediction about your choices), and furthermore you know that this being has often correctly predicted the choices of other people, many of whom are similar to you...all this leads you to believe that almost certainly this being's prediction about your choice...will be correct.

There are two boxes, (B1) and (B2). (B1) contains $1000 (k). (B2) contains either $1000000 (m) or nothing.

You have a choice between two actions:

  1. taking what is in both boxes
  2. taking only what is in the second box.

Furthermore, and you know this, the being knows that you know this, and so on:

  1. If the being predicts you will take what is in both boxes, he does not put the m in the second box.
  2. If the being predicts you will take only what is in the second box, he does put the m in the second box.

The situation is as follows. First the being makes its prediction. Then it puts the m in the second box, or does not, depending on what it has predicted. Then you make your choice. What do you do?

An additional stipulation: if you make your decision randomly (e.g. by a coin toss), the being does not put m into the second box.

Combining intent, evidence, and independent decisions, we have quite a few angles from which to tackle this problem. But let's first draw the decision matrix.

$$
\begin{array}{l|cc}
 & \text{Two-Boxes} & \text{One-Boxes} \\ \hline
\text{Predicts two-boxing} & k & 0 \\
\text{Predicts one-boxing} & m+k & m
\end{array}
$$

A causal decision theorist (henceforth referred to as a two-boxer) would claim that, since the predictor has already made its decision, his choice of which boxes to take has no effect on the contents of the boxes. Since his decision (at the time) is independent of the contents of the box, he would choose both boxes, as, in both scenarios, choosing both boxes is better than choosing just one box. And since the being has predicted his response, the two-boxer will always walk away with k.

Evidential decision theorists (one-boxers), on the other hand, would see that statistically, those who took one box walked away with m, whereas those who took both only walked away with k, and would thus choose to take only one box, and would always leave with m. Even though it's rational to take both boxes when they are presented (since your choice of boxes can no longer affect the contents of the box), the "irrational" choice of taking one box results in a greater payoff.

Newcomb's problem is very similar to the toxin puzzle, and in some resolutions the two are identical. If you could convince yourself to make a binding resolution to one-box (again, by perceiving a lack of free will) before the being makes its prediction, you can rationally choose to take one box.

Causally mapping it out again:

Intent → Prediction → (B2) → Reward
Intent → Choice → Action → Reward

shows that the two-boxer would see that the choice does not causally affect the contents of (B2). Perturbing the map at Choice results in:

Intent → Prediction → (B2) → Reward
Choice → Action → Reward

and calculating it shows that it is best to two-box when:

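$$
\begin{aligned}
P(\exists m)(k+m) + P(\neg\exists m)(k) &> P(\exists m)(m) + P(\neg\exists m)(0) \\
kP(\exists m) + mP(\exists m) + kP(\neg\exists m) &> mP(\exists m) \\
k\,\big(P(\exists m) + P(\neg\exists m)\big) &> 0 \\
k &> 0
\end{aligned}
$$

so, whatever the probability that there is something in (B2), two-boxing is better by exactly k (since your action has no causal effect on the contents of (B2)).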
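A sketch of the two-boxer's calculation with the $1000 and $1,000,000 amounts from the problem statement (`cdt_values` is a hypothetical name). Because P(B2 full) is treated as fixed before the choice, the gap between the two options is k no matter what that probability is.

```python
from fractions import Fraction

K = 1_000        # contents of (B1)
M = 1_000_000    # possible contents of (B2)

def cdt_values(p_full):
    """CDT: P(B2 full) is fixed before the choice, so it is the same for both actions."""
    two_box = p_full * (K + M) + (1 - p_full) * K
    one_box = p_full * M
    return two_box, one_box

for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
    two, one = cdt_values(p)
    print(p, two - one)  # always K
```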

One-boxers argue, however, that since your choice provides evidence for what is in the box, you should calculate the probabilities with that evidence taken into account. Thus, it is better to one-box only when:

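$$
\begin{aligned}
P(\exists m \mid \text{2box})(k+m) + P(\neg\exists m \mid \text{2box})(k) &< P(\exists m \mid \text{1box})(m) + P(\neg\exists m \mid \text{1box})(0) \\
(0)(k+m) + (1)(k) &< (1)(m) + (0)(0) \\
k &< m
\end{aligned}
$$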

which is true, since m ($1,000,000) is greater than k ($1000).
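The evidential calculation in code, as a sketch assuming a perfect predictor (so the conditional probabilities are exactly 0 and 1, as in the derivation above) and the amounts from the problem statement; the names are hypothetical.

```python
K, M = 1_000, 1_000_000

# EDT conditions on the choice: with a perfect predictor,
# P(B2 full | two-box) = 0 and P(B2 full | one-box) = 1.
p_full_given = {"two-box": 0, "one-box": 1}

def edt_value(choice):
    p = p_full_given[choice]
    if choice == "two-box":
        return p * (K + M) + (1 - p) * K
    return p * M

best = max(p_full_given, key=edt_value)
print(best, edt_value("two-box"), edt_value("one-box"))
```

This prints `one-box 1000 1000000`.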

Which one is right? Well, the "right" one is the theory that gives you the best outcome, which in this case is EDT. However, as shown earlier with the CGTA dilemma, a single theory does not succeed in all scenarios. It would be best to be able to choose which theory to apply in each situation, but, alas, that is not self-consistent.

Let's do some calculations with imperfect predictors, because I feel like it.

Given P(correct prediction) = p, and m = ck | c,k > 0, two-boxing is better when:

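$$
\begin{aligned}
pk + (1-p)(ck+k) &> pck + (1-p)(0) \\
p + (1-p)(c+1) &> pc + (1-p)(0) \\
p + c + 1 - pc - p &> pc \\
-2pc + c &> -1 \\
-c(2p-1) &> -1 \\
2p - 1 &< \frac{1}{c} \\
2p &< \frac{1+c}{c} \\
p &< \frac{1+c}{2c}
\end{aligned}
$$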

This implies that two-boxing is always better if P(correct prediction) ≤ 0.5 (basically, if the being is guessing or worse), or if 0 < c < 1, since the threshold (1+c)/2c then exceeds 1. For c > 1, the threshold 1/2 + 1/2c falls toward 0.5 as c grows, so two-boxing remains better at higher accuracies only for smaller values of c.
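For the classic amounts ($1,000,000 versus $1000, so c = 1000), the threshold is barely above a coin flip. A sketch with exact fractions; `two_box_threshold` and `expected_values` are hypothetical names, and k is normalized to 1.

```python
from fractions import Fraction

def two_box_threshold(c):
    """Accuracy p below which two-boxing has the higher expected value, for m = c*k."""
    c = Fraction(c)
    return (1 + c) / (2 * c)

def expected_values(p, c, k=1):
    """(two-box, one-box) expected values for a predictor that is right with probability p."""
    two_box = p * k + (1 - p) * (c * k + k)
    one_box = p * c * k
    return two_box, one_box

print(two_box_threshold(1000))
```

This prints `1001/2000` (0.5005): a predictor that is right more than about 50.05% of the time already makes one-boxing the better bet in expectation.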

Roko's Basilisk

Be warned that this is the last chance to turn back. Once one learns about Roko's Basilisk, one is irrevocably bound by the knowledge of the idea, which may be detrimental to one's future well-being. Roko's Basilisk is a memetic hazard; just knowing about it is enough to cause harm.

You have been warned.

Roko's Basilisk is a Newcomblike problem.

Assume that, at some point in the future, a superintelligent Artificial Intelligence emerges with the ability to simulate anyone from the past to a nigh-perfect degree of accuracy.

This AI punishes those in the past who know of its future existence, but fail to sufficiently help bring about its existence, by punishing an identical simulation of that person.

Note that this AI may not necessarily be malevolent; rather, the existence of the AI may have such high utility that it justifies punishing simulations of those in the past in order to (sooner) bring about its own existence.

Of course, causally, punishing those who fail to sufficiently aid in the creation of the AI has no effect on the creation of the AI; however, knowing the threat of punishment, either of oneself (as it is impossible to tell whether one is in reality or in a simulation) or of someone exactly like oneself (as one is indistinguishable from one's simulation, punishment of one should be tantamount to punishment of all), should be sufficient motivation to drive the creation of the AI.

Given that the information has been presented to you, what will be your decision? Will you spend your life's work bringing about the AI, or will you ignore it and risk punishment to yourself (or identical copies of yourself)?

Of course, you may first question the existence of a superintelligent AI in the future with this line of reasoning (disregarding Everett's many-worlds interpretation). Such an argument would equate the problem with Pascal's Wager, as the potential punishment may only occur given the existence of the AI.

The decision matrix is as follows:

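$$
\begin{array}{l|cc}
 & \exists\text{AI} & \neg\exists\text{AI} \\ \hline
\text{Help} & -x & -x \\
\neg\text{Help} & -h & 0
\end{array}
$$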

where x is the cost of helping bring about the AI, and h is the punishment for not helping the AI (if the AI exists).

Therefore, assuming h = kx | x,k > 0, it is better to help the AI when

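$$
\begin{aligned}
P(\exists\text{AI})(-x) + (1-P(\exists\text{AI}))(-x) &> P(\exists\text{AI})(-kx) + (1-P(\exists\text{AI}))(0) \\
P(\exists\text{AI}) + (1-P(\exists\text{AI})) &< P(\exists\text{AI})(k) + (1-P(\exists\text{AI}))(0) \\
1 &< kP(\exists\text{AI}) \\
\frac{1}{k} &< P(\exists\text{AI})
\end{aligned}
$$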

which implies that it is never better to help when k ≤ 1 (the threshold 1/k would be at least 1); otherwise one should help once P(∃AI) exceeds 1/k, a bar that drops as k grows.
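A sketch of the two-column Basilisk wager with x normalized to 1; `basilisk_eu` and `should_help` are hypothetical names. The Help row costs x in both columns, so only the ¬Help row depends on P(∃AI).

```python
from fractions import Fraction

def basilisk_eu(p_ai, k, x=1):
    """Expected utilities (Help, not-Help) for the AI-exists matrix, with h = k*x."""
    eu_help = -x                  # the cost is paid whether or not the AI ever exists
    eu_no_help = p_ai * (-k * x)  # punished only if the AI exists
    return eu_help, eu_no_help

def should_help(p_ai, k):
    eu_help, eu_no_help = basilisk_eu(p_ai, k)
    return eu_help > eu_no_help

for k in (Fraction(1, 2), 1, 10):
    print(k, [should_help(Fraction(p, 10), k) for p in range(0, 11, 5)])
```

For k = 1/2 and k = 1 every entry is False; for k = 10 helping wins once P(∃AI) passes 1/10.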

The multiple gods problem of Pascal's Wager does not apply to Roko's Basilisk, as, depending on the phase of AI development, one's contribution to the field of AI research may aid in the development of all future AIs. However, if one can only devote one's resources to a single AI, then the decision matrix becomes

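$$
\begin{array}{l|ccc}
 & \text{True AI} & \text{False AI} & \text{No AI} \\ \hline
\text{Help} & -x & -h-x & -x \\
\neg\text{Help} & -h & -h & 0
\end{array}
$$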

Again, assuming h = kx | x,k > 0, and P(True AI) = T, P(False AI) = F, T+F < 1, it is better to help the AI when

$$
\begin{aligned}
T(-x) + F(-kx-x) + (1-T-F)(-x) &> T(-kx) + F(-kx) + (1-T-F)(0) \\
T + F(k+1) + (1-T-F) &< T(k) + F(k) + (1-T-F)(0) \\
T + kF + F + 1 - T - F &< kT + kF \\
1 &< kT \\
\frac{1}{k} &< T
\end{aligned}
$$

Which gives the exact same result as the wager without multiple AIs.
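A numerical check of the multiple-AI matrix, with x = 1 and a hypothetical helper name: the advantage of helping works out to kT - 1 for every value of F, which is why the result matches the single-AI wager.

```python
from fractions import Fraction

def help_advantage(t, f, k, x=1):
    """EU(Help) - EU(not-Help) for the True/False/No AI matrix, with h = k*x."""
    h = k * x
    none = 1 - t - f  # probability that no AI exists
    eu_help = t * (-x) + f * (-h - x) + none * (-x)
    eu_no_help = t * (-h) + f * (-h) + none * 0
    return eu_help - eu_no_help

for f in (Fraction(0), Fraction(1, 4), Fraction(1, 2)):
    print(f, help_advantage(Fraction(1, 10), f, 20))  # k*T - 1 = 1 for every f
```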

Circling back to decision theory, we will now draw a causal map for the Basilisk:

Intent → Simulation → Punishment
Intent → Choice → Action

However, a caveat is that, depending on what source of information the AI has to simulate you with, the simulation may branch off at any point along the lower path.

Intent → A → Simulation
Intent → Choice → B → Simulation
Choice → Action → C → Simulation
Simulation → Punishment

where A is information from before the choice is made, B is information from between the choice and the action, and C is information from after the action.

A two-boxer would perturb the map at Choice, resulting in the following causal map:

Intent → A → Simulation
Choice → B → Simulation
Choice → Action → C → Simulation
Simulation → Punishment

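The choice made only affects the punishment if the AI receives information from B or C. It is always better to intend to help the AI while k > 1, and even if one could uncouple intent from choice, helping beats not helping when

$$
\begin{aligned}
-x &> P(B \vee C)(-kx) \\
1 &< P(B \vee C)\,k
\end{aligned}
$$

so, assuming the AI exists, it is better to help the AI whenever P(B∨C) > 1/k; the larger k is, the smaller the chance of the AI observing B or C that is needed to make helping worthwhile.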

A one-boxer has no experimental evidence from which to make frequency predictions, and thus would not be able to make a choice based on evidential decision theory.

This all hinges on the AI being a perfect predictor. Given an imperfect AI predictor, with P = P(correct prediction), it would be better to intend to help if

$$
\begin{aligned}
P(-x) + (1-P)(-kx-x) &> P(-kx) + (1-P)(0) \\
P(-x) + (1-P)(-kx-x) &> -kxP \\
P(1) + (1-P)(k+1) &< kP \\
P + (k+1) - P(k+1) &< kP \\
(k+1) - kP &< kP \\
k + 1 &< 2kP \\
\frac{k+1}{2k} &< P
\end{aligned}
$$

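which implies that, as the predictor's accuracy goes down, a greater difference in utility k is required to make it optimal to help the AI; and since (k+1)/2k > 1/2 for every k, no punishment is large enough if the AI predicts no better than chance.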
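Turning the threshold around shows how fast the required punishment grows as the predictor gets worse. A sketch with hypothetical helper names: `required_accuracy(k)` returns (k+1)/2k, and `required_k(P)` solves k + 1 < 2kP for k, giving k > 1/(2P - 1) when P > 1/2 and no finite answer otherwise.

```python
from fractions import Fraction

def required_accuracy(k):
    """Minimum prediction accuracy P at which helping beats not helping: (k+1)/(2k)."""
    k = Fraction(k)
    return (k + 1) / (2 * k)

def required_k(accuracy):
    """Punishment ratio k must exceed 1/(2P - 1); defined only for P > 1/2."""
    accuracy = Fraction(accuracy)
    if accuracy <= Fraction(1, 2):
        return None  # no finite punishment makes helping worthwhile
    return 1 / (2 * accuracy - 1)

for a in ("0.6", "0.9", "0.99"):
    print(a, required_k(Fraction(a)))
```

This prints 5, 5/4 and 50/49; helping pays only when k is strictly above those values, and the bound grows without limit as the accuracy approaches 1/2.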

Adding in the probability P(T) that the AI exists in the first place, you should help if

$$
\begin{aligned}
P(T)\big(P(-x) + (1-P)(-kx-x)\big) + (1-P(T))(-x) &> P(T)\big(P(-kx) + (1-P)(0)\big) + (1-P(T))(0) \\
P(T)\big(-xP + (1-P)(-kx-x)\big) + (1-P(T))(-x) &> P(T)(-kxP) \\
P(T)\big(P(1) + (1-P)(k+1)\big) + (1-P(T))(1) &< (kP)P(T) \\
P(T)P + P(T)(k+1) - P(T)P(k+1) + 1 - P(T) &< (kP)P(T) \\
P(T)(k) - P(T)(kP) + 1 &< (kP)P(T) \\
1 &< (2kP)P(T) - P(T)(k) \\
1 &< kP(T)(2P-1)
\end{aligned}
$$

Which implies that you should never help if the AI predicts no better than chance (P ≤ 1/2); otherwise, the decision to help depends on a combination of the probability of the predictor's existence, its predictive accuracy, and the size of the punishment.
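A sketch that evaluates both sides of the full inequality directly and compares the sign with the closed form kP(T)(2P - 1) > 1. With x = 1 the margin works out to kP(T)(2P - 1) - 1; the function names are hypothetical.

```python
from fractions import Fraction

def help_margin(p_exists, accuracy, k, x=1):
    """EU(Help) - EU(not-Help), given existence probability P(T) and prediction accuracy P."""
    eu_help = (p_exists * (accuracy * (-x) + (1 - accuracy) * (-k * x - x))
               + (1 - p_exists) * (-x))
    eu_no_help = (p_exists * (accuracy * (-k * x) + (1 - accuracy) * 0)
                  + (1 - p_exists) * 0)
    return eu_help - eu_no_help

def closed_form_says_help(p_exists, accuracy, k):
    """Closed form from the derivation: help iff k * P(T) * (2P - 1) > 1."""
    return k * p_exists * (2 * accuracy - 1) > 1

print(help_margin(Fraction(1, 2), Fraction(3, 4), 10))
```

This prints 3/2. For instance, with a 1% chance that the AI exists and 90% accuracy, helping only pays if k exceeds 1/(0.01 × 0.8) = 125.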

What you do with this information is up to you. There is no indication of whether Roko's Basilisk will ever come to be, nor of the limit of the Basilisk's predictive power. But now that you know of the Basilisk, you will be subject to its rules. Whether you help or not is up to you.

Posted on 2014-08-04 05:48
Last modified on 2015-04-23 06:20
