A Review of Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

Michael Prinzing
The Practical Philosopher
7 min read · Feb 2, 2017

--

Oxford University Press; ISBN 978-0198739838; 390 pages

Warning: This review is more academic than the average Practical Philosopher article.

Nick Bostrom’s book, Superintelligence: Paths, Dangers, Strategies, is the culmination of a small but quickly expanding body of literature on the prospects for the rise of extremely advanced artificial general intelligence. ‘Superintelligence’, as Bostrom uses the term, refers to ‘[a]ny intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest’ (Bostrom 2014, 22). The book explores the ways in which such an intellect might come to be, and what might happen to us — and to the world — if one did.

The book is commendable both for its scope and for its innovation. Bostrom has done a remarkable job of balancing readability with rigor. The result is a book that will have something of interest for technical and non-technical, philosophical and non-philosophical readers alike. Superintelligence is certainly worth a read for anyone interested in artificial intelligence (henceforth AI), existential risks, and the prospects for humanity’s future.

The book has 15 chapters. Chapter 1 surveys the history of AI and presents some polling data from AI researchers on their expectations for its future. Chapter 2 begins the main thread of the book by discussing various ways in which we might build AI, and how it might eventually become superintelligent. Chapter 3 elaborates on the concept of superintelligence, suggesting three (in practice equivalent, Bostrom thinks) ways an agent could qualify as superintelligent. Chapters 4–6 explore some possible paths that a roughly human-level AI might take in developing into a superintelligent AI. These chapters also discuss the advantages a superintelligence would have over its competitors. Chapter 7 discusses what we can surmise about the will of a superintelligent being. Bostrom argues that a superintelligence could conceivably have just about any possible final goal. (More on this question below.) Chapters 8–10 cover what Bostrom calls the ‘control problem’: the problem of keeping a superintelligent intellect from destroying humanity. Bostrom’s conclusion is that there is little hope of constraining an unfriendly superintelligence, and thus that the solution must involve making sure that the superintelligence wants to pursue ends that we value. Accordingly, Chapters 11–13 explain the difficulty of programming an AI to pursue ends that we would approve of, and propose several methods for solving this ‘value-loading problem’ (Bostrom 2014, 226). Finally, Chapters 14 and 15 suggest where we might go from here, offering some guidance on how to make the more appealing outcomes more likely, and the less appealing outcomes less likely.

The shortcomings of this book, in my view, lie less in what is written than in what is left out. For one thing, it’s remarkable that nowhere in a book titled Superintelligence does Bostrom even sketch what he means by ‘intelligence’. He occasionally offers hints, for instance claiming that intelligence is an ‘instrumental cognitive efficaciousness’ or that it is skill in ‘prediction, planning, and means-end reasoning in general’ (Bostrom 2014, 130, 107). Bostrom insists that intelligence is not the same as rationality. (This is crucial to his ‘orthogonality thesis’, which I discuss below.) However, he does think that intelligence requires at least some significant degree of instrumental rationality. He also claims that intelligence is not wisdom, and that one can have a great deal of the former while possessing very little of the latter (Bostrom 2014, 67). But he is not entirely consistent on this point. He seems, on occasion, to take for granted that a superintelligent AI would be wiser than typical human beings (Bostrom 2014, 283).

In the book’s more philosophical moments, Bostrom makes some quite strong claims but has a tendency to provide little defense for them. Of course, given the book’s purpose and scope, he can be forgiven for not stopping at each philosophical wrinkle. But, as one philosopher reading another, I find this unsatisfying.

For instance, one strategy that Bostrom considers for solving the control problem (i.e., keeping a superintelligent AI from destroying humanity) involves keeping it ‘boxed’ in a virtual reality. By isolating the AI in a virtual world, the idea goes, we could prevent it from wreaking havoc in our world. Bostrom claims that this strategy would likely fail because a superintelligent AI would eventually come to suspect that it was in a simulation and might then seek to escape (Bostrom 2014, 164). However, if Hilary Putnam’s famous ‘Brains in a Vat’ argument is right, then what Bostrom is suggesting isn’t even a coherent idea (Putnam 1981). A virtually instantiated AI would be incapable of conceptualizing our world. Bostrom doesn’t mention this potential controversy.

Another problem spot, the one I’d like to focus on, comes in Chapter 7. Here Bostrom advocates what he calls the ‘orthogonality thesis’ (OT): ‘Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal’ (Bostrom 2014, 130). The ‘more or less’ clauses are intended to rule out the obvious counterexamples. For instance, final goals that refer directly to the agent’s intelligence (e.g., if an agent’s final goal were to make itself stupid) would not be independent of (orthogonal to) intelligence. Alternatively, a very simple intelligence might be unable to comprehend highly complex final goals, and so be unable to pursue them. With these qualifications in mind, the main thrust of the OT is that a superintelligent agent could in principle have just about any final goal one can imagine. This claim is of enormous significance for the book. Bostrom seems to think that the OT doesn’t rely on any controversial metaethical claims. But it’s not at all clear how this could be.

Bostrom acknowledges the apparent similarity between the OT and the Humean theory of motivation (Bostrom 2014, Ch. 7 note 3). According to the Humean theory, only desires can motivate action, while beliefs are motivationally inert. However, Bostrom denies that his OT presupposes the Humean theory. There are, he claims, three ways in which the former could be true even if the latter is false. First, a superintelligent being could have some motivating beliefs and still pursue any arbitrary final end, so long as its desire for that end is strong enough to outweigh the motivating beliefs. Second, a superintelligence might never be in a position to acquire any motivating beliefs. Or third, a superintelligence might be of such alien constitution that it has nothing even functionally equivalent to beliefs or desires. Each of these cases has problems.

Regarding the first case, I take it that those who claim that beliefs can motivate also think that the motivational power of those beliefs can be quite strong. Indeed, I would expect anti-Humeans to claim that, for instance, moral beliefs provide decisive reason for action, and thus that an instrumentally rational agent would be decisively motivated to act on its moral beliefs. In the second case, the kinds of beliefs typically taken to be motivating are beliefs about, e.g., what it would be good to do. And it’s hard to imagine an agent with any significant degree of intelligence lacking any beliefs of that kind. In the third case, I’m not at all convinced that a being which lacks functional analogues to beliefs and desires would actually be an agent in any meaningful sense. If Bostrom wants to maintain that the OT does not presuppose the Humean theory of motivation, then he must at least motivate the idea that his three cases could plausibly obtain. Yet he does not do this.

Moreover, Bostrom claims that the OT does not come with controversial metaethical commitments because it’s not a claim about the relationship between rationality and motivation, but about intelligence and motivation. As I indicated, we are never told exactly what ‘intelligence’ refers to. However, Bostrom does clearly think that it requires significant instrumental rationality. This need not mean, he claims, that a superintelligent AI would be rational in a ‘normatively thick’ sense (Bostrom 2014, 130). Here he has in mind the kind of rationality that Derek Parfit illustrated in his famous ‘future Tuesday indifference’ example (Parfit 1984). In this example, a hedonist is utterly indifferent to what happens to him on future Tuesdays, and so is willing to suffer terrible harms on future Tuesdays in exchange for even trivial benefits on other days. He prefers to be tortured next Tuesday if it means not stubbing his toe today. Bostrom’s claim that a superintelligence need not be rational in this sense misses the very point of Parfit’s example. The idea behind the example (or at least one of its ideas) is that being a cognitively efficacious agent — being, as Bostrom puts it, skilled in prediction, planning, and general means-end reasoning — necessarily means being able to see a kind of contradiction in future Tuesday indifference. Bostrom offers no defense at all for his claim that an intelligent (much less superintelligent) agent, one capable of effectively making plans and realizing goals, could nevertheless be irrational in this ‘thick’ sense.

Even if Bostrom had satisfactory answers to these objections, it seems to me that his OT necessarily depends on at least one controversial metaethical claim. Bostrom, as we’ve seen, attempts to avoid metaethical commitments by restricting himself to claims about a superintelligence’s instrumental rationality. But some metaethicists, call them constitutivists, think that normativity — including morality — is derivable from the constituents of instrumental rationality. Thus, on their view, acting immorally requires engaging in a certain kind of instrumental irrationality. A super-instrumentally rational being, on this view, would also be supermoral. So, in order to hold on to the OT — to claim that a superintelligent AI might have nearly any conceivable final goal — Bostrom has to reject constitutivism. Again, however, he doesn’t even mention it.

The point of criticizing the OT is not to suggest that the thesis is indefensible. Rather, it is to show that the OT — like a number of claims in the book — has many controversial implications, and that it ought to receive a much more rigorous philosophical defense than it has thus far. Though it by no means settles the many issues it raises, Bostrom’s book helpfully inaugurates several new debates, and illustrates ways in which some old debates are still relevant.

References

  • Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press.
  • Parfit, Derek (1984). Reasons and Persons. Oxford: Clarendon Press.
  • Putnam, Hilary (1981). Reason, Truth and History. Cambridge: Cambridge University Press.
