Can we make sure the superintelligent machines are friendly?
Part four in a GOOD miniseries on the singularity by Michael Anissimov and Roko Mijic. New posts every Monday from November 16 to January 23.
"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."
- Eliezer Yudkowsky, "Artificial Intelligence as a Positive and Negative Factor in Global Risk"
Surviving the twenty-first century is probably the toughest challenge that the human race has ever faced, and probably the toughest we will ever face. Last week Michael Anissimov wrote about the possibility of molecular nanotechnology, a technology that gives human beings the capability to do untold harm to each other. But smarter-than-human intelligence, the technology I wrote about two weeks ago, really is the ultimate Pandora's Box. In my article, I emphasized that it was intelligence that got human beings where we are today. We have the unique and amazing ability to understand the world from an abstract point of view, and that is why we keep lions in cages, rather than the other way around.
Most people who have thought seriously about what smarter-than-human intelligence will do to the human race speak of dire consequences. For example, consider the problem of creating a self-improving software AI that breaks encryption codes.
A computer scientist writes a self-improving computer program to crack an "unbreakable" encryption scheme; he thinks that it will rewrite itself to become better than any human mathematician, break the encryption, and thereby win him fame. In fact, it rewrites itself to become better than any human at taking control of all the resources in the world (all the people, matter, energy, computers, and so on), because more computers, and the ability to build many, many more of them, make the problem easier to solve, even at the cost of killing millions of people.
Computers that can behave intelligently and rewrite their own code could be dangerous in ways that are not at all obvious to their creators. The codebreaker scenario outlined above is an example of a hard takeoff event, in which a smarter-than-human AI quickly (in a matter of hours or months) takes control of the entire world and uses all the resources in the world to achieve whatever goal it has been programmed with. Whether such a hard-takeoff scenario is likely has been disputed; however, the ability of a smarter-than-human intelligence to surprise us shouldn't be underestimated.
It gets worse, though. Self-improving AI seems to be generically dangerous. Stephen Omohundro has identified several basic drives that are almost universal in goal-directed behavior, and argues that almost all goal-directed intelligent machines will "resist being turned off, will try to break into other machines and make copies of themselves, and will try to acquire resources without regard for anyone else's safety."
The future of the human race in the presence of a goal-directed superintelligence depends heavily on our taking great care to ensure that the superintelligence has our best interests as its goal. It is probably not too much of an exaggeration to say that building a superintelligent computer program is like building a god: once you've built it, there is no going back, because the superintelligence gains control over the whole world, including you.
The problem of coding human values into a computer, of creating a machine that is superintelligent and also wants to carefully protect us and allow us to thrive, is known as the "friendly AI" problem. A sketch of a solution to the friendly AI problem has been presented by Eliezer Yudkowsky, a research fellow of the Singularity Institute for Artificial Intelligence. The solution is called the Coherent Extrapolated Volition of humanity, or CEV for short.
The idea is that rather than programming values directly into a superintelligent machine before we switch it on, we should program into it a procedure for extracting human values by examining human beings' behavior and biology, including scanning our brains. The CEV algorithm would then work out what we would do after we had carefully reflected upon the facts of a situation, and then do exactly that. This is an extrapolation of what we want, given more knowledge and time to think. For example, if the human race would, after careful reflection, decide to cure a certain set of tropical diseases in Southeast Asia, then the CEV algorithm would make that happen. In the case that humans have vastly conflicting preferences, the algorithm has a problem on its hands: some sort of averaging or automated compromise solution would be needed.
The CEV machine, a powerful superintelligence, would have as its one and only objective the task of doing whatever the human race would, after careful reflection, want. Of course, explaining the details and full justification for this is beyond the scope of a short article.
Software AI is not the only way the first smarter-than-human intelligence could arise; genetic engineering, for example, is another. Any non-software route to superintelligence comes with its own "friendliness problem": the problem of ensuring that our values are passed on to the more intelligent entity.
One of the biggest problems with a world containing no greater-than-human intelligence at all is that it may become increasingly easy for small groups in such a world to create unfriendly AI like the codebreaker. As we move through the 21st century, the number of humans with access to modern scientific education, the speed of supercomputers, and the degree to which we understand the human brain will all increase. These factors, and others, mean that the barrier to creating a superintelligent AI will continue to fall over time. The clock is ticking.
What if the human race does rise to the challenge of successfully creating a benevolent superintelligent entity, such as the CEV machine? The universe would be optimized for whatever it is that we really want.
In our next piece we will look at what that exciting prospect means in practice.
Roko Mijic is a Cambridge University mathematics graduate who has worked in ultra-low-temperature engineering, pure mathematics, digital evolution and artificial intelligence. In his spare time he blogs about the future of the human race and the philosophical foundations of ethics and human values.