What, Precisely, Do We Really (Really) Want?
Before dealing with the tricky stuff—life, humanity, safety, and other essential concepts—let’s start with something simpler: saving your mother from a burning building.1 The flames are too hot for you to rush in and save her yourself, but in your left hand you carry an obedient AI with incredible power to accomplish exactly what you request of it.
“Quick!” you shout to the AI. “Get my mother out of the building!” But the AI doesn’t react—you haven’t specified your request precisely enough. So instead you upload a photo of your mother’s head and shoulders, do a match on the photo, use object contiguity to select your mother’s whole body (not just her head and shoulders), define the center of the building, and require that your mother be at a certain distance from that center, very quickly. The AI beeps and accepts your request.
Boom! With a thundering roar, the gas main under the building explodes. As the structure comes apart, in what seems like slow motion, you glimpse your mother’s shattered body being hurled high into the air, traveling fast, rapidly increasing its distance from the former center of the building.
That wasn’t what you wanted! But it was what you wished for.
Luckily, the AI has a retry button, which rewinds time and gives you another chance to specify your wish correctly.
Standing before the burning building once again, you state your wish as before but also state that the building shouldn’t explode, defining the materials in the building and requiring that they stay put and don’t scatter.
The AI beeps and accepts your request. And your mother is ejected from the second-story window and breaks her neck. Oops.
You rewind again, and this time you require that her heart continue beating. And because you’ve started to see how these things go, you also start thinking of maintaining brain waves, defining limbs, and putting in detailed descriptions of what “bodily integrity” means. And if you had time and this were a particularly slow fire, you could then start specifying mental health, freedom from trauma, and so on. And then, after a century of refinement, you would press the button . . . and you would still likely get it wrong. There would probably be some special case you hadn’t thought of or patched against. Maybe the AI would conclude that the best way to meet your exacting criteria is to simply let your mother burn and create a new human to replace her, one that perfectly fits all your physical and mental health criteria; for bonus points, she will refer to herself as your mother and will have every single memory and characteristic you thought to specify, but nothing that you didn’t.
Or maybe you could be more clever and instead specify something like, “Get my mother out of the burning building in a way that won’t cause me to press this big red retry button afterwards.” Then—boom!—the building explodes, your mother is ejected, and a burning beam lands on you and flattens you before you can reach the retry button.
And that’s just one simple situation, with no trade-offs. What if the AI had to balance saving your mother against other concerns? How do we specify that in some circumstances it’s reasonable to place human life above commercial and other concerns, while in other cases it’s not?
Whatever ethical or safety programming the AI is furnished with, when it starts making its decisions, it has to at least be able to safely extract your mother from the burning building. Even if it seems that the AI is doing something else entirely, like increasing GDP, it still has to make ethical decisions correctly. Burning down Los Angeles, for instance, could provide a short-term boost to GDP (reconstruction costs, funeral home profits, legal fees, governmental spending of inheritance taxes on emergency measures, etc.), but we wouldn’t want the AI to do that.
Now, we might be able to instruct the AI, “Don’t set fire to Los Angeles.” But a really powerful AI could still act to make this happen indirectly: cutting back on fire services, allowing more flammable materials in construction (always for sound economic reasons), encouraging people to take up smoking in large numbers, and a million other steps that don’t directly set fire to anything, but which increase the probability of a massive fire and hence the leap in GDP. So we really need the AI to make the ethical decision correctly in every scenario, including those that we can’t even imagine.
If an AI design can’t at least extract your mother from the burning building, it’s too unsafe to use for anything of importance. Larger problems such as “grow the economy” might initially sound simpler. But that large problem is composed of millions of smaller problems of the “get your mother out of the burning building” and “make people happy” sort.