Meta Alignment: Communication Whack-a-Mole

When warning the public about the dangers of superintelligent AI, a common question is how, exactly, an unaligned superintelligence could kill humanity. An AI is just a program running on a computer, after all. Wouldn’t it need arms and legs and nukes and possibly a military to kill all of us? In fact, why would it kill humanity if it needs us to feed it energy and information?

At this point, the respondent will usually choose one of two possible responses. The first is that we can’t know exactly how a superintelligent AI would kill us or survive without us, because it will be smarter than we are. On the surface, this is a sensible answer: any path to destruction a superintelligence could take would either be something humanity would anticipate, in which case the superintelligence would know we had anticipated it and could contrive ways of obfuscating its plan, or it would be something we could not anticipate at all, in which case it would be difficult for us to detect what the superintelligence is doing before it is too late.

To many people, however, this response feels like a cop-out. To someone already skeptical that AI could pose a danger to humanity, it is telling that you won’t give a plausible-sounding scenario in which an AI destroys humanity. The skeptic will certainly not trust you when you say “the superintelligence will figure out a way, and it will be so smart that you and I can’t even comprehend it.” They have no reason to take such an argument on faith.

The second usual response is to offer several plausible-sounding scenarios the audience can conceptualize. You might say, for instance, that the AI could socially engineer a lab worker into synthesizing a dangerous protein chain that starts a global pandemic (or into building a nanite factory, if you aren’t worried about scaring your audience away with science-fiction scenarios). You might say that the AI could hack into weapons systems and launch all of the world’s missiles. You might say the AI could persuade us to build it a robot army for our own protection.

In this situation, the danger is that many reasonable people will start proposing security patches. “So then, in order to remain safe from AI, we should implement safety protocols and strict regulations in protein-synthesis labs, we should keep weapons systems off the internet, and we should ban AI-controlled robot armies.” At this point, it may be that politicians write these provisions into AI safety bills, draw up a couple of treaties, and call it a day.

Any security expert should be pulling their hair out at the idea of implementing a couple of patches based on one or two possible breaches and then calling it a day. In the real world, any complex system has a multitude of possible points of failure. When you patch one point of failure, another one springs up, and you end up playing an endless game of whack-a-mole.

One potential solution is to give an example or two of how AI could kill us, with the caveat that this is not a comprehensive list and that a superintelligence will see possibilities we have missed. This solution runs into a soundbite problem: when you add any kind of nuance to your argument, you lose your audience. This is especially true in an era where your soundbite will be cut down into a clip the length of a TikTok video or a tweet.

Another solution sidesteps the entire problem by wrapping the answer up in an analogy. The most common analogy is that of a novice chess player facing a world champion. The novice can be reasonably certain the champion will beat them at chess, but they cannot say exactly what moves the champion will use to do it. If they could, they would be as good at chess as the world champion.

The main problem I have with this analogy is how narrow a chess game is. A novice chess player and a world champion are confined to the chessboard and the legal moves each piece can make, and both players share the same goal: to win the match. The novice, while unable to anticipate the champion’s moves, can at least be reasonably confident of what the champion is trying to achieve. A human, by contrast, cannot be sure of a superintelligence’s goal, and the superintelligence may make moves humans did not know were possible, or step outside the boundaries of what humans understand entirely. Humans have a failure of imagination when it comes to conceiving a truly alien mind.

In an attempt to overcome this limitation, I often use the analogy of the Goldilocks zone that Earth inhabits within our solar system. The space of possible environments in the universe is unfathomably vast, and the space we can inhabit is minuscule in comparison. Humans need to live in the temperature range that allows liquid water, yet remain shielded from radiation that would damage our DNA. We cannot withstand the crushing gravity of larger planets, let alone stars or black holes. Even on the surface of our tiny, improbable world, we need shelter from storms that, in the cosmic sense, are extremely mild. We need access to food sources and protection from competing life forms. Common elements and molecules act as poisons to our fragile biology. Now, given the narrow space humans can inhabit, imagine a mind of unfathomable vastness, and ask yourself how likely it is that the conditions for human survival would occupy a prominent place in the vast space of that alien mind.

The Goldilocks-zone analogy loses a lot of people, because the vastness of the universe is something we can’t truly imagine. Our brains are adapted to our tiny corner of it. Using this analogy will only sway those already captivated by science.

The audience of people captivated by science certainly exists, so it’s best to be armed with the arguments most likely to persuade them. When possible, know your audience. Research the platform you’re using and familiarize yourself with its prior content to see how arguments are usually framed. If, for example, you’re writing a post on a science blog, the Goldilocks-zone argument is likely to be well received. If the platform is for people skeptical of big tech, however, the Goldilocks-zone argument will lose your audience, and an argument presenting AI risk as the latest potential environmental disaster created by irresponsible corporations will be better received. (This is an honest argument: a rogue superintelligence killing everyone on Earth in pursuit of resources is the biggest environmental disaster I can imagine.)

If you’re trying to reach a broader audience, you cannot pick and choose your arguments. You are left with the difficult task of presenting as broad an argument as possible while avoiding anything that will alienate part of your audience. Broadness requires nuance to remain true, but any nuance will be carved up in ways you cannot control and twisted into meanings you never intended. In this case, the only thing you can do is construct what I call a fractal argument: an argument that can be taken apart sentence by sentence, in which each sentence stands alone as its own argument.

An example of a fractal argument: “The integration of computers into every modern system makes us particularly vulnerable to a rogue AI. Almost all modern infrastructure is either connected to the internet or connected to people who are connected to the internet. The more we try to safeguard any one system, the more an intelligent AI will adapt and find novel ways to attack us. A superintelligent AI could potentially attack us through our infrastructure using technology known or unknown to us. It is safer and easier for us to pause AI development than to try to safeguard every aspect of the unimaginably complex infrastructure we’ve built.”

Such arguments are difficult to construct and redundant to the point of sounding like political waffle, which is why they are a last resort. It is better, overall, to rely on volume: make as many true arguments as you can, in as many places as possible, and get all of your friends to do the same. Add nuance. Do the thing. Don’t shut up. Pierce every information bubble you can; reach into places you normally would not go. Find your audience. The arguments will be carved up and misunderstood, certainly, but the volume will eventually drive the point home. Every time someone says “but—” they will encounter three different counterarguments. Saturation will, hopefully, be reached.

