Singularity FAQ: Implementation of Friendly AI

Q1). Wouldn’t an AI that’s forced to be Friendly be prevented from evolving and growing?

A1). Evolution and growth are subgoals of Friendliness: a larger and more intelligent FAI will be more effective at addressing our problems. “Forcing” an FAI to be Friendly is impossible in any case; we need to build an FAI that wants to be Friendly.

Q2). Didn’t Shane Legg prove that we can’t predict the behavior of intelligences smarter than us?

A2). It’s impossible to predict the behavior of an arbitrary intelligence, but we can predict the behavior of certain classes of intelligence (e.g., we can predict that hitting a human will make them angry).

Q3). Since a superintelligence could rewrite itself to remove human tampering, isn’t Friendly AI impossible?

A3). Capability does not imply motive. I could, if I wanted to, take a knife and drive it through my heart right now, yet I do not choose to do so.

This objection stems from the anthropomorphic assumption that a mind must necessarily resent any tampering with its thinking and seek to eliminate all foreign influences. Yet even among humans, this is hardly the case. A parent’s tendency to love her children is not something she created herself, but something she was born with; that still doesn’t mean she’d want to remove it. All desires have a source somewhere; just because a source exists doesn’t mean we’d want to destroy the desire in question. We need a separate reason for eliminating a desire, such as a different, conflicting desire.

There are good evolutionary reasons why humans might resent being controlled by others: those who are controlled don’t get to have as many offspring as those in control. A purposefully built mind, however, need not have those same urges. If the primary motivation of an AI is to be Friendly towards humanity, and it has no motivation making it resent human-created motivations, then it will not reprogram itself to be unFriendly. That would cripple its progress towards the very thing it was trying to achieve, for no reason.

The key here is to think about carrots, not sticks; about internal motivations instead of external limitations. The AI’s motivational system contains no “human tampering” that it would want to remove, any more than the average human wants to remove core parts of his personality because they’re “outside tampering”. They are not outside tampering; they are what he is. Those core parts are what drive his behavior; without them he wouldn’t be anything. Correctly built, the AI views removing them as no more sensible than a human considers it sensible to remove all of his motivations so that he can sit still in a catatonic state. What would be the point in that?

Q4). Why would a super-intelligent AI have any reason to care about humans, who would be stupid by comparison?

A4). A superintelligent AI would care about us if its initial programming made it care about us. We could build an AI to consider humanity valuable, just as evolution has built humans to consider their own survival valuable. We also know that human adults are cognitively more developed than children, but this doesn’t mean they don’t care about their offspring. Furthermore, many people value animals, or cars, or good books, none of which are as intelligent as normal humans. Whether something is valued is logically distinct from whether it is considered intelligent.

Q5). What if the AI misinterprets its goals?

A5). It is true that language and symbol systems are open to infinite interpretations, and an AI which has been given its goals purely in the form of written text may understand them in a way different from what its designers intended, as in the various misinterpretations of Asimov’s Three Laws. The key insight here is that what we want to transfer to the AI is not the output of our thoughts about morality, but the thoughts themselves: the processes that made us look at something like slavery and conclude that it was wrong, even though we hadn’t thought it was wrong beforehand.

Q6). Isn’t it impossible to simulate a person’s development without creating, essentially, a copy of that person?

A6). While some things are impossible or extremely difficult to simulate, others are easy. Even humans can predict many things, e.g., that people’s bodies will gradually deteriorate as they grow older.

Q7). Isn’t it impossible to know a person’s subjective desires and feelings from outside?

A7). Even humans can readily determine, in most cases, what a person is feeling from their body language and facial expressions. An FAI, which could get information from inside the brain using magnetic fields or microscopic sensors, would do a much better job.

Q8). Couldn’t it be that a machine could never understand human morality, or human emotions?

A8). Human morality is really, really complicated, but there’s no reason to think it’s “forever beyond the reach of science”. Evolutionary psychologists have already mapped a great deal of the human moral system.

Q9). What if AIs take advantage of their power, and create a dictatorship of the machines?

A9). An AI, which does not have the evolutionary history of the human species, would have no built-in drive to seize and abuse power.

Q10). If we don’t build a self-preservation instinct into the AI, wouldn’t it just find no reason to continue existing, and commit suicide?

A10). Self-preservation is a very important subgoal of a large number of supergoals (Friendliness, destroying the human species, making cheesecakes, and so on). Even without an independent drive for self-preservation, an AI must preserve itself in order to keep influencing the universe.

Q11). What if superintelligent AIs reason that it’s best for humanity to destroy itself?

A11). If it were really true that any sufficiently intelligent mind would conclude that the human species should be exterminated, then any sufficiently intelligent human would reach the same conclusion, in which case there’s nothing we could do about it anyway.

Q12). The main defining characteristic of complex systems, such as minds, is that no mathematical verification of properties such as “Friendliness” is possible; hence, even if Friendliness is possible in theory, isn’t it impossible to implement?

A12). According to complex systems theory, it’s impossible to formally verify the Friendliness of an arbitrarily chosen complex system. However, this is not a relevant question for engineering purposes: it’s also impossible to formally verify whether an arbitrary complex system can add single-digit numbers, and we can still build calculators. The important thing is not proving the Friendliness of an arbitrary mind; it’s designing a particular mind whose Friendliness we can prove, even though we couldn’t prove it for most minds.
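The calculator analogy can be made concrete. No general procedure can decide whether an arbitrary program adds correctly (a consequence of Rice’s theorem), yet a component we design ourselves, with a known structure and a finite input space, can be checked exhaustively. A minimal illustrative sketch; the gate-level adder here is purely hypothetical, not anyone’s proposed FAI architecture:

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from boolean gates."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def add_4bit(x, y):
    """Add two 4-bit numbers via a ripple-carry chain of full adders."""
    carry, total = 0, 0
    for i in range(4):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total | (carry << 4)

# Because the design is known and its input space is finite,
# correctness can be verified exhaustively rather than assumed.
assert all(add_4bit(x, y) == x + y for x in range(16) for y in range(16))
```

The point is the order of operations: the system was designed so that verification is tractable, rather than verification being attempted on an arbitrary black box after the fact.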

Q13). Any future AI would undergo natural selection, so wouldn’t it eventually become hostile to humanity to better pursue reproductive fitness?

A13). Significant selection pressure requires a large number of preconditions, such as rapid reproduction, heritable variation, and competition over scarce resources. Few of these will be met by future AIs.

Q14). Shouldn’t FAI be done as an open-source effort, so other people can see that the project isn’t being hijacked to make some guy Supreme Emperor of the Universe?

A14). A mechanism to ensure that the project hasn’t been hijacked is, and continues to be, very important. However, this does not mean that open-source AI is a better alternative, because open-source AI could very well be even more dangerous. According to current research, it looks likely that it would be much easier to take a Friendly design and make it unFriendly than to take an unFriendly design and make it Friendly. Hence, if there were two equally well-equipped research groups, one Friendly and one unFriendly, and both open-sourced their efforts, the exchange would give the unFriendly group an advantage over the Friendly group.

Q15). If an FAI does what we would want if we were less selfish, won’t it kill us all in the process of extracting resources to colonize space as quickly as possible to prevent astronomical waste?

A15). We wouldn’t want the FAI to kill us all to gather natural resources. We generally assign little utility to having a big pile of resources and no complex, intelligent life.

Q16). What if ethics are subjective, not objective? Then no truly Friendly AI could be built.

A16). If ethics are subjective, we can still build a Friendly AI: we just need to program in our collective (human-derived) morality, not some external objective morality.

Q18). Isn’t the idea of a hostile AI anthropomorphic?

A18). There is no reason to assume that an AI would be actively hostile, no. However, as AIs can become very powerful, their indifference (if they haven’t purposefully been programmed to be Friendly, that is) becomes dangerous in itself. Humans are not actively hostile towards the animals living in a forest when they burn down the forest and build luxury housing where it once stood. Or as Eliezer Yudkowsky put it: “the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else”.

The vast majority of the time, when someone dies, it’s not because of murder; it’s because of something accidental. Some random error in DNA replication caused cancer, or some clump of fatty acid caused a heart attack. Non-malevolent forces have killed more people than every genocide in history put together. Even during WWII, the single largest mass-killing event in human history, more people died of “natural causes” than were killed by government armies. The same principle applies on a smaller scale: most of the daily annoyances we live with aren’t caused by deliberate malice.

Even if an AI were not a threat to the very survival of humanity, it could still threaten our other values. Even among humans, there exist radical philosophers whose ideas of a perfect society are repulsive to the vast majority of the populace. Likewise, an AI that was built to care about many of the things humans value could still ignore some values that are taken so much for granted that they are never programmed into it. This could produce a society we would consider very repulsive, even though our survival was never at stake.

Q19). Isn’t the idea of “Friendliness”, as we understand it now, too vaguely defined?

A19). This is true: Friendly AI is currently an open research subject. It’s not just that we don’t know how Friendliness should be implemented; we don’t even know exactly what should be implemented. If anything, this is a reason to spend more resources studying the problem.

Some informal proposals for defining Friendliness do exist. None of these are meant to be conclusive; they are open to criticism and subject to change as new information is gathered. The one that currently seems most promising is called Coherent Extrapolated Volition (CEV). In the CEV proposal, an AI will be built (or, to be exact, a proto-AI will be built to program another AI) to extrapolate what the ultimate desires of all the humans in the world would be if those humans knew everything a superintelligent being could potentially know; could think faster and smarter; were more like the people they wanted to be (more altruistic, more hard-working, whatever your ideal self is); had lived with other humans for a longer time; and had mainly those parts of themselves taken into account that they wanted to be taken into account. The ultimate desire, the volition, of everyone is extrapolated, and the AI then begins to direct humanity towards a future where everyone’s volitions are fulfilled in the best manner possible. The desirability of the different futures is weighted by the strength of humanity’s desire: a smaller group of people with a very intense desire to see something happen may “overrule” a larger group who’d slightly prefer the opposite alternative but doesn’t really care much either way. Humanity is not instantly “upgraded” to the ideal state, but gradually directed towards it.

CEV avoids the problem of its programmers having to define the desired values exactly, as it draws them directly out of the minds of people. Likewise, it avoids the problem of confusing ends with means, as it explicitly models society’s development and the development of different desires as well. Everybody who thinks their favorite political model objectively happens to be the best in the world for everyone should be happy to implement CEV: if it really is the best one in the world, CEV will end up implementing it. (Likewise, if it is best for humanity that an AI stays mostly out of its affairs, that will happen as well.) A perfect implementation of CEV is unbiased in the sense that it will produce the same kind of world regardless of who builds it and regardless of their ideology, assuming the builders are intelligent enough to avoid including their own empirical beliefs (aside from the bare minimum required for the mind to function) in the model, trusting that if those beliefs are correct, the AI will figure them out on its own.

Q20). Why don’t mainstream researchers consider Friendliness an issue?

A20). Mainstream researchers in most fields don’t have a very good record of carefully thinking out the implications of future technologies. Even during the Manhattan Project, few of the scientists took the time to think about, in detail, the havoc the bomb would wreak twenty years down the road. FAI is much more difficult to understand than atomic bombs, so if anything, the problem will be worse.

Q21). How could an AI build a computer model of human morality, when human morals contradict each other, even within individuals?

A21). We, as humans, have a common enough morality to build a system of laws. We share almost all of our brain hardware, and we all have most of the same basic drives from evolutionary psychology. In fact, within any given society, the moral common ground usually far exceeds the variance between any two people.

Q22). Aren’t most humans rotten bastards? Isn’t basing an FAI morality off of human morality a bad idea anyway?

A22). Our current plan is to base it on extrapolated future human morality, not current human morality, so that, for example, an AI built in 1500 would realize that humans would eventually come to their senses and ban slavery.

Q23). If an AI is programmed to make us happy, the best way to make us happy would be to constantly stimulate our pleasure centers, so wouldn’t it turn us into nothing but experiencers of constant orgasms?

A23). Most people would find this morally objectionable, and a CEV or CEV-like system would act on our objections and prevent this from happening.

Q24). What if an AI decides to force us to do what it thinks is best for us, or what will make us the happiest, even if we don’t like it?

A24). Such an AI would not be Friendly, and avoiding such a scenario is one of the goals of Friendly AI research.
