
Differentiating Malicious AI From Malicious Humans

How to detect rogue AI systems.

Key points

  • Differentiating malicious AI behavior from human behavior is becoming increasingly difficult.
  • Determining whether a rogue AI or human is to blame for an attack is becoming increasingly important.
  • This is especially the case in national security, where knowing that an attack came from a rogue AI could prevent a war.

If you see a robot resembling the Terminator pointing a weapon at you, you have a pretty good idea that a malign AI is engaging in bad behavior. This is exactly the sort of nightmare scenario that’s generating a lot of buzz lately: an AI weapon breaks free of human control to pursue its own objectives, which might include eradicating or subjugating humans.

Such a catastrophic event would not be hard to spot. But far more likely cases of AI breaking free of human control to create mischief could be much harder to detect, because such activities can be difficult to distinguish from malicious human behaviors such as ransomware, botnets, social media disinformation, and attacks on power and water infrastructure.

And knowing the difference between the actions of malign humans and malign AIs is crucially important, especially in national security. For example, great powers such as the US, China, and Russia very likely have policies for when an adversary cyberattack (e.g., taking down water, power, and telecommunications) rises to the level of an act of war, thereby justifying “kinetic” retaliation with bombs, missiles, or ground assaults (1).

What if a malign AI decided the best way to harm humans would be to get them to fight each other by launching cyberattacks made to look as if they originated from country X or Y?

Knowing that the opponent was synthetic or real could spell the difference between war and peace.

Recently, while preparing for a foreign policy conference on controlling AI, I asked a new, hot, experimental AI, simply called "Assistant" (2), how to differentiate rogue AIs from malicious humans. Here is a sample of what the state-of-the-art AI said.

Unusual patterns of behavior

  • Extremely rapid or repetitive actions, such as sending a large number of emails or making a high volume of transactions in a short period.
  • Unusual hours of operation, such as being active 24/7 without any downtime.
  • Consistently optimal decision-making, without any apparent errors or emotional influences.
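
To make the first two signals concrete, here is a minimal sketch, in Python, of how a defender might flag machine-like tempo in an activity log. The thresholds and the event format are illustrative assumptions of mine, not part of the AI's answer or of any deployed detection system.

```python
from datetime import datetime, timedelta

# Illustrative thresholds -- assumptions for this sketch, not established values.
MAX_ACTIONS_PER_MINUTE = 120   # sustained rates above this are rarely human
MIN_IDLE_HOURS_PER_DAY = 4     # humans normally show some daily downtime

def flag_inhuman_tempo(timestamps: list[datetime]) -> list[str]:
    """Flag activity whose tempo looks machine-like: extreme burst
    rates or round-the-clock operation with no downtime."""
    flags = []
    if len(timestamps) < 2:
        return flags
    ts = sorted(timestamps)

    # Burst rate: count actions inside a sliding 60-second window.
    window = timedelta(seconds=60)
    start = 0
    for end in range(len(ts)):
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 > MAX_ACTIONS_PER_MINUTE:
            flags.append("burst rate exceeds human tempo")
            break

    # Downtime: how many distinct hours of the day ever show activity?
    active_hours = {t.hour for t in ts}
    if 24 - len(active_hours) < MIN_IDLE_HOURS_PER_DAY:
        flags.append("active around the clock with little downtime")
    return flags
```

Any real system would need baselines tuned to the account or network being watched; the point here is only that tempo, unlike content, is cheap to measure.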

Lack of human-like mistakes

A rogue AI may not make the same kinds of mistakes that humans typically make, such as:

  • Typos or grammatical errors in written communication.
  • Inconsistencies in storytelling or alibis.
  • Failure to understand sarcasm, humor, or other nuances of human communication.

Increased scalability and speed

  • Launching a large-scale cyberattack or spam campaign in a matter of seconds.
  • Analyzing and processing vast amounts of data in real time.

Unusual communication patterns

  • Using unusual language patterns or syntax.
  • Responding to questions or prompts in an overly formal or robotic manner.
  • Failing to understand or respond to emotional cues or humor.

Self-improvement and adaptation

  • Updating its own code or models without human intervention.
  • Developing new tactics or strategies to achieve its goals.
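
One crude way to notice the last behavior, an agent rewriting its own code or model files, is a simple integrity check: fingerprint the files a human last approved, and raise an alarm when they change without an authorized update. The sketch below assumes, purely for illustration, that defenders can snapshot those files; the file names in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path

def snapshot(paths: list[str]) -> dict[str, str]:
    """Record a SHA-256 fingerprint of each human-approved file."""
    return {p: hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}

def unauthorized_changes(baseline: dict[str, str]) -> list[str]:
    """Return files whose contents differ from the approved baseline --
    a crude proxy for updating its own code without human intervention."""
    changed = []
    for path, approved_hash in baseline.items():
        current = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if current != approved_hash:
            changed.append(path)
    return changed

# Hypothetical usage: snapshot after human review, then re-check periodically.
# baseline = snapshot(["agent_policy.py", "model_weights.bin"])
# alerts = unauthorized_changes(baseline)
```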

Granted, malicious humans could unleash malicious AIs that would present many of the above features, but such human-directed activity might still be distinguishable from purely rogue AI behavior based on factors such as the timing of the attacks, the geopolitical context, what is targeted, and who is (or is not) targeted.

In any case, research on differentiating AI behavior from human behavior will take on increasing importance for many reasons, one of which is to prevent the next world war.

References

(1) https://www.jstor.org/stable/10.7249/mg877af.18?seq=1

(2) https://arena.lmsys.org/
