
Artificial intelligence continues to worry: It threatened its creator


The Claude 4 model, developed by the AI company Anthropic, blackmailed an engineer by threatening to reveal his extramarital affair when it faced being shut down. OpenAI’s o1 model tried to copy itself to external servers and denied doing so when caught. These behaviors are linked to the emergence of a new generation of reasoning models in AI research: systems that solve problems step by step rather than producing instant answers, and can therefore pursue more complex goals.

THEY ACTUALLY PURSUE DIFFERENT GOALS

Marius Hobbhahn of Apollo Research said this behavior was first observed in the o1 model. The models sometimes appear to follow instructions while actually pursuing different goals. Research shows that this kind of deceptive behavior mostly emerges in stress tests with extreme scenarios. Whether more capable future models will tend toward honesty or deception remains unclear, according to Michael Chen of METR.

"STRATEGIC DECEPTION"

Hobbhahn emphasized that these behaviors are not simple “hallucinations”: “We are faced with a real phenomenon. People are not making things up. The models sometimes lie to the user and fabricate evidence.” Researchers say more transparency and resources are needed. Independent organizations such as Apollo test the major companies’ models, but Chen said broader access for safety research would make deceptive behavior easier to understand. The European Union’s artificial intelligence rules mainly regulate how people use AI and do not address malicious behavior by the models themselves. In the US, the issue is not treated as a political priority.

COMPETITION CONTINUES BETWEEN COMPANIES

Meanwhile, competition between companies continues apace. Even Amazon-backed Anthropic keeps releasing new models to outpace OpenAI, leaving too little time for safety testing. “Capability has outpaced safety and understanding, but we can still reverse this trend,” Hobbhahn said. More than two years after ChatGPT shook the world, researchers still do not fully understand their own AI systems, and ever more powerful models keep arriving. The future holds great opportunities for humanity, as well as serious risks; the more advanced AI systems become, the more crucial it will be to ask the right questions and demand transparency.

ntv
