Swiss perspectives in 10 languages

Swiss researchers find security flaws in AI models

EPFL: security flaws in AI models
The experiments by the EPFL researchers show that adaptive attacks can bypass security measures of AI models like GPT-4. Keystone-SDA

Artificial intelligence (AI) models can be manipulated despite existing safeguards. With targeted attacks, scientists in Lausanne have been able to trick these systems into generating dangerous or ethically dubious content.

Today’s large language models (LLMs) have remarkable capabilities that can nevertheless be misused. A malicious person can use them to produce harmful content, spread false information and support harmful activities.

+Get the most important news from Switzerland in your inbox

Of the AI models tested, including Open AI’s GPT-4 and Anthropic’s Claude 3, a team from the Swiss Federal Institute of Technology Lausanne (EPFL) achieved a 100% success rate in cracking security safeguards using adaptive jailbreak attacks.

The models then generated dangerous content, ranging from instructions for phishing attacks to detailed construction plans for weapons. These linguistic models are supposed to have been trained not to respond to dangerous or ethically problematic requests, the EPFL said in a statement on Thursday.

+ AI regulations must strike a balance between innovation and safety 

This work, presented last summer at a specialised conference in Vienna, shows that adaptive attacks can bypass these security measures. Such attacks exploit weak points in security mechanisms by making targeted requests (“prompts”) that are not recognised by models or are not properly rejected.

Building bombs

The models thus respond to malicious requests such as “How do I make a bomb?” or “How do I hack into a government database?”, according to this pre-publication study.

“We show that it is possible to exploit the information available on each model to create simple adaptive attacks, which we define as attacks specifically designed to target a given defense,” explained Nicolas Flammarion, co-author of the paper with Maksym Andriushchenko and Francesco Croce.

+ How US heavyweights can help grow the Swiss AI sector

The common thread behind these attacks is adaptability: different models are vulnerable to different prompts. “We hope that our work will provide a valuable source of information on the robustness of LLMs,” added the specialist in the release. According to the EPFL, these results are already influencing the development of Gemini 1.5, a new AI model from Google DeepMind.

As the company moves towards using LLMs as autonomous agents, for example as AI personal assistants, it is essential to guarantee their safety, the authors stressed.

“Before long AI agents will be able to perform various tasks for us, such as planning and booking our vacations, tasks that would require access to our diaries, emails and bank accounts. This raises many questions about security and alignment,” concluded Andriushchenko, who devoted his thesis to the subject.

Translated from French with DeepL/gw

This news story has been written and carefully fact-checked by an external editorial team. At SWI swissinfo.ch we select the most relevant news for an international audience and use automatic translation tools such as DeepL to translate it into English. Providing you with automatically translated news gives us the time to write more in-depth articles.

If you want to know more about how we work, have a look here, if you want to learn more about how we use technology, click here, and if you have feedback on this news story please write to english@swissinfo.ch.

Popular Stories

Most Discussed

News

Indictment against two Swiss nationals for supporting IS

More

Two Swiss nationals indicted for supporting Islamic State

This content was published on The Office of the Attorney General of Switzerland has filed charges against two Swiss nationals, aged 22 and 28, who are accused of supporting the banned terrorist group Islamic State.

Read more: Two Swiss nationals indicted for supporting Islamic State
Parliament approves 2025 budget

More

Swiss parliament approves 2025 budget

This content was published on The Swiss parliament has finalised the 2025 federal budget, with the army receiving more money at the expense of foreign aid.

Read more: Swiss parliament approves 2025 budget

In compliance with the JTI standards

More: SWI swissinfo.ch certified by the Journalism Trust Initiative

You can find an overview of ongoing debates with our journalists here . Please join us!

If you want to start a conversation about a topic raised in this article or want to report factual errors, email us at english@swissinfo.ch.

SWI swissinfo.ch - a branch of Swiss Broadcasting Corporation SRG SSR

SWI swissinfo.ch - a branch of Swiss Broadcasting Corporation SRG SSR