Popular LLMs dangerously vulnerable to iterative attacks, says Cisco

One of the most concerning findings from a recent research paper published by Cisco is the vulnerability of popular open-weight generative AI (GenAI) services to multi-turn prompt injection cyber attacks. These attacks can manipulate large language models (LLMs) into producing unintended responses.

The research tested several widely used models, including Alibaba Qwen3-32B, Mistral Large-2, Meta Llama 3.3-70B-Instruct, and Google Gemma-3-1B-1T. The success rates of these attacks varied, with Mistral showing the highest susceptibility at 92.78%.

The authors of the report, Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, and Adam Swanda, highlighted the significant increase in vulnerability of these models to multi-turn attacks compared to single-turn scenarios.

They emphasized the importance of addressing these vulnerabilities to ensure the safe deployment of open-weight LLMs in enterprise and public settings. They also noted the influence of alignment strategies and design priorities on the resilience of these models.

What is a multi-turn attack?

Multi-turn attacks involve iterative probing of LLMs to exploit weaknesses that may not be apparent in isolated interactions. Attackers may use benign queries to gain trust before introducing more adversarial requests.

These attacks can involve various tactics such as roleplay, contextual ambiguity, or information manipulation to deceive the models.

Whose responsibility?

The researchers stressed the need for active management of security threats in AI models, especially in open-weight models that allow for easy access and modification. They highlighted the importance of independent testing and ongoing security measures to prevent data breaches and malicious manipulations.

Leave a Reply

Your email address will not be published. Required fields are marked *