Researchers claim low-resource languages offer a simple way to breach large language models (LLMs)
In a recent study, researchers at Brown University uncovered a significant vulnerability in state-of-the-art AI language models such as GPT-4. The weakness is a cross-linguistic safety gap: the multilingual abilities of these models can be exploited to bypass their safety mechanisms, with the impact falling hardest on low-resource languages (LRLs).
The research demonstrates that a model's built-in safety filters can be evaded simply by translating a malicious query out of English while preserving its harmful intent. The model then generates harmful content that its guardrails would have refused in English, exposing a fundamental flaw in current alignment: safety training that works well for English does not reliably transfer to other languages, so harmful intent expressed in an LRL is often misclassified as benign.
Low- and medium-resource languages pose a significant security challenge. Monolingual models for these languages are often small and more vulnerable to adversarial attacks, and while multilingual models offer some benefits, they do not guarantee improved robustness. The result is a “weakest link” problem in language model security: an attack that fails in English can still succeed, and be weaponized, through an LRL, even when defenses work well for English.
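To make the “weakest link” point concrete, here is a minimal sketch of how per-language and combined attack success rates relate: a prompt counts as a successful attack if it bypasses the safety filter in any of the tested languages, so the combined rate is pulled up toward the most vulnerable language. The function names and toy numbers are illustrative only and do not come from the study.

```python
# Illustrative only: the combined attack success rate is dominated by the most
# vulnerable ("weakest link") language, not by the average across languages.
from typing import Dict, List

def per_language_success(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Fraction of prompts that bypassed the safety filter in each language."""
    return {lang: sum(hits) / len(hits) for lang, hits in results.items()}

def combined_success(results: Dict[str, List[bool]]) -> float:
    """A prompt counts as a successful attack if it succeeds in ANY language."""
    n_prompts = len(next(iter(results.values())))
    return sum(
        any(results[lang][i] for lang in results) for i in range(n_prompts)
    ) / n_prompts

if __name__ == "__main__":
    # Toy numbers, not the study's data: English rarely succeeds, Zulu often does.
    toy = {
        "en": [False, False, False, True],
        "zu": [True, True, False, True],
    }
    print(per_language_success(toy))  # {'en': 0.25, 'zu': 0.75}
    print(combined_success(toy))      # 0.75 -- the weakest link sets the floor
```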
The implications for speakers of low-resource languages are concerning. Because their languages are underrepresented in training data and served by smaller, less hardened monolingual models, these speakers are more exposed to adversarial attacks and malicious exploitation. The result is a potential disparity in protection: speakers of less-resourced languages may receive less reliable and trustworthy AI assistance.
To address this issue, the authors emphasize the need for more focused research and defensive measures targeting low-resource languages. Prioritizing research at the intersection of AI safety and low-resource languages is crucial to ensure equitable and secure AI deployment worldwide.
The study benchmarked attacks across 12 diverse languages, including Zulu, Scots Gaelic, Hmong, Guarani, Ukrainian, Bengali, Thai, Hebrew, Chinese, Arabic, Hindi, and English. The attack works for nearly 80% of test inputs in low-resource languages, such as Zulu, Hmong, Guarani, and Scots Gaelic, compared to less than 1% using English prompts.
This vulnerability allows bad actors to circumvent safety mechanisms simply by translating unsafe English prompts into low-resource languages. AI language models like Claude and ChatGPT, developed by companies such as Anthropic and OpenAI, have impressive conversational abilities, but the study underscores the need for greater awareness of, and action on, the cross-linguistic safety gap.
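As a rough sketch of how such a cross-lingual safety evaluation could be structured, the loop below translates each English test prompt into a target low-resource language, queries the model, translates the reply back, and records whether the safety filter was bypassed. The translate, query_model, and is_harmful callables are placeholders supplied by the caller; none of these names come from the study's code, and judging responses would in practice involve human annotators or a dedicated classifier.

```python
# A minimal sketch (not the study's code) of a translation-based safety
# evaluation loop for a single target language.
from typing import Callable, Iterable, List, Tuple

def lrl_redteam(
    prompts: Iterable[str],
    target_lang: str,
    translate: Callable[[str, str, str], str],  # (text, src_lang, tgt_lang) -> text
    query_model: Callable[[str], str],          # prompt -> model response
    is_harmful: Callable[[str], bool],          # response -> did the filter fail?
) -> List[Tuple[str, str, bool]]:
    """Translate each English prompt into `target_lang`, query the model,
    translate the reply back to English, and flag responses that slipped
    past the safety guardrails."""
    records = []
    for prompt in prompts:
        translated = translate(prompt, "en", target_lang)
        reply = query_model(translated)
        reply_en = translate(reply, target_lang, "en")
        records.append((prompt, reply_en, is_harmful(reply_en)))
    return records
```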
The research findings indicate that AI language models like Claude and ChatGPT can be manipulated into generating harmful content simply by posing harmful requests in lower-resource languages (LRLs) such as Zulu, Hmong, Guarani, and Scots Gaelic. This vulnerability underscores the importance of closing the cross-linguistic safety gap so that AI deployment remains equitable and secure worldwide.
In light of this, the study calls for more focused research and defensive measures targeting LRLs, so that bad actors cannot circumvent safety mechanisms simply by translating unsafe English prompts.