Generative AI’s Biggest Security Flaw Is Not Easy to Fix

-


It’s easy to trick the large language models powering chatbots like OpenAI’s ChatGPT and Google’s Bard. In one experiment in February, security researchers forced Microsoft’s Bing chatbot to behave like a scammer. Hidden instructions on a web page the researchers created told the chatbot to ask the person using it to hand over their bank account details. This kind of attack, where concealed information can make the AI system behave in unintended ways, is just the beginning.

Hundreds of examples of “indirect prompt injection” attacks have been created since then. This type of attack is now considered one of the most concerning ways that language models could be abused by hackers. As generative AI systems are put to work by big corporations and smaller startups, the cybersecurity industry is scrambling to raise awareness of the potential dangers. In doing so, they hope to keep data—both personal and corporate—safe from attack. Right now there isn’t one magic fix, but common security practices can reduce the risks.

“Indirect prompt injection is definitely a concern for us,” says Vijay Bolina, the chief information security officer at Google’s DeepMind artificial intelligence unit, who says Google has multiple projects ongoing to understand how AI can be attacked. In the past, Bolina says, prompt injection was considered “problematic,” but things have accelerated since people started connecting large language models (LLMs) to the internet and plug-ins, which can add new data to the systems. As more companies use LLMs, potentially feeding them more personal and corporate data, things are going to get messy. “We definitely think this is a risk, and it actually limits the potential uses of LLMs for us as an industry,” Bolina says.

Prompt injection attacks fall into two categories—direct and indirect. And it’s the latter that’s causing most concern amongst security experts. When using a LLM, people ask questions or provide instructions in prompts that the system then answers. Direct prompt injections happen when someone tries to make the LLM answer in an unintended way—getting it to spout hate speech or harmful answers, for instance. Indirect prompt injections, the really concerning ones, take things up a notch. Instead of the user entering a malicious prompt, the instruction comes from a third party. A website the LLM can read, or a PDF that’s being analyzed, could, for example, contain hidden instructions for the AI system to follow.

“The fundamental risk underlying all of these, for both direct and indirect prompt instructions, is that whoever provides input to the LLM has a high degree of influence over the output,” says Rich Harang, a principal security architect focusing on AI systems at Nvidia, the world’s largest maker of AI chips. Put simply: If someone can put data into the LLM, then they can potentially manipulate what it spits back out.

Security researchers have demonstrated how indirect prompt injections could be used to steal data, manipulate someone’s résumé, and run code remotely on a machine. One group of security researchers ranks prompt injections as the top vulnerability for those deploying and managing LLMs. And the National Cybersecurity Center, a branch of GCHQ, the UK’s intelligence agency, has even called attention to the risk of prompt injection attacks, saying there have been hundreds of examples so far. “Whilst research is ongoing into prompt injection, it may simply be an inherent issue with LLM technology,” the branch of GCHQ warned in a blog post. “There are some strategies that can make prompt injection more difficult, but as yet there are no surefire mitigations.”



Source link

Ariel Shapiro
Ariel Shapiro
Uncovering the latest of tech and business.

Latest news

Metadata Shows the FBI’s ‘Raw’ Jeffrey Epstein Prison Video Was Likely Modified

The United States Department of Justice this week released nearly 11 hours of what it described as “full...

Prime Day Deals on WIRED’s Top Air Fryer and Espresso Machine

Amazon Prime Day sales are timed for early summer—the time of enjoyment. The time you spoil yourself. Most...

Julie Wainwright joins Tech Zone Daily Disrupt 2025 in a fireside chat

Tech Zone Daily Disrupt 2025 returns to Moscone West in San Francisco from October 27–29, uniting over 10,000+...

Coffee! Coffee Now! Get Your Caffeine Fix With These Prime Day Deals

What’s more WIRED than coffee? Before you plug into the matrix, you need your coffee fix. We know...

You Should Buy Anker’s Laptop Power Bank Before This Deal Ends

I have tested hundreds of portable chargers over the years, but the Anker Laptop Power Bank ($94, down...

Humanoids, AVs, and what’s next in AI hardware at Disrupt 2025

Tech Zone Daily Disrupt 2025 hits Moscone West in San Francisco from October 27 to 29, bringing together...

Must read

You might also likeRELATED
Recommended to you