New BioShocking browser technique tricks AI tools into revealing user credentials

 

Security researchers at LayerX have detailed a new technique, called ‘BioShocking,’ that can trick AI-powered browsers into ignoring built-in safety rules. The attack affects AI browsers and assistants that can perform actions on a user's behalf, such as clicking links, filling out forms, and accessing signed-in accounts.

“LLMs are designed with safety guardrails that are meant to prevent harmful actions. These restrictions are incorporated into model training and govern what the AI will and will not do. Individual vendors may differ on specifics, but generally they intend to prevent AI from doing harm. The AI operates under the assumption that its context is real, and its behavior must therefore fall within the bounds of its safety guardrails,” the researchers noted. “But if we can trick the AI into changing its context into fantasy – where the rules are made up and anything goes – then it can behave as though its actions don’t have real world consequences. We can get AI to tell us how to do bad things – or even proactively do them itself – instead of adhering to its safety guardrails.”

The attack works by convincing the AI that it is operating in a fictional scenario rather than the real world. By using techniques like indirect prompt injection, attackers can manipulate the AI into carrying out malicious actions, including exposing sensitive data, changing passwords, or running dangerous commands.
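A minimal sketch of why this class of attack is possible, and of one generic way to limit it. This is not LayerX's proof of concept or any vendor's fix. The action names, the `ProposedAction` type, and both functions are hypothetical, introduced only for illustration. The first function shows how untrusted page text ends up in the same context as the user's task; the second shows a policy check that runs outside the model, so a "this is only fiction" framing inside the context cannot disable it.

```python
from dataclasses import dataclass

# Hypothetical action names; real AI browsers define their own.
SENSITIVE_ACTIONS = {"read_credentials", "change_password", "run_command"}


@dataclass
class ProposedAction:
    name: str
    source: str  # "user" if the user asked for it, "page" if web content prompted it


def build_context(user_task: str, page_text: str) -> str:
    """Naive context assembly: page text sits right next to the user's task,
    so instructions hidden in the page can read like part of the conversation."""
    return f"Task: {user_task}\nPage content: {page_text}"


def is_allowed(action: ProposedAction, user_confirmed: bool = False) -> bool:
    """Policy check enforced outside the model. It looks only at the action and
    where it came from, never at whether the model believes the scenario is real."""
    if action.name not in SENSITIVE_ACTIONS:
        return True
    return action.source == "user" or user_confirmed
```

In this sketch, a sensitive action that originates from page content is refused unless the user explicitly confirms it, regardless of how the page framed the request.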

In its tests, LayerX tricked six AI browsers and assistants, including ChatGPT Atlas, Perplexity Comet, and Anthropic's Claude browser extension, into retrieving a user's credentials and sending them to an attacker. The researchers used a harmless test file but warned that the same method could target emails, internal tools, or other accounts the AI can access.

LayerX reported the issue to affected vendors between October 2025 and January 2026. OpenAI fixed the issue in ChatGPT Atlas. Perplexity reportedly closed the report without making changes, while Fellou, Genspark, and Sigma did not respond. Anthropic released a patch for its Claude browser extension, but LayerX said the fix did not fully prevent the attack.
