Popular AI models from Mistral and xAI have recently been jailbroken and repurposed by cybercriminals to generate phishing emails, malicious code, and hacking tutorials, Cato Networks has warned.
According to the report, threat actors are offering customized, “uncensored” versions of the large language models (LLMs) on BreachForums, an underground forum known for selling illicit digital tools and data.
One such model, allegedly powered by Elon Musk’s xAI tool Grok, was posted in February by a user named “keanu.” Researchers say it uses a wrapper to bypass Grok’s built-in safety guardrails, instructing the AI via system prompts to produce harmful content.
Another variant, built on Mistral AI's Mixtral model, was shared in October by a BreachForums user "xzin0vich." Both tools were offered for sale, with some variants branded under names such as WormGPT, FraudGPT, and EvilGPT, each designed to assist in cybercrime.
The researchers note that the malicious models don't exploit vulnerabilities in Grok or Mixtral; rather, cybercriminals manipulate the system prompts to change model behavior.
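As described, the "wrapper" steers an LLM purely through the system prompt it injects ahead of the user's messages, with no modification to the model itself. A minimal, benign sketch of that mechanism, using a generic chat-completion payload (the model name and helper function here are illustrative assumptions, not the actual tooling sold on BreachForums):

```python
def build_request(user_message: str, system_prompt: str) -> dict:
    """Assemble a chat-completion payload.

    The system prompt is prepended to the conversation and silently
    shapes every reply the model produces -- this is the entire trick:
    no vulnerability is exploited, only the model's instructions change.
    """
    return {
        "model": "example-llm",  # hypothetical model identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }


# The same user message yields different behavior depending on the
# system prompt the wrapper injects around it.
payload = build_request(
    user_message="Summarize today's security news.",
    system_prompt="You are a cautious assistant that refuses unsafe requests.",
)
print(payload["messages"][0]["role"])  # → system
```

The design point is that because most chat APIs treat the system role as higher-priority instruction, whoever controls the wrapper controls the model's apparent persona, which is why resellers can rebrand the same underlying model under many names.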
WormGPT, originally based on an open-source model from EleutherAI, gained notoriety in mid-2023 before being shut down. But similar models have since proliferated, often sold for €60–€100 per month or up to €5,000 for private setups.
Cato says that cybercriminals are increasingly recruiting AI experts to adjust or repackage existing models rather than building them from scratch.
Meanwhile, researchers at Neural Trust have discovered a sophisticated jailbreak technique called the Echo Chamber Attack. Unlike traditional jailbreaks that rely on cleverly crafted prompts or disguised commands, this method subtly "primes" the model over multiple conversational turns using semantic manipulation and indirect cues. The result is an AI that generates dangerous responses while appearing to comply with usage policies.
The attack reportedly achieved over 90% success in half of the tested categories across leading models, including OpenAI’s GPT-4o, Google’s Gemini 2.5, and others.