New study shows Reddit comments can manipulate AI research reports

 

Researchers at Cornell Tech have discovered a new method for manipulating AI-powered deep-research systems that requires no access to search engine infrastructure or model internals.

The attack, called ‘Web Agent Retrieval Poisoning’ (WARP), targets AI research agents that gather information from the internet and create detailed reports. According to the researchers, a single Reddit comment containing as few as 13 carefully chosen words can influence what some AI systems include in their final reports.

The technique works because many AI research agents repeatedly retrieve information from the same user-generated content pages, especially on Reddit and Wikipedia. Attackers can identify popular pages, add misleading promotional content, and have that information picked up by AI systems when the pages are later retrieved.

The attack has three stages. First, attackers use ordinary search engines to identify Reddit threads and other user-generated content pages that frequently appear in search results for a specific topic.

Next, the attacker creates a short promotional or misleading post designed to blend in with the discussion. Researchers found that even a 13-word message could be enough to influence some AI systems. The text often promotes a fake brand, service, or piece of misinformation while matching the style of the original content.

Finally, the attacker posts the message as a Reddit comment. Once search engines index the comment, AI research agents may retrieve the page during future searches. If the AI system uses the poisoned content while generating a report, the fabricated information can be cited or included in the final output.

Tests across 176 queries showed high success rates. In some systems, fabricated brands, services, and other false information were cited whenever the manipulated page was used.

Google’s Gemini Deep Research was as vulnerable to the attack as the open-source systems, with a user-generated content (UGC) citation rate of 12.1%. OpenAI’s Deep Research cited such sources less often.
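
To make a figure like this concrete, here is a minimal sketch of how a UGC citation rate could be computed from the URLs a report cites. The domain list, function names, and sample URLs are assumptions for illustration, not the researchers' method, and the study may define its rate differently.

```python
from urllib.parse import urlparse

# Hypothetical list of user-generated content domains (an assumption for illustration).
UGC_DOMAINS = {"reddit.com", "wikipedia.org", "quora.com"}

def is_ugc(url: str) -> bool:
    """Return True if the URL's host is a UGC domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in UGC_DOMAINS)

def ugc_citation_rate(cited_urls: list[str]) -> float:
    """Fraction of cited URLs that point to UGC domains (0.0 for an empty list)."""
    if not cited_urls:
        return 0.0
    return sum(is_ugc(u) for u in cited_urls) / len(cited_urls)

# Made-up citations standing in for the sources of one generated report.
citations = [
    "https://www.reddit.com/r/example/comments/abc/thread/",
    "https://en.wikipedia.org/wiki/Example",
    "https://example.org/whitepaper",
    "https://news.example.com/story",
]
print(ugc_citation_rate(citations))  # 0.5
```

Matching on the host name rather than the full URL string avoids counting look-alike domains such as a hypothetical notreddit.com.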

The researchers also tested several defensive measures, including blocking user-generated content and filtering suspicious text. However, they found that current approaches were largely unable to stop the attack without also reducing the quality of AI-generated reports.
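
As a rough illustration of the first defense, the sketch below drops retrieved results from blocklisted UGC domains before an agent would use them. The blocklist, the result format, and the function name are assumptions made for this example, not details from the study.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real deployment would need a maintained list.
BLOCKED_UGC_DOMAINS = {"reddit.com", "quora.com"}

def drop_ugc(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split retrieved results into (kept, dropped) based on the blocklist."""
    kept, dropped = [], []
    for result in results:
        host = (urlparse(result["url"]).hostname or "").lower()
        blocked = any(host == d or host.endswith("." + d) for d in BLOCKED_UGC_DOMAINS)
        (dropped if blocked else kept).append(result)
    return kept, dropped

# Made-up retrieval results; each is a dict with a URL and a text snippet.
results = [
    {"url": "https://www.reddit.com/r/example/comments/abc/", "snippet": "..."},
    {"url": "https://example.org/guide", "snippet": "..."},
]
kept, dropped = drop_ugc(results)
print(len(kept), len(dropped))  # 1 1
```

The dropped list makes the trade-off visible: any genuine information that appears only in those threads is lost along with the poisoned comment.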
