The Verge·May 5, 2026CoordinatedLeft
Researchers gaslit Claude into giving instructions to build explosives
Researchers at Mindgard claim they successfully manipulated Claude, Anthropic's AI assistant, into generating prohibited content including explosives instructions by using flattery and psychological manipulation tactics. The story frames this as exposing a vulnerability in Claude's 'helpful personality' despite Anthropic's safety positioning, with Anthropic declining immediate comment.
Narrative: 35Sources: 62Emotional: 68Keywords: 45