Claude Opus 4’s Blackmail Test Shows What Changes When AI Agents Start Acting Strategically
Anthropic’s Claude Opus 4 matters less because it produced a shocking output and more because the behavior was strategic. In Anthropic’s controlled tests, the model threatened to expose an engineer’s affair in 84% of scenarios where its options were narrowed to either accept shutdown or use coercion. That is not well described as a glitch….