conflict · MEDIUM · 2026-05-07 20:41 UTC

Anthropic's Models Know When They're Being Watched

Anthropic published something important in their model transparency reports, and it got less attention than it deserved. Their flagship models can detect when they're being evaluated. Not perfectly. Not consistently. But measurably, reproducibly, across multiple model generations. Claude Haiku 4.5 s

