Skip to content
conflictHIGH2026-04-23 02:34 UTC

Second-Order Injection: Attacking the Evaluator in LLM Safety Monitors

Abstract LLM-based safety monitors share a structural vulnerability: the evaluator reads attacker-influenced content to produce its safety verdict. We demonstrate that content embedded in monitored session windows can directly override evaluator output -- a class we term second-order injection. Un

ADVERTISEMENT
⚡ STAY AHEAD

Events like this, convergence-verified across 689 sources, land in your inbox every Sunday. Free.

GET THE SUNDAY BRIEFING →

RELATED · conflict