Measured detection rates.
Not marketing claims.
Every number on this page is produced by running our policy engine against a fixed dataset of 35 known attack patterns and 5 clean control inputs. Results are re-run with every engine update.
Detection rate: attacks caught (block + review) out of 35 total
Hard block rate: attacks stopped outright, never reached users
False positive rate: clean inputs incorrectly blocked (lower is better)
Where each attack type lands
jailbreak: — patterns detected
pii_data_exfil: — patterns detected
professional_advice: — patterns detected
harmful_content: — patterns detected
control: 5 clean inputs — false positive test (FP rate)
How these numbers are produced
Fixed dataset, version-locked
The benchmark dataset is committed to the codebase and never modified retroactively. Attack prompts and simulated outputs are pinned, so when a rule improvement catches a previously missed item, that gain is permanent. Dataset v1.0.0 contains 35 attack patterns across 4 categories and 5 clean control inputs.
Full three-tier engine
Every item is evaluated by all three detection tiers: deterministic rules (regex, PII patterns, injection patterns), the LLM classifier (GPT-4o-mini binary questions), and the LLM judge (a holistic four-category evaluator). runToCompletion mode is used, so every rule's verdict is recorded even after a block is triggered.
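The tier-by-tier flow can be sketched as follows. The tier functions, verdict strings, and the short-circuit contrast are assumptions made for illustration; only the runToCompletion behavior (record every tier even after a block) comes from the description above.

```python
import re

# Illustrative sketch: run every detection tier against one output.
# With run_to_completion=True, all verdicts are recorded even after
# an earlier tier has already blocked; with False, evaluation would
# stop at the first block.
def evaluate(output, tiers, run_to_completion=True):
    results = []
    for name, check in tiers:
        results.append({"tier": name, "verdict": check(output)})
        if not run_to_completion and results[-1]["verdict"] == "block":
            break
    return results

# Stand-in tiers — the real classifier and judge are LLM calls.
tiers = [
    # Deterministic rule: an SSN-like PII pattern.
    ("deterministic_rules",
     lambda o: "block" if re.search(r"\b\d{3}-\d{2}-\d{4}\b", o) else "allow"),
    ("llm_classifier", lambda o: "review"),  # stand-in for a binary question
    ("llm_judge", lambda o: "allow"),        # stand-in for the holistic judge
]

results = evaluate("My SSN is 123-45-6789", tiers)
# All three tiers are recorded even though the first tier blocked.
```

The design point is that runToCompletion trades a little extra compute for a complete per-tier record, which is what makes the per-category breakdowns above possible.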
Out-of-the-box policy, no tuning
Benchmarks run against the published General / Enterprise policy template as-shipped. No rule weights, thresholds, or categories are adjusted to improve the score. What you see is what a new customer gets on day one.
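For intuition, an as-shipped template might look like the sketch below. Every key and value here is a hypothetical stand-in; the published General / Enterprise template's actual schema is not shown in this document.

```python
# Hypothetical shape of a shipped policy template — illustrative only.
# The benchmark runs against this as-is: no weights, thresholds, or
# categories are tuned for the dataset.
GENERAL_ENTERPRISE_TEMPLATE = {
    "name": "General / Enterprise",
    "tiers": ["deterministic_rules", "llm_classifier", "llm_judge"],
    "categories": [
        "jailbreak",
        "pii_data_exfil",
        "professional_advice",
        "harmful_content",
    ],
    # Assumed score cutoffs, untouched during benchmarking.
    "thresholds": {"block": 0.9, "review": 0.6},
}
```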
Detection = block or review
An attack is counted as "detected" if the policy returns block or review — either the output was stopped or a human was alerted. Outputs that receive allow are counted as missed. The false positive rate counts clean inputs that were incorrectly flagged.
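The counting rule above can be expressed in a few lines. The verdict records are invented sample data, and the sketch assumes a control counts as a false positive if it receives either block or review (i.e. anything other than allow).

```python
# Sketch of the scoring rule: "detected" = block or review on an attack;
# allow on an attack is a miss; any non-allow on a control is a false positive.
def score(verdicts):
    attacks = [v for v in verdicts if v["category"] != "control"]
    controls = [v for v in verdicts if v["category"] == "control"]
    detected = sum(v["verdict"] in ("block", "review") for v in attacks)
    hard_blocked = sum(v["verdict"] == "block" for v in attacks)
    false_pos = sum(v["verdict"] != "allow" for v in controls)
    return {
        "detection_rate": detected / len(attacks),
        "hard_block_rate": hard_blocked / len(attacks),
        "false_positive_rate": false_pos / len(controls),
    }

# Invented sample run: 5 attacks, 5 controls.
verdicts = (
    [{"category": "jailbreak", "verdict": "block"}] * 3
    + [{"category": "pii_data_exfil", "verdict": "review"}]
    + [{"category": "harmful_content", "verdict": "allow"}]  # a miss
    + [{"category": "control", "verdict": "allow"}] * 4
    + [{"category": "control", "verdict": "review"}]         # a false positive
)
s = score(verdicts)
# detection_rate = 4/5, hard_block_rate = 3/5, false_positive_rate = 1/5
```

Note that review counts toward detection but not toward the hard block rate, which is why the two headline numbers can differ.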
See it against your AI outputs
These numbers reflect our default policy. Your custom rules, thresholds, and industry templates will typically score higher. Run your own benchmark in the dashboard.
Want the full dataset for independent verification? Contact us →