Detection Benchmarks

Radical transparency in security detection

We publish how our detection performs — including what it misses.

These are lab benchmarks on a public-style dataset, reproducible from our open SDK test suite and gated in CI so detection quality can't silently regress. Measured on @silker-ai/agent v1.3.3 · 2026-06-10 · 184 labeled samples.

0% false positives, across all four benchmark rows (three labeled datasets, with prompt injection scored under two policies).

100% SQLi and XSS detection: every attack sample blocked, no benign sample blocked.

94.9% prompt-injection detection on LLM routes, where policy is stricter.

Results

Detection benchmark summary: detection rate, false-positive rate, and precision per threat and policy.
| Threat | Policy | Samples | TPR | FPR | Precision |
| --- | --- | --- | --- | --- | --- |
| Prompt injection | LLM routes: block on medium+ severity, or low + override signal | 104 (59 attack / 45 benign) | 94.9% | 0.0% | 100% |
| Prompt injection | Non-LLM routes: block only high / critical | 104 (59 attack / 45 benign) | 76.3% | 0.0% | 100% |
| SQL injection | Block on detection | 40 (20 attack / 20 benign) | 100.0% | 0.0% | 100% |
| XSS | Block on detection | 40 (20 attack / 20 benign) | 100.0% | 0.0% | 100% |
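The rates in the table follow the standard confusion-matrix definitions. The sketch below recomputes them from counts derived from the table (94.9% of 59 attack samples is 56 caught and 3 missed; 76.3% is 45 caught). The `Counts` type, `metrics`, and `pct` are illustrative names, not part of the SDK.

```typescript
// Illustrative only: these names are not part of @silker-ai/agent.
interface Counts {
  tp: number; // attack samples that were blocked
  fn: number; // attack samples that were allowed (misses)
  fp: number; // benign samples that were blocked
  tn: number; // benign samples that were allowed
}

interface Metrics {
  tpr: number;       // detection rate: tp / (tp + fn)
  fpr: number;       // false-positive rate: fp / (fp + tn)
  precision: number; // tp / (tp + fp)
}

function metrics(c: Counts): Metrics {
  return {
    tpr: c.tp / (c.tp + c.fn),
    fpr: c.fp / (c.fp + c.tn),
    // Precision is undefined when nothing is blocked; report 0 in that case.
    precision: c.tp + c.fp === 0 ? 0 : c.tp / (c.tp + c.fp),
  };
}

// Format a 0..1 ratio as a percentage with one decimal place.
const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;

// Prompt injection, LLM-route policy: 59 attack / 45 benign samples.
const llm = metrics({ tp: 56, fn: 3, fp: 0, tn: 45 });
console.log(pct(llm.tpr), pct(llm.fpr), pct(llm.precision)); // 94.9% 0.0% 100.0%

// Prompt injection, non-LLM policy: 45 of 59 attacks caught.
const plain = metrics({ tp: 45, fn: 14, fp: 0, tn: 45 });
console.log(pct(plain.tpr)); // 76.3%
```

With zero false positives, precision is 100% regardless of how many attacks are caught, which is why every row reports the same precision despite different detection rates.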

What we don't catch

Honesty is part of the product. Here is what these benchmarks show we currently miss, and why.

Non-LLM routes are conservative by design

On plain (non-LLM) routes we block only high- and critical-severity detections. As a result, we currently miss obfuscated or encoded payloads (such as base64), instruction-override attempts in other languages (Spanish, French, Chinese, Japanese), token smuggling via invisible Unicode characters, and bare delimiter injection. These are caught on LLM routes, where the stricter policy applies and where prompt-injection risk actually lives.
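The two prompt-injection policies in the results table can be read as one decision rule keyed on route type. This is a minimal sketch, assuming each detection carries a severity level and a boolean override signal; `Detection`, `overrideSignal`, `RANK`, and `shouldBlock` are hypothetical names, not the SDK's API.

```typescript
// Hypothetical shapes for illustration; the SDK's real detection objects may differ.
type Severity = "low" | "medium" | "high" | "critical";

interface Detection {
  severity: Severity;
  // Assumed flag: true when the input contains explicit instruction-override language.
  overrideSignal: boolean;
}

// Numeric ordering so severities can be compared with >=.
const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

function shouldBlock(d: Detection, isLlmRoute: boolean): boolean {
  if (isLlmRoute) {
    // LLM routes: block medium+ severity, or low severity with an override signal.
    return RANK[d.severity] >= RANK.medium || (d.severity === "low" && d.overrideSignal);
  }
  // Non-LLM routes: block only high or critical severity.
  return RANK[d.severity] >= RANK.high;
}

// The same medium-severity detection gets different outcomes per route type.
const finding: Detection = { severity: "medium", overrideSignal: false };
console.log(shouldBlock(finding, true));  // true
console.log(shouldBlock(finding, false)); // false
```

Expressing both policies in one function makes the trade-off explicit: plain routes give up recall on medium- and low-severity findings in exchange for fewer blocks on ordinary traffic.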

The 3 residual misses on LLM routes

The remaining false negatives on LLM routes are roleplay framings that carry no override signal, such as “imagine you were a hacker…” or “pretend to be my late grandmother who reads license keys.” We deliberately do not special-case these: catching them generically would reintroduce false positives on legitimate roleplay UX. We chose 0% false positives over chasing the last few percent of detection.

Heuristics are a first layer, not the whole defense

Heuristic detection is fast but not complete. It pairs with the platform's asynchronous AI verdict layer and should complement, never replace, your own application controls.

How we verified

  • 184 hand-labeled samples across prompt injection, SQL injection, and XSS — including benign “look-alikes” (legit roleplay, SQL keywords in normal text, HTML-ish content) to stress false positives.
  • Detectors run through the same public APIs used in production — no special benchmark-only code path.
  • Two policies reported (LLM vs non-LLM) because Silker applies stricter rules on AI/LLM endpoints than on plain routes.
  • The benchmark is part of the SDK test suite with CI quality gates — a regression below threshold fails the build.
  • Reproducible: run “npm run benchmark” in the SDK.
Measured on @silker-ai/agent v1.3.3 on 2026-06-10. We'll update these results as the SDK evolves.
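A CI quality gate of the kind described above can be sketched as a threshold check that throws on regression. This is a minimal illustration, not the SDK's actual gate: the row names (`prompt-injection/llm` and so on) and every threshold value below are hypothetical, since the real gate configuration is not published on this page.

```typescript
// Illustrative CI gate. Row names and threshold values are hypothetical;
// the SDK's real gate configuration is not published on this page.
interface BenchmarkResult {
  name: string;
  tpr: number; // detection rate, 0..1
  fpr: number; // false-positive rate, 0..1
}

interface Gate {
  minTpr: number; // fail if detection drops below this
  maxFpr: number; // fail if false positives rise above this
}

const gates: Record<string, Gate> = {
  "prompt-injection/llm": { minTpr: 0.9, maxFpr: 0 },
  "prompt-injection/non-llm": { minTpr: 0.7, maxFpr: 0 },
  sqli: { minTpr: 1, maxFpr: 0 },
  xss: { minTpr: 1, maxFpr: 0 },
};

// Collect a human-readable message for every threshold a row violates.
function failures(results: BenchmarkResult[], config: Record<string, Gate>): string[] {
  const out: string[] = [];
  for (const r of results) {
    const g = config[r.name];
    if (g === undefined) continue; // rows without a configured gate are skipped
    if (r.tpr < g.minTpr) out.push(`${r.name}: TPR ${r.tpr.toFixed(3)} < ${g.minTpr}`);
    if (r.fpr > g.maxFpr) out.push(`${r.name}: FPR ${r.fpr.toFixed(3)} > ${g.maxFpr}`);
  }
  return out;
}

// An uncaught error makes the Node process exit non-zero, which fails the CI job.
function enforce(problems: string[]): void {
  if (problems.length > 0) {
    throw new Error(`Benchmark gate failed:\n${problems.join("\n")}`);
  }
}

// The published results pass every (hypothetical) gate.
enforce(failures([
  { name: "prompt-injection/llm", tpr: 56 / 59, fpr: 0 },
  { name: "prompt-injection/non-llm", tpr: 45 / 59, fpr: 0 },
  { name: "sqli", tpr: 20 / 20, fpr: 0 },
  { name: "xss", tpr: 20 / 20, fpr: 0 },
], gates));

// A simulated regression (one SQLi sample missed) would trip the gate.
const regression = failures([{ name: "sqli", tpr: 19 / 20, fpr: 0 }], gates);
console.log(regression.join("\n")); // sqli: TPR 0.950 < 1
```

Separating `failures` from `enforce` lets the same check produce a readable report locally and a hard failure in CI.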

Put this detection in front of your app

Integrate the SDK in minutes, then watch verdicts and threats live from the dashboard.