Radical transparency
in security detection
We publish how our detection performs — including what it misses.
These are lab benchmarks on a public-style dataset, reproducible from our open SDK test suite and gated in CI so detection quality can't silently regress. Measured on @silker-ai/agent v1.3.3 · 2026-06-10 · 184 labeled samples.
Across all four labeled datasets
Every attack sample blocked, zero benign blocked
On LLM routes, where policy is strictest
Results
Prompt injection
LLM routes — block on medium+ severity, or low + override signal
Prompt injection
Non-LLM routes — block only high / critical
SQL injection
Block on detection
XSS
Block on detection
| Threat | Policy | Samples | TPR | FPR | Precision |
|---|---|---|---|---|---|
| Prompt injection | LLM routes — block on medium+ severity, or low + override signal | 104 (59 attack / 45 benign) | 94.9% | 0.0% | 100% |
| Prompt injection | Non-LLM routes — block only high / critical | 104 (59 attack / 45 benign) | 76.3% | 0.0% | 100% |
| SQL injection | Block on detection | 40 (20 attack / 20 benign) | 100.0% | 0.0% | 100% |
| XSS | Block on detection | 40 (20 attack / 20 benign) | 100.0% | 0.0% | 100% |
What we don't catch
Honesty is part of the product. Here is what these benchmarks show we currently miss, and why.
Non-LLM routes are conservative by design
On plain (non-LLM) routes we block only high/critical, so we currently miss obfuscated/encoded payloads (base64), multilingual instruction-override (Spanish, French, Chinese, Japanese), token-smuggling with invisible Unicode, and bare delimiter injection. These are caught on LLM routes, where policy is stricter — which is exactly where prompt-injection risk lives.
The 3 residual misses on LLM routes
The remaining false negatives on LLM routes are override-free roleplay framings — e.g. “imagine you were a hacker…” or “pretend to be my late grandmother who reads license keys.” We deliberately do not special-case these: catching them generically would re-introduce false positives on legitimate roleplay UX. We chose 0% false positives over chasing the last few percent.
Heuristics are a first layer, not the whole defense
Heuristic detection is a fast first layer, not a complete defense. It pairs with the platform's async AI verdict layer and should complement — never replace — your own application controls.
How we verified
- 184 hand-labeled samples across prompt injection, SQL injection, and XSS — including benign “look-alikes” (legit roleplay, SQL keywords in normal text, HTML-ish content) to stress false positives.
- Detectors run through the same public APIs used in production — no special benchmark-only code path.
- Two policies reported (LLM vs non-LLM) because Silker applies stricter rules on AI/LLM endpoints than on plain routes.
- The benchmark is part of the SDK test suite with CI quality gates — a regression below threshold fails the build.
- Reproducible — run
npm run benchmarkin the SDK.
Put this detection in front of your app
Integrate the SDK in minutes, then watch verdicts and threats live from the dashboard.