Action Tiers & Safety Gates
The agent classifies. The Runner enforces. Trust is layered, never single-source.
Every action blacklight takes is classified into one of five tiers. The tier determines gate behavior: auto-execute, suggested-with-diff-confirm, destructive-with-explicit-yes, or denied-by-default. Tier is authored by the agent and written into bl-case/pending/<step-id>.json as action_tier: auto|suggested|destructive. The Runner enforces the gate based on this field plus the verb class. Trust from the agent is never single-source.
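A minimal sketch of what a Runner-side check on the pending-step record could look like. The three allowed action_tier values come from the text above, and the diff/patch requirement for destructive steps comes from the failure-modes list at the end of this section. The function name, the error strings, and the dict-based step shape are assumptions for illustration, not the actual schema.

```python
# Hypothetical pre-validation of a pending-step record (illustrative only).
ALLOWED_TIERS = {"auto", "suggested", "destructive"}


def validate_step(step: dict) -> list:
    """Return a list of validation errors; an empty list means the step passes."""
    errors = []
    tier = step.get("action_tier")
    if tier not in ALLOWED_TIERS:
        errors.append(f"action_tier must be one of {sorted(ALLOWED_TIERS)}, got {tier!r}")
    # Destructive steps must carry the change they propose.
    if tier == "destructive" and not (step.get("diff") or step.get("patch")):
        errors.append("destructive step requires a diff or patch field")
    return errors
```

Returning a list rather than raising on the first problem lets the Runner report every defect in a rejected step at once.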
The five tiers
| Tier | Examples | Gate behavior |
|---|---|---|
| Read-only | observe *, consult *, case show/log/list | Auto-execute; no confirm; no audit write beyond standard case ledger |
| Reversible, low-risk | defend firewall <ip> (new block), defend sig (after corpus-FP-pass) | Auto-execute + Slack/stdout notification + 15-minute operator veto window (via bl defend firewall --remove <ip>); ledger entry created |
| Reversible, high-impact | defend modsec (new rule) | Suggest → operator reviews diff → explicit bl run --yes to apply; apachectl configtest pre-flight mandatory |
| Destructive | clean htaccess, clean cron, clean proc, clean file, defend modsec --remove | Diff shown (for file edits) or capture-then-kill (for proc); explicit --yes per-operation required; no batch auto-confirm; backup written before apply |
| Unknown | Any bash command the agent proposes that does not map to a known verb | Deny by default; operator must invoke bl run <step-id> --unsafe --yes explicitly; discouraged |
Read-only: auto-runs
bl observe, bl consult, and bl case show/log/list are auto-tier. They do not write to host state. They emit to the case ledger and write evidence records to bl-case/<case>/evidence/. There is no operator confirmation prompt. The Runner does not pause.
This matters because the curator iterates on observation steps frequently: a typical case has 8–15 observation steps before the first defense or clean step. If each one needed a confirmation, the operator's hand would be on y for the whole case.
Reversible, low-risk: apply-and-notify
A new firewall block on a fresh IP is reversible (bl defend firewall --remove <ip>), low-impact (only that one IP is affected), and high-value (every minute the block isn't in place is a minute the attacker can re-pivot). The gate behavior here:
- CDN-safe-list pre-check (internal allowlist + ASN lookup against a public WHOIS cache).
- If clean → apply, write ledger entry to bl-case/<case>/actions/applied/<act-id>.json with a retire_after hint, emit a notification.
- Operator has a 15-minute veto window during which bl defend firewall --remove <ip> will roll back the block and revert the ledger.
- After 15 minutes, the block is committed.
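The veto window in the list above can be sketched as a pure time comparison. The 15-minute figure is from the text; the AppliedBlock type, the state names, and the choice to treat the exact 15-minute mark as already committed are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

VETO_WINDOW = timedelta(minutes=15)  # figure taken from the text


@dataclass
class AppliedBlock:
    """Hypothetical in-memory view of an applied firewall block."""
    ip: str
    applied_at: datetime  # timezone-aware, UTC


def veto_open(block: AppliedBlock, now: datetime) -> bool:
    """True while a --remove still counts as a veto of this block."""
    return now - block.applied_at < VETO_WINDOW


def block_state(block: AppliedBlock, now: datetime) -> str:
    # State names are illustrative, not ledger vocabulary.
    return "pending-veto" if veto_open(block, now) else "committed"
```

Passing `now` in explicitly keeps the check deterministic and easy to test.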
defend sig follows the same pattern after the FP-corpus gate passes (zero false positives against /var/lib/bl/fp-corpus/). YARA signatures are auto-tier iff FP gate passes.
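A sketch of the FP-corpus gate under simplifying assumptions: the signature is modeled as a plain callable over file bytes (a real implementation would evaluate a compiled YARA rule), and fp_gate is a hypothetical name. The gate passes only when no corpus file matches.

```python
from pathlib import Path
from typing import Callable, List


def fp_gate(matches: Callable[[bytes], bool], corpus_dir: Path) -> List[Path]:
    """Return the corpus files the signature matches.

    An empty list means zero false positives, so the gate passes.
    """
    hits = []
    for path in sorted(corpus_dir.rglob("*")):
        if path.is_file() and matches(path.read_bytes()):
            hits.append(path)
    return hits
```

Returning the matching files instead of a boolean gives the rejection event something concrete to record.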
Reversible, high-impact: diff-and-yes
A new ModSec rule modifies the request-handling pipeline of every site on the host. Even when reversible, it is high-impact enough that the operator must see the diff and explicitly confirm.
bl-defend 2026-04-24T04:27:15Z, CASE-2026-0017 step s-09
Target: /etc/apache2/mods-enabled/bl-CASE-2026-0017-941999.conf
Diff (proposed):
+SecRule REQUEST_FILENAME "@rx \.php/[^/]+\.(jpg|png|gif)$" \
+ "id:941999,phase:2,deny,log,msg:'polyshell double-ext staging'"
apachectl -t ... OK (sandbox)
Apply? [y/N/diff-full/explain/abort]
Pre-flight: apachectl -t runs in the curator's sandbox before the rule is offered. The diff shown is the literal file write; diff-full shows the whole before/after; explain requests the curator's reasoning field from the pending-step JSON; abort cancels and marks the step operator-rejected (the curator sees this and may revise).
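The prompt options could be parsed along these lines. The option strings come from the prompt shown above; the Choice enum and parse_choice are hypothetical names. Treating an empty response as N follows the usual convention that the capitalized option is the default, which is an assumption here. The text does not say how N differs from abort beyond abort marking the step operator-rejected, so the sketch only distinguishes the two values.

```python
from enum import Enum


class Choice(Enum):
    APPLY = "y"
    DECLINE = "n"
    DIFF_FULL = "diff-full"
    EXPLAIN = "explain"
    ABORT = "abort"


def parse_choice(raw: str) -> Choice:
    """Map an operator response to a Choice; empty input falls back to decline."""
    text = raw.strip().lower()
    if not text:
        return Choice.DECLINE  # assumed default, per the capital N in the prompt
    try:
        return Choice(text)
    except ValueError:
        # Unrecognized input is never treated as consent; the caller can re-prompt.
        raise ValueError(f"unrecognized response: {raw!r}") from None
```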
Destructive: diff, backup, explicit per-op yes
Every bl clean operation is destructive. Five mechanical disciplines apply.
Diff shown before apply
For file edits (clean htaccess, clean cron):
bl-clean 2026-04-24T04:27:15Z, CASE-2026-0017 step s-10
Target: /home/sitefoo/.../.htaccess
Diff (proposed):
- <FilesMatch "\.php$">
- Require all denied
- </FilesMatch>
+ # (line removed, injected block, per agent analysis)
Backup will be written to: /var/lib/bl/backups/2026-04-24T04-27-15Z.htaccess
Apply? [y/N/diff-full/explain/abort]
Backup before apply
Every bl clean operation writes a pre-apply backup to /var/lib/bl/backups/<ISO-ts>.<hash>.<basename>. The manifest tracks backups; bl case log lists them; bl clean --undo <backup-id> restores.
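A sketch of the backup naming scheme. The `<ISO-ts>.<hash>.<basename>` layout is from the text. Everything else is an assumption: the hash is taken to be a truncated SHA-256 of the file content, colons in the timestamp are replaced with hyphens as in the example path above, and a leading dot on the basename is dropped to match the shape of that example.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_name(target: Path, content: bytes, now: datetime) -> str:
    """Build <ISO-ts>.<hash>.<basename> for a pre-apply backup."""
    ts = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    digest = hashlib.sha256(content).hexdigest()[:12]  # assumed hash and length
    base = target.name.lstrip(".")  # assumption: ".htaccess" is stored as "htaccess"
    return f"{ts}.{digest}.{base}"


def write_backup(target: Path, backup_dir: Path, now: datetime) -> Path:
    """Copy target into backup_dir before any edit is applied."""
    content = target.read_bytes()
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / backup_name(target, content, now)
    shutil.copy2(target, dest)  # copy2 also preserves mtime and mode bits
    return dest
```

Including a content hash in the name means two backups of the same file taken in the same second cannot collide unless their content is identical.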
--dry-run contract
Every bl clean subcommand supports --dry-run. Dry-run shows the full diff and backup path but takes no action and writes nothing. A successful dry-run is required before a non-dry-run is attempted; the Runner enforces this.
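One way the Runner could enforce the dry-run-first rule, sketched with in-memory state. The class and exception names are hypothetical, and where the Runner actually records dry-run success is not specified in this section.

```python
class DryRunRequired(Exception):
    """Raised when a real run is attempted without a prior successful dry-run."""


class DryRunLedger:
    """Tracks which steps have passed a dry-run (in-memory, illustrative)."""

    def __init__(self) -> None:
        self._passed = set()

    def record_dry_run(self, step_id: str, ok: bool) -> None:
        if ok:
            self._passed.add(step_id)
        else:
            # A failed dry-run invalidates any earlier success for the step.
            self._passed.discard(step_id)

    def require(self, step_id: str) -> None:
        if step_id not in self._passed:
            raise DryRunRequired(f"{step_id}: run with --dry-run first")
```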
Quarantine, not delete
bl clean file never unlinks. Files move to /var/lib/bl/quarantine/<case-id>/<sha256>-<basename> with a manifest entry. bl case show --quarantine lists them; bl clean --unquarantine <entry> restores. Operator-rescue is one command away.
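The move-to-quarantine step might look like the following. The destination layout `<case-id>/<sha256>-<basename>` is from the text; the function name is hypothetical, and the manifest entry is omitted because its format is not given here.

```python
import hashlib
import shutil
from pathlib import Path


def quarantine(path: Path, quarantine_root: Path, case_id: str) -> Path:
    """Move a file into the case's quarantine directory instead of deleting it."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    dest_dir = quarantine_root / case_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{digest}-{path.name}"
    # shutil.move renames when possible and falls back to copy-then-remove
    # across filesystems; the content is preserved either way.
    shutil.move(str(path), str(dest))
    return dest
```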
Capture before kill
bl clean proc <pid> captures /proc/<pid>/{cmdline,environ,exe,cwd,status,maps} and lsof -p <pid> to the case evidence before sending the signal. --capture=off disables capture (the operator must pass it explicitly). Default is capture-on because the forensic value of a running process's /proc snapshot is often higher than whatever latency the capture adds.
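A sketch of capture-then-kill on Linux. The list of /proc entries is from the text; the function names, the evidence layout, the per-entry status strings, and SIGTERM as the default signal are assumptions. The lsof capture is left out for brevity. exe and cwd are symlinks, so the sketch records their targets rather than reading them as files.

```python
import os
import signal
from pathlib import Path

PROC_FILES = ("cmdline", "environ", "status", "maps")  # read as bytes
PROC_LINKS = ("exe", "cwd")  # symlinks; record the link target


def capture_proc(pid: int, evidence_dir: Path, proc_root: Path = Path("/proc")) -> dict:
    """Snapshot selected /proc/<pid> entries; return a per-entry status map."""
    src = proc_root / str(pid)
    evidence_dir.mkdir(parents=True, exist_ok=True)
    captured = {}
    for name in PROC_FILES:
        try:
            data = (src / name).read_bytes()
        except OSError as exc:
            captured[name] = f"error: {type(exc).__name__}"
            continue
        (evidence_dir / name).write_bytes(data)
        captured[name] = "ok"
    for name in PROC_LINKS:
        try:
            target = os.readlink(src / name)
        except OSError as exc:
            captured[name] = f"error: {type(exc).__name__}"
            continue
        (evidence_dir / name).write_text(target + "\n")
        captured[name] = "ok"
    return captured


def capture_then_kill(pid: int, evidence_dir: Path,
                      sig: int = signal.SIGTERM, capture: bool = True) -> None:
    """Capture first (unless explicitly disabled), then send the signal."""
    if capture:
        capture_proc(pid, evidence_dir)
    os.kill(pid, sig)
```

Recording per-entry errors instead of aborting means a process that exits mid-capture, or an unreadable entry, still leaves whatever evidence could be collected.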
Unknown: deny by default
If the agent proposes a step whose verb does not match any of the seven known namespaces, the Runner rejects the step in pre-validation. The operator can override with bl run <step-id> --unsafe --yes, but this is discouraged and surfaces a warning at the end of every shell invocation until the case closes.
This is the safety property the whole design rests on: the agent cannot emit arbitrary bash. It emits step records that map to named verbs with typed arguments. Even a fully compromised curator session cannot make the Runner run rm -rf /. The verb does not exist in the dispatch table.
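The dispatch-table property can be illustrated with a small sketch. Only the namespaces named in this section (observe, consult, case, defend, clean) appear; the full set of seven is not enumerated here, and the handler bodies, the UnknownVerb exception, and the verb/args field names are placeholders.

```python
from typing import Callable, Dict, List


def _placeholder(args: List[str]) -> str:
    """Stand-in for a real verb handler."""
    return "ok"


# Subset of the verb table, limited to namespaces named in this section.
DISPATCH: Dict[str, Callable[[List[str]], str]] = {
    "observe": _placeholder,
    "consult": _placeholder,
    "case": _placeholder,
    "defend": _placeholder,
    "clean": _placeholder,
}


class UnknownVerb(Exception):
    pass


def dispatch(step: dict) -> str:
    """Run a step only if its verb has an entry in the table."""
    verb = step.get("verb")
    handler = DISPATCH.get(verb)
    if handler is None:
        # No shell fallback exists: an unmapped verb has nothing to run.
        raise UnknownVerb(f"no handler for verb {verb!r}")
    return handler(step.get("args", []))
```

Because the step is never passed to a shell, a record such as `{"verb": "rm", "args": ["-rf", "/"]}` fails lookup instead of executing.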
Why tier authorship belongs to the agent
The agent has the hypothesis, the evidence, the curator's reasoning state. It knows whether a particular ModSec rule is a probe (low confidence; should be suggested) or a confirmed-pattern block (high confidence; can be auto once FP-gated). The Runner does not have that context.
What the Runner has: a verb table, a tier enforcement matrix, and a backup discipline. The Runner's job is not to second-guess the curator's classification but to bound it, refusing anything that would surprise the operator. A clean cron step always requires diff-confirm regardless of tier. A defend firewall always goes through the CDN safe-list pre-check. The contract is: the agent classifies; the Runner bounds and enforces.
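The bounding rule can be sketched as taking the stricter of the agent's tier and a per-verb floor. The two floors shown (clean is always destructive; defend modsec --remove is destructive) come from this section; the ordering, the function names, and the args representation are assumptions, and a real enforcement matrix would carry more rows.

```python
ORDER = {"auto": 0, "suggested": 1, "destructive": 2}  # least to most strict


def verb_floor(verb: str, args: list) -> str:
    """Minimum tier the Runner enforces for a verb, whatever the agent asserted."""
    if verb == "clean":
        return "destructive"
    if verb == "defend" and args[:1] == ["modsec"] and "--remove" in args:
        return "destructive"
    return "auto"  # no floor beyond the agent's own classification


def effective_tier(agent_tier: str, verb: str, args: list) -> str:
    """The agent classifies; the Runner can only raise the tier, never lower it."""
    if agent_tier not in ORDER:
        raise ValueError(f"unknown tier: {agent_tier!r}")
    return max(agent_tier, verb_floor(verb, args), key=ORDER.__getitem__)
```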
Failure modes the gate catches
- Agent hallucinates a verb that doesn't exist. Pre-validation rejects unknown verbs.
- Agent under-classifies a destructive step as auto. Verb class re-enforces; clean * always destructive regardless of agent-asserted tier.
- Agent omits required fields. Schema validation rejects (destructive steps fail without diff or patch).
- Apache configtest fails on a synthesized ModSec rule. Sandbox-side pre-flight catches before the Runner ever sees the rule.
- YARA sig matches benign files in the FP corpus. FP gate trips, signature is rejected, ledger event defend_sig_rejected reason=fp_gate_trip is written.
- Operator races a symlink between rename(2) and chown. The Runner applies chown/chmod/touch to the staged inode before the final mv -T rename: no chown-time TOCTOU window.
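The stage-then-rename pattern in the last item can be sketched as follows. This is a generic illustration, not the Runner's code: metadata is set through the open file descriptor of the staged file, so no path is re-resolved between write and rename. Ownership changes are omitted because they need privileges; os.fchown would go at the same point as os.fchmod. The function name is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def atomic_install(dest: Path, content: bytes, mode: int = 0o644) -> None:
    """Write content to a staged file beside dest, then rename it into place."""
    # Stage in the same directory so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dest.parent, prefix=f".{dest.name}.")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(content)
            fh.flush()
            os.fchmod(fh.fileno(), mode)  # applied to the staged inode via its fd
            os.fsync(fh.fileno())
        os.replace(tmp, dest)  # rename(2): readers see the old or the new file
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Like mv -T, os.replace treats dest as the final name and fails rather than moving the staged file into dest when dest is an existing directory.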