Evidence Format

JSONL on the wire, bundle shape, summary.md convention. How raw logs reach the curator without raw logs reaching the curator.

JSONL on the wire. Bundles are tar+gzip with a summary the curator reads first. Raw logs never enter context directly.

Every bl observe output is JSONL, one record per line. Fields vary by source but share a common preamble. Bundles compose the per-source JSONL streams, a summary.md first-read, and a MANIFEST.json for verification.

Evidence flow: collectors emit JSONL to bl-case/evidence; bl observe bundle packs them into a tar+gzip with summary.md and MANIFEST.json; the curator hot-attaches the bundle, reads summary.md first, and tool-uses jq, grep, or duckdb against the JSONL members.

JSONL on the wire

{"ts":"2026-04-24T04:17:08Z","host":"example-host","source":"apache.transfer","record":{"client_ip":"203.0.113.42","method":"POST","path":"/pub/media/catalog/product/.cache/a.php","status":200,"path_class":"php_in_cache","is_post_to_php":true}}

Common preamble fields:

  • ts: ISO-8601 UTC timestamp of the event the record describes
  • host: the host this record was collected on
  • source: the collector that emitted it (apache.transfer, modsec.audit, fs.mtime-since, etc.)
  • record: source-specific structured data
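As a minimal sketch of how a consumer might check the common preamble, the snippet below parses one JSONL line and rejects records that lack any of the four fields listed above. The function name parse_record is hypothetical and not part of bl.

```python
import json

# The four preamble fields every record is described as sharing.
PREAMBLE_FIELDS = ("ts", "host", "source", "record")


def parse_record(line):
    """Parse one JSONL line and confirm the common preamble is present."""
    obj = json.loads(line)
    missing = [f for f in PREAMBLE_FIELDS if f not in obj]
    if missing:
        raise ValueError(f"missing preamble fields: {missing}")
    if not isinstance(obj["record"], dict):
        raise ValueError("record must be a JSON object")
    return obj


line = (
    '{"ts":"2026-04-24T04:17:08Z","host":"example-host",'
    '"source":"apache.transfer","record":{"status":200}}'
)
rec = parse_record(line)
print(rec["source"], rec["record"]["status"])
```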

Apache transfer

Base fields: client_ip, method, path, status, bytes, ua, referer, site.

Plus derived fields the Runner computes pre-emit:

  • path_class: one of php_in_cache, polyglot, static, admin, vendor, unknown
  • is_post_to_php: bool
  • status_bucket: 2xx, 3xx, 4xx, 5xx for stream-level histograms
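Two of the derived fields are simple enough to illustrate. The sketch below is one plausible way to compute status_bucket and is_post_to_php; the exact rules the Runner applies are not stated here, so the ".php suffix" test and the handling of codes outside 2xx to 5xx are assumptions. path_class is omitted because its classification rules are not given in this document.

```python
def status_bucket(status):
    """Map an HTTP status to its histogram bucket, e.g. 404 -> '4xx'.

    Only 2xx, 3xx, 4xx and 5xx are listed as buckets; returning None for
    anything else is an assumption of this sketch.
    """
    if 200 <= status <= 599:
        return f"{status // 100}xx"
    return None


def is_post_to_php(method, path):
    """True when a POST targets a path ending in .php.

    Assumption: the query string is ignored and the match is
    case-insensitive.
    """
    bare_path = path.split("?", 1)[0]
    return method.upper() == "POST" and bare_path.lower().endswith(".php")


print(status_bucket(200))
print(is_post_to_php("POST", "/pub/media/catalog/product/.cache/a.php"))
```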

ModSec audit

A/B/F/H/Z section walker output. Fields: txn_id, client, uri, rule_id, action, phase, timestamp, matched_var, matched_value, severity, msg.

Filesystem

Two collectors share the fs source:

  • fs.mtime-cluster: record: { path, mtime, ext, cluster_id, cluster_size }
  • fs.mtime-since: record: { path, mtime, ext }

Cron

record: { user, system, line_n, raw, decoded, has_ansi_escape }.

The decoded field shows the line as cat -v would render it, exposing ANSI escape sequences such as ESC[2J that attackers use to hide cron entries from crontab -l output.
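To make the relationship between raw, decoded, and has_ansi_escape concrete, here is a small sketch that imitates cat -v caret notation for control characters and flags CSI escape sequences. It is an approximation for illustration, not the collector's code: the M- notation cat -v uses for high-bit bytes is left out, and both function names are hypothetical.

```python
import re

# CSI sequences: ESC [ parameters, optional intermediates, one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def cat_v(text):
    """Render control characters in caret notation, as cat -v does.

    Tab and newline pass through unchanged; high-bit handling is omitted.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\t\n":
            out.append(ch)
        elif code < 32:
            out.append("^" + chr(code + 64))  # ESC (0x1b) becomes ^[
        elif code == 127:
            out.append("^?")
        else:
            out.append(ch)
    return "".join(out)


def has_ansi_escape(text):
    """True when the raw line contains at least one CSI escape sequence."""
    return ANSI_CSI.search(text) is not None


raw = "* * * * * /tmp/.x\x1b[2J"
print(cat_v(raw))
print(has_ansi_escape(raw))
```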

Process

record: { pid, user, ps_argv, exe_basename, argv_spoof }.

argv_spoof: true when argv[0] from ps -u differs from /proc/<pid>/exe basename, the gsocket persistence-class signal.
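The comparison behind argv_spoof can be sketched as a pure function. This version assumes the basename of argv[0] is compared with the basename of the /proc/<pid>/exe target; whether the collector normalizes argv[0] this way is not stated, and the example values are made up for illustration.

```python
import posixpath


def argv_spoof(ps_argv0, exe_path):
    """True when the name a process reports differs from its executable.

    ps_argv0: argv[0] as reported by ps.
    exe_path: the resolved target of /proc/<pid>/exe.
    Assumption: both sides are reduced to their basename before comparing.
    """
    return posixpath.basename(ps_argv0) != posixpath.basename(exe_path)


# A process that presents itself as a kernel worker but runs another binary.
print(argv_spoof("[kworker/0:1]", "/usr/bin/gs-netcat"))
# An ordinary daemon whose argv[0] matches its executable.
print(argv_spoof("/usr/sbin/sshd", "/usr/sbin/sshd"))
```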

Bundle shape

Evidence bundles (for bl consult --upload) are tar + gzip -5 (or zstd -3 if available). The archive name is bundle-<host>-<window>.tgz. Members:

  • MANIFEST.json — host, window, sources, sha256s, bl version
  • summary.md — 1–2 KB first-read; top IOCs, counts, hot paths
  • transfer.log.jsonl — pre-parsed Apache / nginx access records
  • modsec_audit.jsonl — pre-parsed ModSec audit events
  • fs_anomalies.jsonl — mtime clusters, perm drift, suid changes
  • system_messages.jsonl — journalctl extracts

MANIFEST.json carries every per-file sha256 plus the bl version that produced the bundle. Verification on the curator side: the Runner attaches the manifest to the upload event; the curator's first action is to read summary.md and confirm the manifest's record count matches the JSONL.
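The per-file sha256 check can be illustrated with the standard library. The manifest layout used here, a "sha256" object mapping member name to hex digest, is an assumption made for the sketch; the real MANIFEST.json schema is not shown in this document. make_bundle and verify_bundle are hypothetical helper names.

```python
import hashlib
import io
import json
import tarfile


def make_bundle(members):
    """Pack a {name: bytes} mapping into an in-memory tar+gzip archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=5) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def verify_bundle(bundle_bytes):
    """Return the member names whose sha256 differs from the manifest."""
    mismatches = []
    with tarfile.open(fileobj=io.BytesIO(bundle_bytes), mode="r:*") as tar:
        manifest = json.load(tar.extractfile("MANIFEST.json"))
        # Assumed layout: {"sha256": {"<member>": "<hex digest>", ...}}
        for name, expected in manifest["sha256"].items():
            data = tar.extractfile(name).read()
            if hashlib.sha256(data).hexdigest() != expected:
                mismatches.append(name)
    return mismatches


jsonl = b'{"ts":"2026-04-24T04:17:08Z","host":"example-host","source":"apache.transfer","record":{}}\n'
manifest = {"sha256": {"transfer.log.jsonl": hashlib.sha256(jsonl).hexdigest()}}
bundle = make_bundle({
    "MANIFEST.json": json.dumps(manifest).encode(),
    "transfer.log.jsonl": jsonl,
})
print(verify_bundle(bundle))
```

Opening with mode "r:*" lets tarfile pick the gzip decoder itself, which mirrors the point that the reader does not need to be told which codec was used.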

summary.md: the first-read convention

The first file the agent reads. ≤ 2 KB. Structured:

# Evidence bundle: <host> | <from> → <to>

## Trigger
<one-paragraph description of the artifact that prompted collection>

## Top-line findings
- <bullet list of ≤ 7 facts>

## Jump points
- <jq/grep expressions the agent can use to drill into the JSONL files>

## Attention-worthy
- <anomalies the pre-parse flagged>

The "Jump points" section is the key invention. Rather than dumping the whole bundle into context, the Runner pre-computes the queries that matter: "200s to PHP files in /pub/media/catalog/product/.cache/", "ModSec rule 920450 hits clustered around obs-0001 ts ± 90s". The curator picks one or two and tool-uses grep, jq, or duckdb to drill in. The bundle is hot storage, not context.
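For readers who want to see what a jump point resolves to, the sketch below expresses the first example ("200s to PHP files in /pub/media/catalog/product/.cache/") as a filter over JSONL lines. In practice the curator would run a jq or grep expression; this Python version only illustrates the selection logic, and the function name is hypothetical.

```python
import json

CACHE_PREFIX = "/pub/media/catalog/product/.cache/"


def php_200s_in_cache(lines, prefix=CACHE_PREFIX):
    """Yield apache.transfer records with status 200 for .php paths under prefix."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("source") != "apache.transfer":
            continue
        inner = rec.get("record", {})
        path = inner.get("path", "")
        if inner.get("status") == 200 and path.startswith(prefix) and path.endswith(".php"):
            yield rec


sample = [
    '{"ts":"2026-04-24T04:17:08Z","host":"example-host","source":"apache.transfer",'
    '"record":{"path":"/pub/media/catalog/product/.cache/a.php","status":200}}',
    '{"ts":"2026-04-24T04:17:09Z","host":"example-host","source":"apache.transfer",'
    '"record":{"path":"/index.php","status":200}}',
    '{"ts":"2026-04-24T04:17:10Z","host":"example-host","source":"apache.transfer",'
    '"record":{"path":"/pub/media/catalog/product/.cache/a.php","status":404}}',
]
hits = list(php_200s_in_cache(sample))
print(len(hits))
```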

Why JSONL, not a binary format

Three reasons:

  1. Human-readable in the case ledger. bl case log is cat-able. Investigators reviewing a closed case see structured records, not opaque blobs.
  2. grep and jq-native. The curator's tool-use relies on pre-existing primitives: no custom parser, no schema-versioning headaches across bl releases.
  3. Streaming-friendly. Large collections (50k Apache lines) write incrementally. The Runner does not load the file before emitting it.
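The streaming property in point 3 can be shown in a few lines: records are written one per line as they arrive, so the writer never holds the whole collection in memory. This is a generic illustration of incremental JSONL emission, not the Runner's implementation; emit_jsonl and fake_collector are hypothetical names.

```python
import io
import json


def emit_jsonl(records, out):
    """Write each record as one compact JSON line; return the count written."""
    count = 0
    for rec in records:
        out.write(json.dumps(rec, separators=(",", ":")) + "\n")
        count += 1
    return count


def fake_collector(n):
    """Stand-in for a collector: yields records lazily, one at a time."""
    for i in range(n):
        yield {
            "ts": "2026-04-24T04:17:08Z",
            "host": "example-host",
            "source": "apache.transfer",
            "record": {"line_n": i},
        }


sink = io.StringIO()
written = emit_jsonl(fake_collector(3), sink)
print(written)
```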

Compression

  • Default: gzip -5, portable to the CentOS 6 / bash 4.1 baseline without EPEL.
  • Upgrade path: zstd -3 if command -v zstd succeeds; ~1.3× smaller, faster to compress.
  • Detection: bl collect picks the best available codec; the extension is .tgz regardless, because tar detects the codec from the archive's magic bytes on the decompress side.
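The two halves of this scheme, choosing a codec when packing and recognizing it when unpacking, can be sketched as follows. The magic numbers are the standard gzip and Zstandard frame headers; pick_codec and detect_codec are hypothetical names, and the actual selection in bl is a shell-level command -v check rather than this Python equivalent.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian


def pick_codec(which=shutil.which):
    """Prefer zstd when the binary is on PATH, otherwise use gzip."""
    return "zstd" if which("zstd") else "gzip"


def detect_codec(header):
    """Identify the codec from the first bytes of an archive."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(ZSTD_MAGIC):
        return "zstd"
    return "unknown"


print(pick_codec())
print(detect_codec(gzip.compress(b"example")[:4]))
```

Passing which as a parameter keeps the selection testable without depending on what is installed on the host.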

Sonnet 4.6 bundle summary

Where the heavy lifting actually happens. Sonnet 4.6 only renders the summary.md first-read for a single bundle. Every load-bearing reasoning step in an investigation (cross-stream correlation, hypothesis revision, defensive-payload authorship, sample intent reconstruction, brief writing) runs in the Opus 4.7 curator session at 1M context. The curator absorbs the full case state (mounted skills + reference files, the bl-case memstore, every per-case Files bundle, every prior step result) without a retriever or chunker. Sonnet here is a fast, cheap condenser that hands the curator a 2 KB index plus jump-point queries; the actual drill-down happens via the curator's tool-use over grep / jq / duckdb against the JSONL files. See Architecture · model assignments and PRD §5.1 for the full routing.

summary.md generation runs through Sonnet 4.6 by default: bl_messages_call posts to the Messages API with prompts/bundle-summary-system.md as the system prompt. Sonnet treats log content as untrusted, keeps its output within the ≤ 2 KB budget, and formats the jump-points and attention-worthy sections. No anthropic-beta header is sent; this is a plain /v1/messages call outside the Managed Agents surface.

Two bypasses keep the Runner deterministic:

  • --no-llm-summary: skip Sonnet, fall back to deterministic _bl_obs_render_summary_deterministic.
  • BL_DISABLE_LLM=1 env var: same effect, scoped to the shell. Tests use this. Cost-controlled environments use this.

If Sonnet returns 401 / 5xx / 429, the Runner falls back automatically. Bundle creation never blocks on a Messages API outage.
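The bypass and fallback rules can be summarized as a small decision function. This is a sketch of the control flow only: call_llm and render_deterministic are hypothetical callables standing in for the Messages API call and the deterministic renderer, and the treatment of transport errors and of statuses the text does not mention is an assumption.

```python
def should_fall_back(status):
    """True for the statuses named above: 401, 429, or any 5xx."""
    return status in (401, 429) or 500 <= status <= 599


def build_summary(call_llm, render_deterministic, disable_llm=False):
    """Return summary text, preferring the LLM unless bypassed or failing.

    call_llm() is assumed to return (http_status, body_text).
    disable_llm models --no-llm-summary or BL_DISABLE_LLM=1.
    """
    if disable_llm:
        return render_deterministic()
    try:
        status, body = call_llm()
    except OSError:
        # Assumption: a transport failure is treated like an outage.
        return render_deterministic()
    if status == 200:
        return body
    if should_fall_back(status):
        return render_deterministic()
    # Statuses outside the documented set are not covered by the text.
    raise RuntimeError(f"unhandled status {status}")


print(build_summary(lambda: (429, ""), lambda: "deterministic summary"))
```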

Stress test bundle

exhibits/fleet-01/ carries a deterministic, byte-identical, ~360k-token APSB25-94 forensic bundle (apache + modsec + fs + cron + proc + journal + maldet) with attack needles buried in realistic noise. The bundle can be regenerated with tools/dev/synth-corpus.sh --seed 42. Sources are documented; no operator-local data ever lands in the bundle.

This bundle exercises the full 1M-context curator turn: a realistic case that would not fit in a 200k-token window. It is the test that keeps the "1M context as one bundle" claim honest.

Memory-store size discipline

Memory-store entries have a hard 100 KB cap per file (Managed Agents spec). Under Path C / M13, blacklight uses one memory store per workspace (bl-case) plus the Skills + Files primitives for skill content; see Skills Architecture.

| Store | Access | Typical contents | Cap discipline |
| --- | --- | --- | --- |
| bl-case | read_write | hypothesis, evidence pointers, pending steps, applied actions, ledger; path-namespaced per case | per-file ≤ 100 KB; raw evidence offloaded to Files API |
| bl-skills | read_only | RETIRED in M13. Skill content moved to the Skills primitive (description-routed) + reference Files (mounted at /skills/<basename>). Older docs that name a bl-skills memstore predate Path C. | n/a |

Raw evidence bundles (.tgz packed) live in the Files API, not in memory stores. Memory stores carry pointers (evidence/evid-0001.md: { source, sha256, summary, file_id }). The curator's read_memory call returns the pointer; the curator then passes the file_id to read_files to drill in. This keeps memory-store budgets small and re-readable across sessions.
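A pointer entry and the size cap can be sketched together. The four keys come from the pointer shape above; serializing the pointer as JSON, reading 100 KB as 102,400 bytes, and the make_pointer name are all assumptions of this example, and the file_id and digest values are placeholders.

```python
import json

MAX_ENTRY_BYTES = 100 * 1024  # assumption: "100 KB" read as 100 * 1024 bytes


def make_pointer(source, sha256, summary, file_id):
    """Build a memory-store pointer entry and enforce the per-file cap."""
    body = json.dumps({
        "source": source,
        "sha256": sha256,
        "summary": summary,
        "file_id": file_id,
    })
    size = len(body.encode("utf-8"))
    if size > MAX_ENTRY_BYTES:
        raise ValueError(f"pointer is {size} bytes, cap is {MAX_ENTRY_BYTES}")
    return body


pointer = make_pointer(
    source="apache.transfer",
    sha256="0" * 64,                # placeholder digest
    summary="200s to PHP files under .cache/",
    file_id="file_placeholder",     # placeholder identifier
)
print(len(pointer.encode("utf-8")) <= MAX_ENTRY_BYTES)
```

Because only the pointer lives in the memory store, the entry stays far below the cap no matter how large the referenced bundle is.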