
Structured Audit Logging for Bash Applications

Ryan MacDonald, April 2, 2026

Most bash-based tools treat logging as an afterthought. They append lines to a text file, maybe with a timestamp and a severity word. When an auditor asks “show me every quarantine event from the last 90 days” or a SOC analyst needs to correlate firewall blocks with scan hits in Splunk, that unstructured text becomes a liability. You end up writing fragile regex parsers that break every time the log format drifts.

We faced this across three projects: Linux Malware Detect (maldet) logs scan events, quarantine actions, alert deliveries, and signature updates. Advanced Policy Firewall (APF) logs trust mutations, IP blocks, escalations, and rule loads. Brute Force Detection (BFD) logs authentication attacks, blocking actions, and threshold triggers. All three needed structured, machine-parseable event logging that could feed directly into enterprise SIEM platforms without middleware. And all three needed it implemented in bash, with no compiled dependencies, running on the same constrained systems where the tools themselves deploy.

The result is elog_lib.sh, a shared structured event logging library consumed by all three projects. A single function call can emit an event in up to six output formats: classic text, JSONL, CEF, syslog UDP, GELF, and Elasticsearch ECS. This post walks through the architecture.

Event Bus Architecture

The library exposes two APIs. The first, elog(), handles traditional application logging with severity levels. The second, elog_event(), is the structured event bus. It accepts a typed event with severity, message, and arbitrary key-value context:

bash

# maldet: webshell detected during scheduled scan
elog_event "threat_detected" "warn" \
  "hit {HEX}php.backdoor.b374k.unenc on /home/acct/public_html/wp-content/uploads/2026/03/.cache.php" \
  "file=/home/acct/public_html/wp-content/uploads/2026/03/.cache.php" \
  "sig={HEX}php.backdoor.b374k.unenc" "stage=hex" "scanid=032617-1422.8190"

# maldet: quarantine the hit
elog_event "quarantine_added" "warn" \
  "quarantined /home/acct/public_html/wp-content/uploads/2026/03/.cache.php" \
  "file=/home/acct/public_html/wp-content/uploads/2026/03/.cache.php" \
  "sig={HEX}php.backdoor.b374k.unenc" "owner=acct" "size=48271"

# bfd: SSH brute force blocked
elog_event "block_added" "warn" \
  "blocking 203.0.113.47 (22/tcp: 15 failures in 300s)" \
  "addr=203.0.113.47" "port=22" "proto=tcp" "failures=15" "window=300"

# apf: firewall trust rule added
elog_event "trust_added" "info" \
  "added allow all to/from 10.20.30.5" \
  "action=allow" "host=10.20.30.5" "rule=in:all:all:10.20.30.5"

Every call to elog_event() stages the event into internal globals, then dispatches it to all enabled output modules in sequence. Each module formats the event according to its own spec. SIEM formats (CEF, GELF, ELK) are pre-formatted only when their respective module is enabled, so unused outputs add no formatting overhead.
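A minimal sketch of that stage-then-dispatch pattern, under stated assumptions: the _ev_* globals, the _mod_* modules, and event_sketch are hypothetical names rather than elog_lib.sh's internals, the modules are stubs that only print their name, and treating a module as enabled when its configuration variable is non-empty is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of a stage-then-dispatch event bus. All names here are hypothetical.

# Staged event: plain globals that every module reads.
_ev_type="" _ev_level="" _ev_msg=""
_ev_keys=() _ev_vals=()

# Stub output modules; real ones would format and deliver the staged event.
_mod_jsonl() { printf 'jsonl:%s\n' "$_ev_type"; }
_mod_cef()   { printf 'cef:%s\n' "$_ev_type"; }
_mod_gelf()  { printf 'gelf:%s\n' "$_ev_type"; }

# Assumption: a module is enabled when its config variable is non-empty;
# the JSONL audit module is always enabled.
_enabled_modules() {
  local mods=(_mod_jsonl)
  [[ -n ${ELOG_CEF_FILE:-} ]] && mods+=(_mod_cef)
  [[ -n ${ELOG_GELF_HOST:-} ]] && mods+=(_mod_gelf)
  printf '%s\n' "${mods[@]}"
}

# event_sketch TYPE LEVEL MSG [key=value ...]
event_sketch() {
  _ev_type=$1 _ev_level=$2 _ev_msg=$3
  shift 3
  _ev_keys=() _ev_vals=()
  local kv mod
  for kv in "$@"; do
    _ev_keys+=("${kv%%=*}")   # key: text before the first '='
    _ev_vals+=("${kv#*=}")    # value: everything after it
  done
  # Modules run one after another against the same staged globals.
  while IFS= read -r mod; do
    "$mod"
  done < <(_enabled_modules)
}
```

Splitting each context argument on the first = keeps values such as note=a=b intact, and because every module reads the same staged globals, adding an output means adding one function rather than touching any caller.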

Event Taxonomy

Unstructured logs fail compliance because there is no contract between what the application emits and what the SIEM expects. A message that says “blocked 10.0.0.5” today might say “deny 10.0.0.5” tomorrow, and every downstream parser breaks. The library defines 23 canonical event types across 7 categories, giving SIEM rules a stable surface to match against:

Category	Event Types	Severity
Detection	threat_detected, threshold_exceeded, pattern_matched, scan_started, scan_completed	warn / info
Enforcement	block_added, block_removed, block_escalated, quarantine_added, quarantine_removed	warn / error
Trust	trust_added, trust_removed	info
Network	rule_loaded, rule_removed, service_state	info
Alert	alert_sent, alert_failed	info / error
Monitor	monitor_started, monitor_stopped	info
System	config_loaded, config_error, file_cleaned, error_occurred	info / error
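Because the taxonomy is closed, it can double as a validation gate. This sketch (bash 4 or later for associative arrays; EVENT_CATEGORY and event_category are hypothetical names) registers the 23 types from the table and refuses anything else, so a typo such as blocked_added fails loudly instead of reaching a SIEM under a name no rule matches.

```bash
# Sketch: the event taxonomy as a lookup table (bash 4+). Names are hypothetical.
declare -A EVENT_CATEGORY=()

# _register CATEGORY TYPE... : record each type under its category.
_register() {
  local cat=$1 t
  shift
  for t in "$@"; do EVENT_CATEGORY[$t]=$cat; done
}

_register detection   threat_detected threshold_exceeded pattern_matched scan_started scan_completed
_register enforcement block_added block_removed block_escalated quarantine_added quarantine_removed
_register trust       trust_added trust_removed
_register network     rule_loaded rule_removed service_state
_register alert       alert_sent alert_failed
_register monitor     monitor_started monitor_stopped
_register system      config_loaded config_error file_cleaned error_occurred

# event_category TYPE: print the category, or fail for types outside the taxonomy.
event_category() {
  local t=${1:-}
  if [[ -n $t ]] && [[ -n ${EVENT_CATEGORY[$t]:-} ]]; then
    printf '%s\n' "${EVENT_CATEGORY[$t]}"
    return 0
  fi
  printf 'unknown event type: %s\n' "$t" >&2
  return 1
}
```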

Each event carries mandatory fields (ts, host, app, pid, type, level, msg) plus arbitrary key-value context passed as extra arguments. The type string is the stable contract. SIEM correlation rules match on type=block_escalated, not on a regex against the message text.

Six Output Formats

A single elog_event() call can emit the same event in up to six formats. Each is purpose-built for a different consumer:

Classic Text

Human-readable, grep-friendly. The format operators and sysadmins already know:

text

Mar 26 14:23:45 myhost maldet(12345): [threat_detected] {scan} 3 malware hits recorded

JSONL

One JSON object per line. Every field is typed, every value is escaped. This is the format the audit log always uses, regardless of what other outputs are enabled:

json

{"ts":"2026-03-26T14:23:45+00:00","host":"myhost","app":"maldet","pid":12345,"type":"threat_detected","level":"warn","msg":"3 malware hits recorded","count":"3","stage":"hex"}

CEF (Common Event Format)

The lingua franca for ArcSight, QRadar, and Splunk CIM. Pipe-delimited header with key-value extensions:

text

CEF:0|R-fx Networks|maldet|2.0.1|threat_detected|3 malware hits recorded|5|count=3 stage=hex
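The CEF specification sets two escaping rules: header fields escape backslash and pipe, while extension values escape backslash and equals and encode line breaks as \n or \r. The sketch below applies both. The function names are hypothetical, extension keys are assumed to be plain alphanumerics, and the severity number is passed in by the caller because the post does not describe the level-to-severity mapping beyond the example's 5 for warn.

```bash
# Sketch of CEF line construction. Function names are hypothetical.

# Header fields: escape backslash, then pipe.
cef_header_escape() {
  local s=$1 bs='\'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"|"/"$bs|"}
  printf '%s' "$s"
}

# Extension values: escape backslash and '=', encode CR/LF as \r and \n.
cef_ext_escape() {
  local s=$1 bs='\'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//"="/"$bs="}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\n'/"${bs}n"}
  printf '%s' "$s"
}

# cef_line VENDOR PRODUCT VERSION TYPE NAME SEVERITY [key=value ...]
cef_line() {
  local vendor=$1 product=$2 version=$3 type=$4 name=$5 sev=$6 kv ext=""
  shift 6
  for kv in "$@"; do
    # Keys are assumed to be plain alphanumerics and are not escaped.
    ext+="${ext:+ }${kv%%=*}=$(cef_ext_escape "${kv#*=}")"
  done
  printf 'CEF:0|%s|%s|%s|%s|%s|%s|%s\n' \
    "$(cef_header_escape "$vendor")" "$(cef_header_escape "$product")" \
    "$(cef_header_escape "$version")" "$(cef_header_escape "$type")" \
    "$(cef_header_escape "$name")" "$sev" "$ext"
}
```

Called as cef_line "R-fx Networks" maldet 2.0.1 threat_detected "3 malware hits recorded" 5 count=3 stage=hex, it reproduces the example line above.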

Syslog UDP

RFC 5424 and RFC 3164 (legacy BSD) formats. Fire-and-forget delivery via bash /dev/udp with an nc fallback. The payload can carry classic, JSON, or CEF content inside the syslog envelope.
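Both RFCs open with a PRI value computed as facility * 8 + severity, then differ in the header that follows. A sketch under stated assumptions: facility 1 (user) by default, a hypothetical mapping of error, warn, and info onto the standard syslog severities 3, 4, and 6, and hypothetical function names. nc option spellings also vary between netcat implementations.

```bash
# Sketch of syslog-over-UDP framing and fire-and-forget delivery.
# Function names, the default facility, and the level mapping are assumptions.

# Map library levels to syslog severities (err=3, warning=4, info=6; debug=7 fallback).
syslog_severity() {
  case $1 in
    error) echo 3 ;;
    warn)  echo 4 ;;
    info)  echo 6 ;;
    *)     echo 7 ;;
  esac
}

# syslog_frame FORMAT LEVEL APP MSG [FACILITY]; FORMAT is 5424 or 3164.
syslog_frame() {
  local fmt=$1 level=$2 app=$3 msg=$4 fac=${5:-1} host=${HOSTNAME:-localhost} pri
  pri=$(( fac * 8 + $(syslog_severity "$level") ))   # PRI = facility*8 + severity
  if [[ $fmt == 5424 ]]; then
    # <PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    printf '<%d>1 %s %s %s %d - - %s' "$pri" \
      "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$host" "$app" "$$" "$msg"
  else
    # <PRI>Mmm dd hh:mm:ss HOSTNAME TAG[PID]: MSG (English month names, so LC_ALL=C)
    printf '<%d>%s %s %s[%d]: %s' "$pri" \
      "$(LC_ALL=C date '+%b %e %H:%M:%S')" "$host" "$app" "$$" "$msg"
  fi
}

# syslog_send HOST PORT PAYLOAD: try bash's /dev/udp, fall back to nc, in the background.
syslog_send() {
  local host=$1 port=$2 payload=$3
  {
    # 2>/dev/null comes first so a failed /dev/udp redirection stays quiet.
    printf '%s' "$payload" 2>/dev/null > "/dev/udp/$host/$port" ||
      printf '%s' "$payload" | nc -u -w1 "$host" "$port" >/dev/null 2>&1
  } &
}
```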

GELF

Graylog Extended Log Format 1.1. Supports both UDP and HTTP transport. Custom fields are auto-prefixed with _ per the GELF spec, timestamps are Unix epoch seconds, and messages over 256 characters are split into short_message and full_message.
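A sketch of the payload those rules describe. GELF 1.1 requires version, host, and short_message, and defines level as a syslog severity number. The function names and the level mapping are hypothetical, and cutting short_message at exactly 256 characters while carrying the full text in full_message is an assumption about how the split is done.

```bash
# Sketch of a GELF 1.1 payload builder. Function names are hypothetical.

# Minimal JSON string escaping: backslash, double quote, newline, CR, tab.
gelf_escape() {
  local s=$1 bs='\'
  s=${s//"$bs"/"$bs$bs"}
  s=${s//\"/"$bs\""}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# gelf_payload LEVEL MSG [key=value ...]
gelf_payload() {
  local level=$1 msg=$2 sev kv json
  shift 2
  # "level" is a syslog severity number (assumed mapping).
  case $level in error) sev=3 ;; warn) sev=4 ;; info) sev=6 ;; *) sev=7 ;; esac
  json=$(printf '{"version":"1.1","host":"%s","timestamp":%s,"level":%d' \
    "$(gelf_escape "${HOSTNAME:-localhost}")" "$(date +%s)" "$sev")
  if (( ${#msg} > 256 )); then
    # Assumed split: first 256 characters as short_message, full text as full_message.
    json+=$(printf ',"short_message":"%s","full_message":"%s"' \
      "$(gelf_escape "${msg:0:256}")" "$(gelf_escape "$msg")")
  else
    json+=$(printf ',"short_message":"%s"' "$(gelf_escape "$msg")")
  fi
  # Additional fields carry a "_" prefix (the spec says libraries should not send "_id").
  for kv in "$@"; do
    json+=$(printf ',"_%s":"%s"' "$(gelf_escape "${kv%%=*}")" "$(gelf_escape "${kv#*=}")")
  done
  printf '%s}\n' "$json"
}
```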

ELK / ECS

Elasticsearch-native JSON aligned to the Elastic Common Schema. Event types map automatically to ECS categories and types:

json

{"@timestamp":"2026-03-26T14:23:45+00:00",
 "log.level":"warn",
 "message":"3 malware hits recorded",
 "event.kind":"event",
 "event.category":"intrusion_detection",
 "event.type":"denied",
 "event.action":"threat_detected",
 "host.name":"myhost",
 "process.name":"maldet",
 "process.pid":12345,
 "labels":{"count":"3","stage":"hex"}}

The ECS mapping is built into the library. Detection events map to intrusion_detection, trust mutations to configuration, network rules to network. No Logstash pipeline config needed.
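The stated mapping fits in a case statement. This sketch (ecs_category is a hypothetical name) encodes only the three mappings named above, assumes the whole Network category including service_state maps to network, and returns nonzero for every other type so the caller can omit event.category rather than guess.

```bash
# Sketch of the type-to-ECS-category mapping stated above. Hypothetical name.
# Returns 1 (and prints nothing) for types the text does not map.
ecs_category() {
  case $1 in
    threat_detected|threshold_exceeded|pattern_matched|scan_started|scan_completed)
      echo intrusion_detection ;;   # Detection
    trust_added|trust_removed)
      echo configuration ;;         # Trust mutations
    rule_loaded|rule_removed|service_state)
      echo network ;;               # Network (assumed to include service_state)
    *)
      return 1 ;;                   # unmapped here; caller omits event.category
  esac
}
```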

Zero-Code SIEM Integration

Every output module is enabled and configured through environment variables. There is no config file syntax to learn, no plugin to install, no agent to deploy. Point the variables at your infrastructure and events start flowing:

bash

# Splunk via CEF
export ELOG_CEF_FILE="/var/log/maldet/cef.log"

# Graylog via GELF over UDP
export ELOG_GELF_HOST="graylog.internal"
export ELOG_GELF_PORT="12201"
export ELOG_GELF_TRANSPORT="udp"

# Elasticsearch direct ingest
export ELOG_ELK_URL="https://elk.internal:9200"
export ELOG_ELK_INDEX="security-events"

# Syslog to central collector
export ELOG_SYSLOG_UDP_HOST="syslog.internal"
export ELOG_SYSLOG_UDP_PORT="514"
export ELOG_SYSLOG_UDP_FORMAT="5424"
export ELOG_SYSLOG_UDP_PAYLOAD="json"

Transport is non-blocking. UDP delivery uses bash /dev/udp with an nc fallback, and HTTP delivery (GELF, ELK) uses curl with a wget fallback. Both fire in background subshells, so the calling application never blocks on network I/O. A 3-second connect timeout and a 5-second maximum timeout bound each delivery, so an unresponsive collector cannot leave hung connections behind to stall scans or firewall operations.
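A sketch of the HTTP half of that transport, assuming GNU wget for the fallback flags; http_post_bg is a hypothetical name. The curl options --connect-timeout and --max-time express the 3-second and 5-second limits described above. GNU wget has no single total-time cap, so the sketch approximates one with --timeout (per network operation) followed by --connect-timeout, ordered so the second overrides the connect portion of the first.

```bash
# Sketch of non-blocking HTTP delivery (hypothetical name). Posts a JSON payload
# from a background subshell so the caller returns immediately.
http_post_bg() {
  local url=$1 payload=$2
  (
    if command -v curl >/dev/null 2>&1; then
      # 3s to connect, 5s total. A JSON payload never starts with '@',
      # which --data-binary would otherwise treat as a filename.
      curl -s -o /dev/null --connect-timeout 3 --max-time 5 \
        -H 'Content-Type: application/json' --data-binary "$payload" "$url"
    elif command -v wget >/dev/null 2>&1; then
      # GNU wget: --timeout is per network operation, not a total cap;
      # the later --connect-timeout overrides its connect portion.
      wget -q -O /dev/null --tries=1 --timeout=5 --connect-timeout=3 \
        --header='Content-Type: application/json' --post-data="$payload" "$url"
    fi
  ) >/dev/null 2>&1 &
}
```

The endpoints are assumptions too, since the post does not name them: Elasticsearch's index API accepts a document at $ELOG_ELK_URL/$ELOG_ELK_INDEX/_doc, and Graylog's GELF HTTP input listens on a /gelf path.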

The Audit Trail

Regardless of which output modules are enabled, every elog_event() call writes to the JSONL audit log. This file is not subject to log truncation or rotation by the library. It is the tamper-evident record of every security-relevant action the application took.

For maldet, that means every scan start and completion, every malware hit, every quarantine and restore, every alert delivery (and failure), every signature update. For APF, every trust addition and removal, every IP block and escalation, every firewall rule load. For BFD, every brute force detection and block action. When an auditor needs to reconstruct what happened and when, the audit log provides a single, machine-parseable source of truth.
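Producing a line of that log from bash is mostly a matter of escaping. A minimal sketch, assuming UTC timestamps as in the examples and using hypothetical names (json_escape, jsonl_event, and an illustrative APP_NAME variable): it escapes backslashes, double quotes, newlines, carriage returns, and tabs (RFC 8259 also requires escaping the remaining control characters, which the sketch skips), writes the mandatory fields, then appends each key=value pair as a string field.

```bash
# Sketch of a JSONL event writer. Function names and APP_NAME are hypothetical.

# json_escape STRING: escape \, ", and newline/CR/tab for use inside a JSON string.
json_escape() {
  local s=$1 bs='\'
  s=${s//"$bs"/"$bs$bs"}   # backslash first, so later escapes are not doubled
  s=${s//\"/"$bs\""}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# jsonl_event TYPE LEVEL MSG [key=value ...]: print one JSON object on one line.
jsonl_event() {
  local type=$1 level=$2 msg=$3 kv line
  shift 3
  # Mandatory fields; ts is UTC to match the +00:00 timestamps shown above.
  line=$(printf '{"ts":"%s","host":"%s","app":"%s","pid":%d,"type":"%s","level":"%s","msg":"%s"' \
    "$(date -u '+%Y-%m-%dT%H:%M:%S+00:00')" \
    "$(json_escape "${HOSTNAME:-localhost}")" \
    "$(json_escape "${APP_NAME:-myapp}")" "$$" \
    "$(json_escape "$type")" "$(json_escape "$level")" "$(json_escape "$msg")")
  # Context pairs: split on the first '=', values kept as strings.
  for kv in "$@"; do
    line+=$(printf ',"%s":"%s"' "$(json_escape "${kv%%=*}")" "$(json_escape "${kv#*=}")")
  done
  printf '%s}\n' "$line"
}
```

Appending an event is then a redirect, for example jsonl_event quarantine_added warn "quarantined .cache.php" "owner=acct" >> /var/log/maldet/audit.log. Because embedded newlines are escaped, each event stays on exactly one line.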

bash

# Last 10 quarantined files with their signatures:
grep '"type":"quarantine_added"' /var/log/maldet/audit.log | \
  jq -r '[.ts, .file, .sig] | @tsv' | tail -10

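# All SSH (port 22) blocks in the last 24 hours; needs GNU date, and the string
# comparison assumes every ts uses the same +00:00 offset:
since=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%S+00:00')
grep '"type":"block_added"' /var/log/bfd/audit.log | \
  jq -r --arg since "$since" \
    'select(.ts >= $since and .port == "22") | [.ts, .addr, .port, .failures] | @tsv'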

# APF trust changes by action:
grep '"type":"trust_' /var/log/apf/audit.log | \
  jq -r '[.type, .host, .action] | @tsv' | sort | uniq -c | sort -rn

# Cross-project event counts:
for log in /var/log/{maldet,apf,bfd}/audit.log; do
  echo "=== ${log} ==="
  jq -r '.type' "$log" | sort | uniq -c | sort -rn
done
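The same audit logs can also be merged into a single cross-project timeline, which is what reconstructing "what happened and when" across all three tools requires. A sketch with a hypothetical function name, assuming one JSON object per line and a uniform +00:00 offset so that string order of ts is chronological order:

```bash
# Sketch: merge JSONL audit logs into one timeline sorted by ts (hypothetical name).
# Requires jq. Assumes one JSON object per line and a uniform +00:00 offset,
# so lexical order of ts matches chronological order.
audit_timeline() {
  # -s slurps every line of every file into a single array before sorting.
  jq -rs 'sort_by(.ts)[] | [.ts, .app, .type, .msg] | @tsv' "$@"
}
```

For example: audit_timeline /var/log/{maldet,apf,bfd}/audit.log | tail -50.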

Conclusion

Structured event logging in bash is not a contradiction. The tools that generate the events (grep, awk, bash builtins) are the same tools that format the output. What makes it work is the contract: 23 typed events with stable names, mandatory fields with known semantics, and format modules that translate that contract into whatever the receiving system expects.

The library is shared across Linux Malware Detect, Advanced Policy Firewall, and Brute Force Detection, all open source under GPLv2. The 2.x branches of each project are in active development and expected to release in the coming weeks.
