
Three Bash Projects, Eight Distros, Ninety Seconds

Ryan MacDonald · 10 min read

We ship three Bash projects: Advanced Policy Firewall (APF), Brute Force Detection (BFD), and Linux Malware Detect (maldet). Each deploys across a matrix that still includes CentOS 6 (2011-era coreutils in /bin), modern Rocky and Ubuntu, and a long tail of customer boxes running whatever their hosting provider installed half a decade ago. A typo in an awk arg or a stray mapfile -d ships to every one of them.

For years the answer was “run it in a VM and see what breaks.” That scales to one operating system, poorly to three, and not at all to eight. Cloud CI services charge per minute, and our workload is dominated by cold builds that reinstall the whole distro every run. We wanted the full matrix to finish in the time it takes to make a cup of coffee, running on hardware we already own, with no recurring bill.

The stack we landed on is simple: BATS as the test runner, a shared harness called batsman wired into all three projects as a git submodule, and Docker over TCP with mutual TLS pointing at a dedicated build host on the LAN. Full matrix, eight distros, under 110 seconds warm. This post is the tour.

Why BATS

When the code under test is bash, the tempting answer is a higher-level test runner in Python or Go. We tried it. The integration layer (spawn process, capture stdout, diff strings) grew to outweigh the tests. Worse, it lied about the runtime: a subprocess launched from pytest has a different inherited environment than cron or systemd, and our regressions live exactly in that seam.

BATS (Bash Automated Testing System) is a shell-native test runner. Each test is a bash function. The run helper captures stdout, stderr, and exit code; assertions hang off $status and $output. Output is TAP-compatible, so it pipes straight into any CI dashboard that understands TAP.

```bash
#!/usr/bin/env bats
load '/usr/local/lib/bats/bats-support/load'
load '/usr/local/lib/bats/bats-assert/load'
source /opt/tests/helpers/assert-scan.bash

SAMPLES_DIR="/opt/tests/samples"
TEST_SCAN_DIR="/tmp/lmd-test-scan"

setup() {
    source /opt/tests/helpers/reset-lmd.sh
    mkdir -p "$TEST_SCAN_DIR"
}

teardown() {
    rm -rf "$TEST_SCAN_DIR"
}

@test "MD5 scan detects known test sample (EICAR)" {
    cp "$SAMPLES_DIR/eicar.com" "$TEST_SCAN_DIR/"
    run maldet -a "$TEST_SCAN_DIR"
    assert_scan_completed
    assert_output --partial "malware hits 1"
}
```

That is the whole test. It invokes the real maldet binary the way cron would, in the container's actual environment, and asserts on the output an operator would see. No mocks. No fixtures that drift from production behavior. When a test fails, the debugging technique is the one bash engineers already use: bash -x the failing bit and read the trace.

We extend BATS with two conventions. Every project has a helpers/ directory of reusable assert_* functions tailored to the domain (scan completion, quarantine presence, firewall chain state), and every test sources the bats-support and bats-assert libraries pre-installed in the container image. The tests read like prose, and they measure what production actually does.
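A domain assertion in that style is just a bash function over the $status and $output variables that run populates. A hypothetical sketch of what assert_scan_completed could look like (the real body lives in helpers/assert-scan.bash; the log line format below is illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a domain assertion in the helpers/ style.
# BATS' `run` populates $status and $output; the helper asserts on both.
assert_scan_completed() {
    if [ "$status" -ne 0 ]; then
        echo "maldet exited non-zero: $status" >&2
        return 1
    fi
    case "$output" in
        *"scan completed"*) return 0 ;;
        *) echo "no 'scan completed' marker in output" >&2
           return 1 ;;
    esac
}

# Standalone demo of the two globals a BATS `run` call would set
# (maldet's actual log format may differ):
status=0
output="maldet(4821): scan completed on /tmp/lmd-test-scan: files 1, malware hits 1"
assert_scan_completed && echo "assertion passed"
```

Because the helper is plain bash, it works identically inside a BATS test and under bash -x when you are debugging a failure by hand.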

Why Containers, Not VMs

The matrix is the hard part. APF, BFD, and maldet each need to work on CentOS 6, CentOS 7, Rocky 8, Rocky 9, Ubuntu 20.04, Ubuntu 24.04, Debian 12, and (project-dependent) a legacy ubuntu1204 or a Rocky 10 preview. Three projects × eight images, rebuilt on every library change, is not a VM workload. It is a container workload.

Containers give us three things a VM farm does not:

- Cold start in seconds. A fresh Rocky 9 container is running bash in under a second. A VM takes a minute even with kernel-samepage-merging tricks.
- Layer cache. Each Dockerfile installs a deterministic set of packages (coreutils, bash, iptables, ClamAV, bats-core, bats-assert, bats-support). Docker caches that layer. If only the test file changes, the rebuild is a no-op; if only the source changes, it is a COPY. VMs, in contrast, provision from scratch.
- Declarative images. Each Dockerfile.<os> lives in version control. When a new Rocky minor release ships and breaks something, the change is a diff on a twenty-line file, not a ticket to rebuild a VM template.

CentOS 6 is the interesting one. The distro is long past EOL, but our user base still includes hosts that refuse to leave it. Frozen base images are readily available on Docker Hub and in community mirrors, and they still run under modern Docker with vsyscall=emulate on the host kernel. Our CentOS 6 image installs bash 4.1 and coreutils from the frozen Vault repo. When we introduced a bug that used mapfile -d (bash 4.4+), the CentOS 6 container caught it before a user did. That one container justifies the whole stack.
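The failure mode is mechanical: mapfile -d arrived in bash 4.4, so on CentOS 6's bash 4.1 the option is a hard invalid-option error at runtime. A feature-gated sketch of the portable pattern (illustrative, not the actual library code):

```bash
# mapfile -d needs bash >= 4.4; CentOS 6 ships bash 4.1, where the -d
# branch aborts with an invalid-option error.
if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4) )); then
    # NUL-delimited read, safe for arbitrary filenames
    mapfile -d '' -t files < <(printf '%s\0' one two three)
else
    # Portable fallback that bash 4.1 understands
    files=()
    while IFS= read -r line; do
        files+=("$line")
    done < <(printf '%s\n' one two three)
fi
echo "parsed ${#files[@]} entries"
```

Running the suite on the CentOS 6 container is what catches the unguarded version of this, because no amount of linting on a modern workstation exercises bash 4.1.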

The batsman Submodule

Three projects, three test harnesses, three copies of the same Dockerfile template and runner script, drifting independently. That was the first cut and it was painful. Every fix had to land three times. Every new distro had to be added three times. Docker flags updated in one project's tests would not propagate to the others for weeks.

The fix is batsman, a shared harness repository. It owns the Docker runner, the Makefile.tests include fragment that projects wire into their own Makefile, and a base set of Dockerfiles per OS target. Each project includes it as a git submodule at tests/infra/.

```bash
# In each consumer repo
$ git submodule add https://github.com/rfxn/batsman.git tests/infra
$ ls tests/infra/
  dockerfiles/    include/     lib/     scripts/

# Project-level Makefile — include the shared fragment
$ cat tests/Makefile
BATSMAN_PROJECT   := lmd
BATSMAN_OS_MODERN := debian12 rocky9 ubuntu2404
BATSMAN_OS_LEGACY := centos7 rocky8 ubuntu2004
BATSMAN_OS_DEEP   := centos6
export BATSMAN_TEST_TIMEOUT ?= 180
include infra/include/Makefile.tests
```
Submodule semantics give us exactly what we need: each consumer pins a specific batsman commit. Upgrades are intentional. When we change the runner in batsman, LMD, APF, and BFD each run git submodule update --remote tests/infra when they are ready to pick up the change, verify the full suite still passes, and commit the updated submodule pointer. No silent drift. No version-unaware consumers.

The BATSMAN_PROJECT variable is the project's identity. The runner uses it to pick the right install script, the right default container path (by convention /opt/tests), and the right container name prefix. Everything else (which OSes to run, per-test timeouts, extra Docker flags) is a Make variable the consumer can override. The harness has defaults; projects override only what they need to.
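The shape of the shared fragment is easy to picture. A simplified, hypothetical sketch of the fanout (not the real Makefile.tests; target names and flags are illustrative, and the real fragment adds timeouts, log tees, and skip logic):

```make
# Hypothetical sketch: one pattern-rule target per OS, parallelizable
# with `make -j`.
BATSMAN_OS_ALL := $(BATSMAN_OS_MODERN) $(BATSMAN_OS_LEGACY) $(BATSMAN_OS_DEEP)

test: $(addprefix test-,$(BATSMAN_OS_ALL))

test-%:
	docker build -f dockerfiles/Dockerfile.$* \
		-t $(BATSMAN_PROJECT)-test-$* .
	docker run --rm $(BATSMAN_PROJECT)-test-$* \
		bats --timing /opt/tests
```

With a pattern rule like this, `make -j8 test` is what produces the parallel per-OS container fanout on the daemon side.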

Consuming projects also own the Dockerfiles that actually install their code. batsman provides base images with bash, coreutils, and bats preinstalled; each project's tests/dockerfiles/Dockerfile.<os> extends the base and adds whatever the project needs (maldet adds ClamAV and YARA, APF adds iptables and ipset). The shared layer caches once. The project-specific layer rebuilds only when project files change.
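Under assumptions about image naming (the batsman base tag here is hypothetical), a project-level Dockerfile looks roughly like:

```dockerfile
# Hypothetical Dockerfile.rocky9 for maldet, extending a batsman base
# image that already carries bash, coreutils, and bats.
FROM batsman-base:rocky9

# Project-specific packages — this layer rebuilds only when the
# package list changes
RUN dnf -y install clamav clamav-update && dnf clean all

# Source and tests last, so edits invalidate only the cheap layers
COPY . /usr/local/src/maldetect/
RUN cd /usr/local/src/maldetect && ./install.sh
COPY tests/ /opt/tests/
```

The ordering is the whole trick: the expensive package layers sit above the COPY lines, so a source edit costs a copy, not a reinstall.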

Docker over TCP with Mutual TLS

The test host and the dev host are different machines. We develop on freedom (the workstation). We test on anvil (a dedicated build box on the LAN that has no other job). Running the matrix on freedom means the workstation is unresponsive for two minutes on every push. Running it on anvil means shipping the source to anvil and collecting results somehow.

The common answer is SSH: pipe docker commands through ssh or rsync the source and trigger a remote make. Both work. Both add per-command connection overhead, serialize anything that could stream, and fight with the Docker client's assumption that it is talking to a daemon directly. BuildKit streaming in particular falls on its face over SSH pipes.

The better answer is the one Docker was designed for: expose the daemon over TCP with mutual TLS, and point the client at it.

[Figure: BATS over Docker TCP topology. freedom (dev host) runs make -C tests test; its docker client, with certs in ~/.docker/tls, speaks mutual TLS to dockerd on anvil (build host, tcp://:2376, DOCKER_BUILDKIT=1, shared layer cache of base + project layers). The daemon fans out one docker run per OS — centos6, centos7, rocky8, rocky9, debian12, ubuntu2004, ubuntu2404 — each container bind-mounting tests/ and streaming TAP to stdout, teed to /tmp/test-*.log. One client, one mTLS boundary, one daemon, N parallel containers.]

anvil's Docker daemon listens on tcp://0.0.0.0:2376. Server certs live at /etc/docker/tls/. freedom has the matching CA plus a client cert at ~/.docker/tls/. Both sides authenticate on every connection; the daemon refuses any client whose cert is not signed by the known CA. Running the matrix on anvil is a one-liner:

```bash
# Run the full matrix on anvil, stream results back to /tmp
DOCKER_HOST=tcp://192.168.2.189:2376 \
DOCKER_TLS_VERIFY=1 \
DOCKER_CERT_PATH=~/.docker/tls \
  make -C tests test 2>&1 | tee /tmp/test-lmd-debian12.log | tail -30

# Or use a named context — one-time setup, no env vars after
# (use $HOME, not ~: tilde does not expand inside the --docker argument)
$ docker context create anvil \
    --docker "host=tcp://192.168.2.189:2376,ca=$HOME/.docker/tls/ca.pem,cert=$HOME/.docker/tls/cert.pem,key=$HOME/.docker/tls/key.pem"
$ docker --context anvil ps    # one-off command against anvil
$ docker context use anvil     # or make anvil the default context
$ make -C tests test           # the docker client now resolves anvil; no env vars
```
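On anvil's side, the daemon configuration behind this is small. One way to express it is /etc/docker/daemon.json (a sketch matching the paths from this post; on systemd distros, setting hosts here conflicts with the -H flag in the stock unit file, which then needs a drop-in override):

```json
{
  "hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2376"],
  "tls": true,
  "tlsverify": true,
  "tlscacert": "/etc/docker/tls/ca.pem",
  "tlscert": "/etc/docker/tls/server-cert.pem",
  "tlskey": "/etc/docker/tls/server-key.pem"
}
```

With tlsverify set, the daemon rejects any connection whose client cert does not chain to the configured CA.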

The important properties of this setup are all practical. The client sees a local-feeling Docker daemon; BuildKit streams work natively; parallel docker run calls for different OS containers actually run in parallel on the daemon side; there is no SSH connect handshake on every command; and there is no copy of the source code on anvil at rest. Source gets bind-mounted into each container for the duration of its run and disappears with the container.

The mutual TLS part is not fancy. We generate a CA, sign a server cert for anvil and a client cert for freedom with openssl, valid ten years, lock down the private key directories with chmod 700, and done. This is not a substitute for a real zero-trust stack. It is a pragmatic one-admin, two-host setup that keeps the daemon off the public internet and gates access on possession of a specific client cert.
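The cert generation is a handful of openssl invocations. A sketch of the one-time setup (the CNs and output directory are illustrative; the post's real certs live in /etc/docker/tls on anvil and ~/.docker/tls on freedom):

```bash
set -e
mkdir -p /tmp/docker-tls && cd /tmp/docker-tls

# 1. The CA both sides trust (3650 days = the ten-year validity above)
openssl genrsa -out ca-key.pem 4096
openssl req -new -x509 -days 3650 -key ca-key.pem -out ca.pem \
    -subj "/CN=rfxn-docker-ca"

# 2. Server cert for anvil, with an IP SAN so clients verify by address
openssl genrsa -out server-key.pem 4096
openssl req -new -key server-key.pem -out server.csr -subj "/CN=anvil"
printf 'subjectAltName = IP:192.168.2.189,IP:127.0.0.1\n' > server-ext.cnf
openssl x509 -req -days 3650 -in server.csr -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial -extfile server-ext.cnf -out server-cert.pem

# 3. Client cert for freedom, restricted to client authentication
openssl genrsa -out key.pem 4096
openssl req -new -key key.pem -out client.csr -subj "/CN=freedom"
printf 'extendedKeyUsage = clientAuth\n' > client-ext.cnf
openssl x509 -req -days 3650 -in client.csr -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial -extfile client-ext.cnf -out cert.pem

# Lock down the key material
chmod 700 .
chmod 600 ./*-key.pem key.pem
```

ca.pem plus cert.pem/key.pem go to the client's ~/.docker/tls; ca.pem plus the server pair go to the daemon's /etc/docker/tls.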

The BuildKit Gotcha


Without DOCKER_BUILDKIT=1 in anvil's shell environment, the daemon falls back to the legacy builder. The legacy builder ignores Docker 29's content-addressable layer cache for remote build contexts. Every build rebuilds every layer. A warm matrix run that should take ninety seconds takes fifteen minutes instead.

We hit this three times before we believed it. The symptom is boring: anvil feels slow for no reason. The fix is one line in anvil's ~/.bashrc:

```bash
# On anvil (the build host), exported for all shells
# Docker's legacy builder ignores layer cache for TCP-delivered build
# contexts; BuildKit is required for the cache to work at all.
export DOCKER_BUILDKIT=1
```

We also added a check in the test runner: if a build takes longer than five minutes on what should be a warm cache, it prints a reminder to check echo $DOCKER_BUILDKIT on the daemon host. This class of config drift is invisible from the client side; the only signature is latency, and latency is easy to attribute to network.
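The check itself is trivial; a hypothetical sketch of the idea (the threshold and wording in the real runner may differ):

```bash
# Hypothetical sketch of the slow-warm-build check described above.
WARM_BUILD_THRESHOLD=${WARM_BUILD_THRESHOLD:-300}   # seconds

check_build_time() {
    local elapsed=$1
    if [ "$elapsed" -gt "$WARM_BUILD_THRESHOLD" ]; then
        echo "WARN: warm-cache build took ${elapsed}s (> ${WARM_BUILD_THRESHOLD}s)." >&2
        echo "WARN: check 'echo \$DOCKER_BUILDKIT' on the daemon host." >&2
        return 1
    fi
}

# Usage: wrap the build step and compare wall-clock time
start=$(date +%s)
# ... docker build runs here ...
check_build_time "$(( $(date +%s) - start ))" || true
```

A warning on stderr is enough: the point is to convert an invisible config regression into a visible message the next time anyone runs the suite.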

Timings

The numbers we actually hit, measured across recent runs:

| Configuration | Target | Warm | Cold |
| --- | --- | --- | --- |
| local (freedom) | single OS (Debian 12) | ~70s | +60s image build |
| anvil over TCP | single OS (Debian 12) | ~45s | +30s image build |
| anvil over TCP | full matrix (8 OSes) | ~110s | +4-6 min image builds |
| anvil, no BuildKit | full matrix (8 OSes) | ~15 min | ~15 min |

The anvil-single-OS number is faster than the local-single-OS number because anvil is a dedicated box with no other workload; freedom is a developer workstation with browsers, editors, and this website open. The anvil full-matrix number is the one that matters: eight distros, three projects on rotation, warm-cache run completes before you switch windows.

Local runs are still useful. When iterating on a single .bats file, targeting Debian 12 locally means no network at all; feedback is instantaneous, and if the test passes on the most forgiving OS in the matrix, the next sensible step is to kick off the full anvil run and go read Hacker News while it finishes.

Fallback Discipline

anvil is a single point of failure. When the LAN is flaky or anvil is being rebooted for kernel updates, the fallback is to point the same Docker client at freedom's own daemon. freedom also listens on TCP with the same mutual TLS setup, on 127.0.0.1:2376:

```bash
# Fallback: run on freedom's local Docker daemon
DOCKER_HOST=tcp://127.0.0.1:2376 \
DOCKER_TLS_VERIFY=1 \
DOCKER_CERT_PATH=~/.docker/tls \
  make -C tests test 2>&1 | tee /tmp/test-lmd-debian12.log | tail -30

# Or via a named context
$ docker --context freedom-tcp ps
$ DOCKER_CONTEXT=freedom-tcp make -C tests test
```

The fallback is also the right choice for a different class of test: ones that require bind-mounted data that only exists on freedom. A production snapshot of signature files, a captured pcap for a specific customer issue, a sample corpus that is too large to sync to anvil. Those tests get tagged as “freedom-only” in the project's Makefile and skipped on anvil runs.

There is a discipline to this. Every test we write is implicitly tagged by where its data lives. If the test needs data that only freedom has, it runs on freedom. If the test needs only what is in the git tree, it runs on anvil. The samples/ directory under each project's tests/ tree is where committed sample files live; anything outside of that tree is freedom-only by definition.
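Mechanically, the tag is just a guard on where the data lives; a sketch of the convention (the helper name and paths are illustrative, not the project's actual code):

```bash
# Hypothetical guard behind the "freedom-only" tag: data outside the
# committed samples/ tree runs only where it actually exists.
has_local_data() {
    [ -e "$1" ]
}

# In a BATS test this becomes:  has_local_data "$snap" || skip "freedom-only"
snap=/data/prod-signature-snapshot
if has_local_data "$snap"; then
    echo "running freedom-only test against $snap"
else
    echo "skip: freedom-only data not present"
fi
```

Because BATS' skip reports the reason in the TAP output, an anvil run shows the freedom-only tests as skipped rather than silently absent.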

What We Test, What We Skip

Honest self-assessment: container-based testing covers runtime behavior. It does not cover every install-time path.

What the containers catch reliably:

- Shell portability: bash version skew, coreutils differences across pre-usr-merge and post-usr-merge distros, absent binaries on minimal images.
- Regression coverage: every bug we fix gets a regression @test in the nearest BATS file. The test ships in the harness, runs on every commit, and fails loudly if the bug returns.
- CLI surface: every argument a user can pass, every config variable a user can set, every exit code we document.

What the containers do not catch, and what still requires live-host validation:

- Cron delivery: containers are short-lived and cron is not running inside them by default. We test the scripts cron.daily invokes, not cron itself firing them.
- Init integration: systemd units, SysV init scripts, and the wrappers around them get exercised in the container only if the image includes the init system. Most minimal images do not.
- Install-time symlink creation: the installer's install.sh laying down /usr/local/sbin/ symlinks gets tested in the container, but behavior under a pre-existing package manager install (RPM or DEB upgrade path) is only covered on a live host.

We are fine with that gap. The container suite catches every bug class that lint misses and runs fast enough to be a pre-commit hook. Install-path validation runs on a small set of live VMs on the release path, not on every commit.

Conclusion

The bill for this CI stack is: one LAN-resident build host we already owned, one-time effort to write ten Dockerfiles and a shared Makefile fragment, and an afternoon to generate TLS certs. There is no monthly invoice, no per-minute billing, no queue depth on a shared runner pool, and no outages when a provider has a bad Tuesday.

The payoff is that every commit to APF, BFD, or maldet gets exercised against the same distro matrix our users actually run. A bash quirk that only bites on CentOS 6 fails the test that runs on CentOS 6. A mapfile -d introduced in a shared library fails the moment it hits the legacy container. That kind of feedback loop is the difference between shipping confidently and shipping cautiously.

For the companion piece on why the distro matrix matters in the first place (the coreutils-location split between pre- and post-usr-merge distributions, the command prefix discipline, and the other portability landmines we have stepped on) see our portable bash across the pre-usr-merge boundary article.

APF, BFD, and maldet are all open source under GPLv2. batsman itself is unreleased at time of writing; we plan to open source it once the API stabilizes. If you are shipping bash to a distro matrix, the pattern is easy to reproduce from scratch: pick BATS, template a Dockerfile per OS, enable Docker over TCP with mutual TLS on a spare box, set DOCKER_BUILDKIT=1, and never fight a VM farm again.