Portable Bash for 20 Years of Unix Fragmentation

Ryan MacDonald | May 5, 2026 | 9 min read

The Shell You Thought You Knew

A one-line install script that worked on every server in your test lab can still fail silently on a customer's CentOS 6 host because /usr/bin/cp does not exist there. That is the kind of bug portable shell is about: not syntax, but the quiet filesystem and runtime assumptions that have drifted over two decades of Linux distribution history.

rfxn ships three Bash projects onto an OS matrix that starts at CentOS 6 (2011) and ends at whatever landed last month. Linux Malware Detect (maldet), Advanced Policy Firewall (APF), and Brute Force Detection (BFD) run on servers that have been online for years, deploy into hardened compliance environments where yum install python3 is not an option, and land in containers where coreutils may live anywhere at all. This article is the field-tested pitfall list and the conventions we settled on after hitting them in production.

Portability is a one-time engineering tax with recurring benefit. Every rule we learned on CentOS 6 kept us out of trouble somewhere else.

Rules at a Glance

The short version, for readers who want the conventions without the history:

Area	Rule	Rationale
Coreutils	command cp	Let PATH resolve. Never hardcode /bin or /usr/bin.
Admin binaries	command -v iptables	Discover once at startup, cache the absolute path.
Bash floor	4.1.2 (CentOS 6)	No ${var,,}, no mapfile -d, no declare -n, no $EPOCHSECONDS.
Global state	no declare -A	Sourced-from-function creates a local. Use parallel indexed arrays.
Directory change	cd "$dir" || return 1	Silent failure continues in wrong CWD. set -e is not a substitute.
Init detection	systemd / sysv / upstart / rc.local	Probe /run/systemd/system first, /proc/1/comm second, init.d last.

The usr-merge Cliff

For most of Unix history, /bin held the essential tools needed to bring a system up to single-user mode, and /usr/bin held everything else. Fedora 17 (2012) ended that split by symlinking /bin → /usr/bin. RHEL/CentOS 7 adopted the merge. Debian 12 (bookworm, 2023) finished its transition; see the Debian UsrMerge wiki for the gory details. Arch, openSUSE, and modern Ubuntu are all merged.

But not everything merged. CentOS 6 and Ubuntu 12.04 never did, and those hosts are still in the field under extended support and locked compliance regimes. On them, /bin/cp is a real file and /usr/bin/cp does not exist.

bash

# In an install script
/usr/bin/cp -f "$src" "$dest"
# Works on Rocky 9, Debian 12, Ubuntu 22.04
# Fails on CentOS 6: "No such file or directory"

# The obvious "fix"
/bin/cp -f "$src" "$dest"
# Works on CentOS 6
# Fails on minimal containers, NixOS, some Alpine layouts
# Fails on FreeBSD where cp lives in /bin but with different flags

The rfxn convention, enforced across the codebase, is to let PATH do its job:

bash

# From files/internals/lmd_clamav.sh in maldet
command rm -f "$cpath"/rfxn.{hdb,ndb,yara,hsb} 2>/dev/null  # safe: ClamAV path may not have LMD sigs
command cp -f "$inspath/sigs/rfxn.ndb" "$inspath/sigs/rfxn.hdb" \
    "$inspath/sigs/rfxn.yara" "$cpath/" 2>/dev/null  # safe: ClamAV path may not exist

The command builtin does two things: it bypasses shell functions and aliases (so a buggy cp() function cannot intercept an install step), and it resolves the binary through PATH. On CentOS 6 that resolves to /bin/cp. On Rocky 9, /usr/bin/cp. Nothing downstream cares which.

We extended this rule to every coreutil used in project source: command cp, command mv, command rm, command chmod, command mkdir, command cat, command touch, command ln. Exceptions: printf and echo are Bash builtins. They are never looked up on PATH, and command printf still runs the builtin, so the prefix adds noise for no gain. Those stay bare.
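To make the function bypass concrete, here is a minimal sketch. The cp() wrapper and the temporary file names are hypothetical, invented for the demonstration; nothing in it is taken from the rfxn source.

```bash
#!/usr/bin/env bash
# Hypothetical demo: a shell function shadows cp, and `command` steps around it.

workdir=$(mktemp -d) || exit 1
printf 'payload\n' > "$workdir/src"

# Hypothetical buggy wrapper: reports success but copies nothing
cp() {
    return 0
}

cp "$workdir/src" "$workdir/dest_bare"          # runs the function; no file is written
command cp "$workdir/src" "$workdir/dest_cmd"   # skips the function; PATH finds the real cp

bare="missing"; [ -e "$workdir/dest_bare" ] && bare="copied"
cmd="missing";  [ -e "$workdir/dest_cmd" ]  && cmd="copied"
printf 'bare cp:    %s\ncommand cp: %s\n' "$bare" "$cmd"

command rm -rf "$workdir"
```

The bare call returns success without copying anything, which is exactly the kind of silent failure an install script cannot afford.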

One anti-pattern we explicitly ban is the backslash bypass:

bash

# DO NOT DO THIS
\cp -f "$src" "$dest"      # bypasses aliases but not functions
\rm -rf "$old"             # still hits any rm() function defined higher up
\mv "$a" "$b"              # not portable across shells (ksh, dash)

# This is the rule:
command cp -f "$src" "$dest"
command rm -rf "$old"
command mv "$a" "$b"

The Runtime Matrix

Here is the matrix we test against, with the four facts that dominate shell-level portability: where coreutils live, which init system runs PID 1, what Bash version ships, and what TLS floor the default OpenSSL build honours.

Every red cell has produced a production bug at some point in our project history. The rules below are the ones that survived. The FreeBSD Handbook is worth a read for anyone who has only lived in Linux: coreutils flags diverge in ways that catch you off-guard.

The /sbin Split

APF is a firewall. Its job is invoking iptables, ip6tables, ipset, ip, and other admin binaries. These live in /sbin or /usr/sbin depending on the distro and whether the usr-merge happened. Worse, they are often not on the PATH of a non-login shell. Cron, for example, defaults to a bare PATH=/usr/bin:/bin on some distros.

The APF approach, from files/internals/internals.conf, is to discover once at startup and cache the absolute path:

bash

# APF: files/internals/internals.conf (discovery)
ifconfig=$(command -v ifconfig 2>/dev/null)
ip=$(command -v ip 2>/dev/null)
IPT=$(command -v iptables 2>/dev/null)
IP6T=$(command -v ip6tables 2>/dev/null)
IPTS=$(command -v iptables-save 2>/dev/null)
IPTR=$(command -v iptables-restore 2>/dev/null)
IP6TS=$(command -v ip6tables-save 2>/dev/null)
IP6TR=$(command -v ip6tables-restore 2>/dev/null)
IPSET=$(command -v ipset 2>/dev/null)

APF extends PATH at the top of the main script (PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin), so command -v finds every binary wherever it lives. The cached paths are then used throughout the runtime.

Fallback chains matter too. APF prefers ip (iproute2) but falls back to ifconfig and route (net-tools, deprecated but still present on many hosts). Every probe uses command -v. Never which (separate binary, not installed on minimal systems) and never type (shell-specific output quirks). This is BashFAQ 81 territory, and the POSIX shell spec (command) agrees.
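A fallback chain of this kind can be sketched as a small helper. first_available and NETCFG are hypothetical names used for illustration; this is not APF's actual implementation, only the command -v pattern it describes.

```bash
#!/usr/bin/env bash
# Sketch of a fallback chain built on command -v.
# first_available is a hypothetical helper, not an APF function.

# Extend PATH first so admin directories are searched even under cron
PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

# first_available NAME...: print the resolved path of the first NAME found
first_available() {
    local candidate resolved
    for candidate in "$@"; do
        resolved=$(command -v "$candidate" 2>/dev/null) || continue
        printf '%s\n' "$resolved"
        return 0
    done
    return 1
}

# Prefer iproute2, fall back to net-tools; empty string if neither exists
NETCFG=$(first_available ip ifconfig) || NETCFG=""
if [ -z "$NETCFG" ]; then
    echo "warning: neither ip nor ifconfig found" >&2
fi
```

Because the result is cached in a variable, later calls pay no lookup cost and do not depend on whatever PATH the caller inherited.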

The Bash 4.1 Floor

CentOS 6 ships Bash 4.1.2. It is the single most restrictive target in our matrix, because everything above it has accreted handy features that do not work there. Some fail at parse time; others silently misbehave at runtime. The following are banned in rfxn project source (see the GNU Bash manual for the authoritative version history):

Feature	Requires	Portable Alternative
${var,,} / ${var^^}	Bash 4.2+	echo "$var" | tr '[:upper:]' '[:lower:]'
mapfile -d	Bash 4.4+	while IFS= read -rd ''
declare -n (nameref)	Bash 4.3+	indirect: ${!varname}
$EPOCHSECONDS	Bash 5.0+	date +%s (costs one fork)
$EPOCHREALTIME	Bash 5.0+	date +%s.%N (GNU date only)
declare -A (global scope)	Bash 4.0+ (scope trap, see below)	parallel indexed arrays
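The alternatives in the table can be sketched as follows. Every function and variable name here is hypothetical, and the sample values are made up; the point is only that each construct runs on a Bash 4.1 floor.

```bash
#!/usr/bin/env bash
# Sketches of the portable alternatives from the table, written for Bash 4.1.
# All function and variable names here are hypothetical.

# Instead of ${var,,}: one pipe through tr
to_lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Instead of declare -n: indirect expansion reads a variable by name
get_by_name() {
    local varname="$1"
    printf '%s' "${!varname}"
}

# Instead of mapfile -d '': a read loop over NUL-delimited input.
# Fills the global indexed array _items from stdin.
read_nul_list() {
    local item
    _items=()
    while IFS= read -r -d '' item; do
        _items+=("$item")
    done
}

# Instead of $EPOCHSECONDS: one fork to date
now=$(date +%s)

sig_dir="/tmp/demo/sigs"     # hypothetical value
to_lower "ClamAV"; echo
get_by_name sig_dir; echo
read_nul_list < <(printf 'one two\0three\0')
echo "${#_items[@]} items, first is '${_items[0]}'"
```

The read loop is fed through process substitution rather than a pipe so that _items is populated in the current shell and not in a subshell.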

The declare -A Trap

That last row deserves its own section. Bash supports associative arrays via declare -A starting in 4.0. The problem: when a library is sourced from inside a function (exactly what the BATS test harness does via load), a bare declare -A foo creates a function-scoped local. The name vanishes on return, every later access looks at an empty array, and the code silently behaves wrong.

bash

# BROKEN: declare -A at top level of a library sourced from inside a function
declare -A _sigmap_cache   # becomes a local, vanishes on return

# BROKEN ALTERNATIVE: "fix" with -g
declare -gA _sigmap_cache  # works in Bash 4.2+, not 4.1; breaks CentOS 6

# PORTABLE: parallel indexed arrays
_cache_keys=()
_cache_vals=()
_cache_set() {
    local key="$1" val="$2" i
    for i in "${!_cache_keys[@]}"; do
        if [ "${_cache_keys[i]}" = "$key" ]; then
            _cache_vals[i]="$val"
            return 0
        fi
    done
    _cache_keys+=("$key")
    _cache_vals+=("$val")
}

# PORTABLE + INSIDE A FUNCTION: local -A is fine
_rule_eval() {
    local -A _loaded_sids=()   # properly scoped, does not leak
    # ... use _loaded_sids freely ...
}

Nuance: inside a function, local -A works cleanly on every version of Bash we target (including 4.1), because local scope is exactly what you want inside a function. The ban applies to global associative arrays.
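The trap is easy to reproduce, and the parallel-array cache needs a reader to go with _cache_set. In the sketch below, load_lib, _demo_map, and _cache_get are hypothetical names added for illustration; _cache_set is repeated from the listing above so the block runs on its own.

```bash
#!/usr/bin/env bash
# Part 1: reproduce the trap. A library declares an associative array at its
# top level, but is sourced from inside a function (as a BATS load would do).
lib=$(mktemp) || exit 1
printf '%s\n' 'declare -A _demo_map' '_demo_map[answer]=42' > "$lib"

load_lib() {
    source "$lib"     # declare -A runs in function scope, so _demo_map is local
}
load_lib
after_load="${_demo_map[answer]:-<empty>}"
echo "after load_lib: $after_load"
command rm -f "$lib"

# Part 2: the portable cache, with a hypothetical reader to pair with _cache_set
_cache_keys=()
_cache_vals=()

_cache_set() {
    local key="$1" val="$2" i
    for i in "${!_cache_keys[@]}"; do
        if [ "${_cache_keys[i]}" = "$key" ]; then
            _cache_vals[i]="$val"
            return 0
        fi
    done
    _cache_keys+=("$key")
    _cache_vals+=("$val")
}

# _cache_get KEY: print the stored value and return 0, or return 1 if absent
_cache_get() {
    local key="$1" i
    for i in "${!_cache_keys[@]}"; do
        if [ "${_cache_keys[i]}" = "$key" ]; then
            printf '%s\n' "${_cache_vals[i]}"
            return 0
        fi
    done
    return 1
}

_cache_set "rfxn.hdb" "loaded"
_cache_set "rfxn.hdb" "stale"      # overwrites in place, no duplicate key
_cache_get "rfxn.hdb"
```

The lookup is linear, which is a reasonable trade for small caches; the plain indexed arrays stay global no matter how the library was sourced.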

Lint passes are necessary but never sufficient. Only runtime on the oldest target tells you the truth.

Other Bash Gotchas

A few more patterns that came out of real CentOS 6 and BATS bugs. Greg's Bash Pitfalls is the canonical long-form reference; these are the four we hit most often in production:

local var=$(cmd) masks the exit code. The local builtin always returns 0, so set -e never trips. Declare first, then assign.

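args="$@" flattens the argument list into a single string and loses the argument boundaries. Use args=("$@") for array semantics or args="$*" for an explicit IFS-joined string.

cd "$dir" without a guard is a latent bug. If the directory is missing, execution continues in the wrong CWD. Always cd "$dir" || return 1. set -e is not a substitute because it is ignored wherever the exit status is being tested: a failing cd in an if or while condition, on the left of && or ||, or in all but the last stage of a pipeline does not trip it.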

Background subshells inside $() callers can hang. If a caller runs out=$(func) and func launches a background subshell, inherited pipe fds keep the caller waiting forever. Use ( exec >/dev/null 2>&1; cmd ) & so the background subshell replaces its own fds.
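The first three gotchas are small enough to demonstrate directly. The function names below are hypothetical and exist only for this sketch.

```bash
#!/usr/bin/env bash
# Sketches of the first three gotchas; all function names are hypothetical.

# 1. local masks the exit status of the command substitution
masked() {
    local out=$(false)
    echo "masked status: $?"       # status of `local`, not of `false`
}
preserved() {
    local out
    out=$(false)
    echo "preserved status: $?"    # status of `false`
}

# 2. An array keeps argument boundaries; a plain string assignment loses them
count_args() {
    local args=("$@")
    echo "${#args[@]}"
}

# 3. Guarded cd: fail fast instead of carrying on in the wrong directory
list_dir() {
    cd "$1" || return 1
    command ls
}

masked
preserved
count_args "two words" single
( list_dir /nonexistent-rfxn-demo 2>/dev/null ) || echo "cd failed, nothing ran"
```

The masked variant reports status 0 and the declare-then-assign variant reports 1, even though both run the same failing command.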

Systemd, SysV, Upstart

Installing a daemon sounds simple. In practice, four init systems are alive in our matrix: systemd (default everywhere modern), sysvinit (CentOS 6, Slackware, minimal containers), Upstart (Ubuntu 14.04), and rc.local-style systems where we are on our own.

APF and BFD share a detection library (pkg_lib.sh) that sets _PKG_INIT_SYSTEM to one of systemd | sysv | upstart | rc.local | unknown via a cascade of probes:

bash

# From APF/BFD pkg_lib.sh: pkg_detect_init()
pkg_detect_init() {
    _PKG_INIT_SYSTEM="unknown"

    # Primary: systemd runtime directory
    if [[ -d /run/systemd/system ]]; then
        _PKG_INIT_SYSTEM="systemd"
        return 0
    fi

    # Secondary: PID 1 process name
    # Guarded because /proc/1/comm may not exist on CentOS 6
    if [[ -f /proc/1/comm ]]; then
        local pid1_comm
        pid1_comm=$(cat /proc/1/comm 2>/dev/null) || pid1_comm=""
        case "$pid1_comm" in
            systemd)  _PKG_INIT_SYSTEM="systemd" ;;
            init)     _PKG_INIT_SYSTEM="sysv" ;;
            upstart)  _PKG_INIT_SYSTEM="upstart" ;;
        esac
        # Stop on a match so the init.d probe below cannot overwrite it
        [[ "$_PKG_INIT_SYSTEM" != "unknown" ]] && return 0
    fi

    # Tertiary: init.d directories exist but no systemd
    if [[ -d /etc/init.d ]] || [[ -d /etc/rc.d/init.d ]]; then
        _PKG_INIT_SYSTEM="sysv"
        return 0
    fi

    # Last resort: rc.local
    if [[ -f /etc/rc.local ]] || [[ -f /etc/rc.d/rc.local ]]; then
        _PKG_INIT_SYSTEM="rc.local"
    fi
    return 0
}

Probe order matters. /run/systemd/system is cheap and authoritative where it exists. The init.d fallback runs last because distros like Rocky 9 still ship /etc/init.d for compatibility even when systemd runs PID 1.

BFD ships both a systemd unit and a SysV script; the installer picks one based on the detected init:

bash

# BFD install.sh (simplified)
if [ "$_PKG_INIT_SYSTEM" = "systemd" ]; then
    _unit_dir=$(_pkg_systemd_unit_dir)   # /lib/systemd/system or /usr/lib/systemd/system
    command cp bfd.service "$_unit_dir/"
    command cp bfd.timer "$_unit_dir/"
    systemctl daemon-reload
    systemctl enable bfd.timer
else
    # sysv / upstart / rc.local all get the init script
    for _idir in /etc/rc.d/init.d /etc/init.d; do
        [ -d "$_idir" ] && command cp bfd-watch.init "$_idir/bfd-watch" && break
    done
    if command -v chkconfig >/dev/null 2>&1; then
        chkconfig bfd-watch on 2>/dev/null || true  # chkconfig may not support this service
    fi
fi

Two details that cost us debug cycles: the systemd unit directory itself is not stable (Debian puts it in /lib/systemd/system, RHEL in /usr/lib/systemd/system, and on merged systems one path is a symlink to the other), and chkconfig is not installed by default on some minimal images even when the system is SysV-based. Both get probed, both get fallbacks.
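A probe for those two details might look like the sketch below. This is not the real _pkg_systemd_unit_dir from pkg_lib.sh: find_unit_dir, have_cmd, the candidate order, and the optional ROOT argument are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a unit-directory probe. This is NOT the real
# _pkg_systemd_unit_dir from pkg_lib.sh; the candidate order is an assumption.

# find_unit_dir [ROOT]: print the first unit directory that exists under ROOT.
# ROOT defaults to empty (the live filesystem); a non-empty ROOT helps testing.
find_unit_dir() {
    local root="${1:-}" dir
    for dir in /usr/lib/systemd/system /lib/systemd/system; do
        if [ -d "$root$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# have_cmd NAME: true when NAME resolves, so optional tools can be skipped
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if unit_dir=$(find_unit_dir); then
    echo "systemd units go in $unit_dir"
else
    echo "no systemd unit directory found; falling back to an init script"
fi
have_cmd chkconfig || echo "chkconfig not installed; skipping runlevel registration"
```

Taking the filesystem root as a parameter lets the probe be exercised against a scratch directory instead of the host it happens to run on.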

TLS on Legacy

maldet pulls signature updates from rfxn infrastructure over HTTPS. So do the APF reputation feeds. On a modern system that is unremarkable: curl against a Let's Encrypt-issued cert with TLS 1.3 and SNI works out of the box. On CentOS 6, almost every part of that sentence is wrong.

CentOS 6 shipped OpenSSL 1.0.1 with backports that eventually included TLS 1.2, but on some historical minor versions the backport was incomplete or missing for certain ciphers. Its bundled curl predates broad SNI support, so virtual hosts on modern edge infrastructure sometimes fail the handshake. The system CA bundle has not been updated in over a decade on unpatched hosts.

The rfxn mitigations are layered:

Signature integrity is independent of transport. Bundles are fingerprinted with SHA-256 against a known-good list. HTTP fallback is acceptable only because the content is hash-pinned; an attacker in the middle cannot swap the bundle without breaking the fingerprint.

curl and wget both probed. One install may have one, another the reverse; minimal containers sometimes have neither. Detection is command -v curl first, then command -v wget, with explicit error if neither is present.

Explicit timeouts, always. A hung TLS handshake against a misconfigured edge can stall a signature window. Every curl gets --connect-timeout and --max-time; every wget gets --timeout.
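The three layers can be sketched together. The function names and timeout values below are assumptions, not maldet's actual code; the curl and wget options used are long-standing ones, and sha256sum is assumed to be the GNU coreutils tool.

```bash
#!/usr/bin/env bash
# Sketch of the layered fetch: probe for a client, bound every request with
# timeouts, then check the payload against a pinned SHA-256. Function names
# and timeout values are assumptions.

pick_downloader() {
    if command -v curl >/dev/null 2>&1; then
        echo curl
    elif command -v wget >/dev/null 2>&1; then
        echo wget
    else
        echo "error: neither curl nor wget is installed" >&2
        return 1
    fi
}

# fetch URL OUTFILE: download with whichever client exists, always time-bounded
fetch() {
    local url="$1" out="$2" tool
    tool=$(pick_downloader) || return 1
    case "$tool" in
        curl) curl -fsS --connect-timeout 10 --max-time 120 -o "$out" "$url" ;;
        wget) wget -q --timeout=30 -O "$out" "$url" ;;
    esac
}

# verify_sha256 FILE EXPECTED_HEX: succeed only if the digest matches
verify_sha256() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" 2>/dev/null) || return 1
    actual=${actual%% *}           # keep the digest, drop the file name
    [ "$actual" = "$expected" ]
}
```

A caller would chain the two steps and discard the download on a mismatch, so a bundle that arrived over a weak transport is still rejected unless its fingerprint is on the known-good list.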

The goal is not to support TLS 1.0 as a security posture. It is to keep signature updates flowing on hosts that have not rebooted in five years, so maldet can still detect new webshells on them. Security depends more on whether the bundle arrives than on what cipher carried it. CentOS Linux itself reached end of life in 2024; the tail is genuinely shrinking, just not as fast as anyone hoped.

Testing the Matrix

None of the rules above matter if they are not enforced. The only way to know a change survives CentOS 6 is to run it on CentOS 6. All three rfxn projects share a BATS-based harness (batsman) that runs the full suite inside Docker containers for each target distro.

Every code-changing commit runs against Debian 12 and Rocky 9 at minimum. Major changes run the full matrix: centos6, centos7, rocky8, rocky9, ubuntu20, ubuntu22, ubuntu24, debian12. The companion article on our Docker-over-TCP BATS harness documents how the matrix is driven from one Makefile target and how a dedicated test host parallelises the runs. For the shared-library engineering that underpins a lot of this code, see our writeup on structured audit logging in bash, which covers the same discipline applied to event emission.

One detail the tests themselves honour: test files (.bats) run inside Docker containers with no aliases and a pre-merge layout on some images. Inside tests, we use bare cp, rm, mv and let the container's PATH resolve them. Inside project source, we use command cp and friends. Three contexts, three rules. All encoded in governance, all enforced by grep.

The Verification Gauntlet

Every commit that touches shell files runs a pre-commit gauntlet of syntax checks and pattern greps, because the alternative is catching them on a customer's CentOS 6 host. The patterns, one-line rationale each:

bash

# bash -n: parse-time syntax check (free; catches half the accidental typos)
bash -n <all-shell-files>

# shellcheck: SC-series lints, covers the other half
shellcheck <all-shell-files>

# which: separate binary, not installed on minimal systems; use command -v
grep -rn '\bwhich\b' files/

# egrep: deprecated by GNU since 2007; use grep -E
grep -rn '\begrep\b' files/

# backticks: no nesting, hostile escaping; use $()
grep -rn '`' files/

# suppression without justification: every hit needs an inline comment on the SAME line
grep -rn '|| true' files/
grep -rn '2>/dev/null' files/

# bare coreutils (portability violation): use command prefix
grep -rn '^\s*cp \|^\s*mv \|^\s*rm ' files/
grep -rn '^\s*chmod \|^\s*mkdir \|^\s*touch \|^\s*ln ' files/
grep -Prn '^\s*cat\s(?!<<)' files/

# word-boundary sweep: catches mid-line, inside $(), after ; or | (anchored greps miss these)
grep -rn '\bcat\b' files/ | grep -v 'command cat' | grep -v 'cat <<'
grep -rn '\bchmod\b\|\bmkdir\b\|\btouch\b\|\bln\b' files/ | grep -v 'command '

# hardcoded coreutils path: breaks on the non-merged side
grep -rn '/usr/bin/\(rm\|mv\|cp\|chmod\|mkdir\|cat\|touch\|ln\)' files/

# backslash alias bypass: prohibited; use command prefix
grep -rn '\\cp \|\\mv \|\\rm ' files/

# local var=$(...) always returns 0 and masks the subshell exit code
grep -rn 'local [a-z_]*=\$([^(]' files/

# every cd must carry || exit / || return guard
grep -rn '^\s*cd ' files/

Every hit requires either a fix or an inline justification on the same line. Section-level comments on preceding lines do not satisfy the rule, because git blame and drive-by edits both lose that context. ShellCheck catches most of the syntactic hazards before the greps even run; the greps cover the rules shellcheck does not encode.
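The same-line justification rule can be turned into a pass/fail gate rather than a list to read by eye. The sketch below is a heuristic under stated assumptions: unjustified_suppressions and run_gate are hypothetical names, and any # after the suppression counts as a justification.

```bash
#!/usr/bin/env bash
# Sketch of turning one gauntlet rule into a pass/fail gate. The function
# names are hypothetical; only the two suppression patterns come from the
# gauntlet list.

# unjustified_suppressions DIR: print every line under DIR that uses
# `|| true` or `2>/dev/null` with no `#` comment after it on the same line.
# Returns 0 when offenders exist (grep semantics), 1 when the tree is clean.
unjustified_suppressions() {
    grep -rnE '\|\| true|2>/dev/null' "$1" |
        grep -vE '(\|\| true|2>/dev/null)[^#]*#'
}

# run_gate DIR: return 1 and report on stderr when any offender is found
run_gate() {
    local hits
    if hits=$(unjustified_suppressions "$1"); then
        printf 'unjustified suppressions:\n%s\n' "$hits" >&2
        return 1
    fi
    return 0
}
```

Called as run_gate files/ from a pre-commit hook, a non-zero return blocks the commit until each reported line is fixed or annotated.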

Project-specific extensions layer on top. APF greps for hardcoded iptables paths. BFD greps for bare systemctl calls that bypass init detection. maldet greps for hardcoded signature URLs. Each project's CLAUDE.md extends the base list above with its own patterns.

Conclusion

Legacy does not matter forever. CentOS 6 is out of ELS; every year the long tail gets shorter. The point is that portability is a one-time engineering tax with recurring benefit. Every rule we learned on CentOS 6 has kept us out of trouble somewhere else: on Alpine containers where coreutils are BusyBox, on NixOS where nothing lives at either /bin or /usr/bin the expected way, on FreeBSD where flags diverge, on Gentoo where users build their own layouts. A codebase that follows the command and command -v discipline ports to any of those without a patch.

It also passes review faster. Writing command cp instead of cp is eight keystrokes. Debugging a silent install failure on a customer's five-year-old host is a week and a lost support ticket.

maldet, APF, and BFD are all open source under GPLv2. The shared pkg_lib.sh, the governance rules that drive the grep patterns, and the BATS matrix all live in the project repositories. If any of it is useful to your own portable-shell codebase, take it.

References

Bash Reference

Distro & Portability

Related rfxn Research