Closing the AF_ALG Window: Userspace Mitigation for CVE-2026-31431 ("Copy Fail")

Ryan MacDonald · 13 min read

On 2026-04-29 the copy.fail disclosure landed: CVE-2026-31431, an algif_aead AEAD scratch-write bug in the Linux kernel that gives an unprivileged tenant a 4-byte page-cache write to any readable file at attacker-chosen offset. By the end of the same day there were two working public PoCs: the Theori writeup overwriting /usr/bin/su with a 160-byte planted ELF, and a rootsecdev variant flipping the running user's UID in /etc/passwd from 1234 to 0000. On a multi-tenant hosting node, the threat model collapses: any shell user can become root, and the on-disk inode is byte-for-byte unchanged through the entire chain.

We operate hosting nodes at scale, and we needed a defense that worked on day zero, on running kernels, without a reboot, on tiers where the vendor patch had not landed yet. What follows is the work we did to absorb it: a mechanical breakdown of the primitive, an honest accounting of the five defense rungs and where each one fails, the LD_PRELOAD shim and the read-only posture auditor we packaged into signed RPMs, and the architectural conclusion we are standardizing on independent of this single CVE. The published PoC is intentionally not linked here; the upstream advisory and Theori writeup cover the chain and we are not reproducing it.

Source · GPL v2 · Issues open

github.com/rfxn/copyfail

Single-file C shim, the auditor source, the RPM spec driving the signed builds, and the issue tracker. Patches and posture data welcome.

main · v1.0.1
TL;DR · Critical · Active · CVE-2026-31431 · Linux kernel
The bug
Linux kernel algif_aead AEAD scratch-write bug. Unprivileged tenant binds authencesn(hmac(sha256),cbc(aes)), drives a 4-byte write into the page cache of any readable file at attacker-chosen offset. On-disk inode is unchanged. Outcome is local privesc to root with no on-disk forensic artifact.
Patched
Upstream commit a664bf3d603d, a revert of the 2017 in-place AEAD optimization. RHEL/Alma/Rocky 7-10 ship CONFIG_CRYPTO_USER_API_AEAD=y, so algif_aead is built in, not a loadable module. Kpatch lag is days to weeks; EL7 has no kpatch path at all.
Forward
Standardize on no-AF_ALG hosts. LD_PRELOAD shim system-wide via /etc/ld.so.preload, systemd RestrictAddressFamilies=~AF_ALG drop-ins on the load-bearing units, auditd rule on socket(38). Hosting workloads do not legitimately use AF_ALG.

The Primitive

The primitive is two design decisions composing in a way nobody drew on a whiteboard. Read them in order:

AF_ALG: a userspace door into the kernel crypto API

AF_ALG (socket domain 38) is the userspace interface into the Linux kernel's crypto API. It exists so unprivileged code can ask the kernel to do AEAD, hashing, and skcipher work without shipping a custom OpenSSL build. The interface is unprivileged by design: no capabilities required, no special device file to open, just socket(AF_ALG, SOCK_SEQPACKET, 0) and a bind() naming the algorithm. In 2017 the AEAD path picked up an in-place optimization: rather than allocate a fresh output buffer, the decrypt could write its scratch directly back into the source pages. Years later, this is the bug.
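
To make the socket-then-bind shape concrete, here is a minimal and deliberately benign sketch of the interface in normal use: hashing a buffer through the kernel's `hash` type. It has nothing to do with the AEAD path or the exploit chain. The function name `kernel_sha256` is hypothetical, and the sketch assumes a Linux Python build that exposes `socket.AF_ALG`; it returns `None` wherever the interface is missing or blocked.

```python
import hashlib
import socket


def kernel_sha256(data):
    """Hash `data` via AF_ALG; return None if the interface is unavailable or blocked."""
    if not hasattr(socket, "AF_ALG"):
        return None  # non-Linux platform, or Python built without AF_ALG support
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
            alg.bind(("hash", "sha256"))  # bind() names the algorithm type and name
            op, _ = alg.accept()          # accept() yields the per-operation socket
            with op:
                op.sendall(data)
                return op.recv(32)        # a SHA-256 digest is 32 bytes
    except OSError:
        # EPERM from a shim, EAFNOSUPPORT from a seccomp filter, unknown algorithm, ...
        return None


digest = kernel_sha256(b"hello")
if digest is not None:
    # Compare against the userspace implementation as a sanity check.
    print(digest == hashlib.sha256(b"hello").digest())
```

No capability check appears anywhere in that flow, which is the point the paragraph above makes: any unprivileged process can reach this code path.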

The asymmetry: tag check fails, scratch write fires anyway

On the authencesn(hmac(sha256),cbc(aes)) path, the kernel performs the in-place AEAD decrypt before completing the authentication tag check. When the tag check fails (which it will, because the attacker is supplying garbage ciphertext with no key), the syscall returns EBADMSG and userspace gets an error. By that point the 4-byte scratch write has already happened. With splice(2) driving the source side, the destination is the page cache of a file the attacker has read access to, at an offset of the attacker's choosing, with bytes the attacker controls.

The on-disk inode is byte-for-byte unchanged. The corruption lives only in the page cache. The kernel serves the corrupted page on every subsequent read, including execve() loading a setuid binary and getpwnam() parsing /etc/passwd. AIDE/Tripwire reading those files via standard POSIX I/O sees the corruption while it is cached, but the attacker can call posix_fadvise(POSIX_FADV_DONTNEED) to evict the page after the privesc, or memory pressure does it eventually. Forensic-traceless.

Why hosting nodes are the worst-case threat model

The primitive is “4-byte page-cache write to any readable file.” The interesting target list on a multi-tenant host is long: setuid binaries (Theori's /usr/bin/su ELF plant), privilege-relevant config (/etc/passwd UID flip, /etc/sudoers, /etc/security/access.conf, PAM configs), systemd unit files, file capabilities, the cPanel/InterWorx custom suid helpers in /usr/local/cpanel/ and /scripts/. Every one of those is consulted by privileged code at some point in the boot or login chain. Multi-tenant shared hosting is the worst-case threat model for this CVE because tenants by definition have shell, and the primitive needs nothing more than an unprivileged process.

Reachability on RHEL/Alma/Rocky 7-10

The first question after a kernel-level bug lands is whether the interface is even reachable. For AF_ALG on RHEL-family kernels, the answer is reliably yes. The relevant config:

text
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_AEAD=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y

Because CRYPTO_USER_API_AEAD is =y and not =m, algif_aead and af_alg are compiled into vmlinuz. They are not loadable modules on the running kernel. rmmod does not apply. A modprobe blacklist is a no-op against the running image. It only affects future kernel rebuilds or a kernel where these are modular. On RHEL-family hosts in production, the only mitigations that bite the running kernel are userspace cuts and the kernel patch + reboot.
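
The builtin-versus-module distinction can be checked mechanically. The sketch below classifies an option from kernel config text; `classify_option` is a hypothetical helper, and reading `/boot/config-<release>` assumes the distro installs the config there, as RHEL-family kernels do.

```python
import platform
import re


def classify_option(config_text, option):
    """Return 'builtin', 'module', 'absent', or the raw value for a CONFIG_ option."""
    match = re.search(r"^%s=(\S+)$" % re.escape(option), config_text, re.MULTILINE)
    if match is None:
        return "absent"  # also covers '# CONFIG_X is not set'
    value = match.group(1)
    return {"y": "builtin", "m": "module"}.get(value, value)


SAMPLE = """\
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_AEAD=y
CONFIG_CRYPTO_USER_API_HASH=m
# CONFIG_CRYPTO_USER_API_RNG is not set
"""

if __name__ == "__main__":
    try:
        with open("/boot/config-%s" % platform.release()) as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the illustrative sample above
    # 'builtin' means rmmod and the modprobe blacklist cannot touch the running kernel.
    print(classify_option(text, "CONFIG_CRYPTO_USER_API_AEAD"))
```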

The patch is a revert of the 2017 in-place AEAD optimization. Upstream commit a664bf3d603d touches crypto/algif_aead.c, crypto/af_alg.c, crypto/algif_skcipher.c, and include/crypto/if_alg.h. The header layout change makes this a non-trivial kpatch build for Red Hat and TuxCare KernelCare. Watch errata.redhat.com for kpatch-patch-5_14_0-* (RHEL 9) and kpatch-patch-6_12_0-* (RHEL 10). Expect a 1 to 3 week window from disclosure. For EL7 there is no kpatch path at all.

Defense-in-Depth Ladder

There are five rungs of defense for AF_ALG-class bugs: (1) the kernel patch, (2) the modprobe blacklist, (3) systemd RestrictAddressFamilies, (4) the LD_PRELOAD shim, and (5) per-service seccomp policy. Every one has failure modes. The point of layering them is that the conditions that defeat the other rungs are not the same conditions that defeat the shim, which is what makes the shim a viable primary defense, not just a backup.

The pattern across rungs 1, 2, 3, and 5 is the same: each one fails under routine operator reality. Vendors have not shipped yet. The kernel was built with builtin crypto. The threat surface includes a cron job, a sshd login shell, a podman container payload running its own pid 1. Per-service seccomp policy is operationally heavy and never finishes rolling out before the next CVE. The shim's failure modes (static binaries, inline-asm syscall instructions, SUID secure-exec stripping LD_PRELOAD) are attacker engineering territory. That asymmetry is the case for deploying the shim first and the other rungs as available.

When the shim is your primary defense: vendor kpatch is not out yet (zero-day window); it is out, but you cannot reboot the host right now; the kernel has algif_aead builtin so the modprobe blacklist is a no-op; your threat surface includes anything outside systemd; or you do not have the operational bandwidth to write per-service seccomp policy for every daemon on every host.

Where the shim fails is attacker engineering territory. Where the other rungs fail is routine operator reality. That asymmetry is the entire case for deploying it first.

The LD_PRELOAD Shim

no-afalg.so is a single-file C interposer. It links -ldl and intercepts two libc entry points, socket() and socketpair(), returning EPERM for any AF_ALG socket and falling through to the real call for everything else. The constructor uses the documented dlsym(RTLD_NEXT, ...) idiom; if dlsym fails (which it should not, but the cost of paranoia is one branch) the wrapper falls through to a direct syscall(SYS_socket, ...) so legitimate sockets still work. Every block emits one syslog line to LOG_AUTHPRIV.

c
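#define _GNU_SOURCE            /* RTLD_NEXT, syscall() */
#include <errno.h>
#include <sys/socket.h>        /* AF_ALG */
#include <sys/syscall.h>       /* SYS_socket */
#include <unistd.h>

/* Excerpt. Declarations below are inferred from the call sites; the
 * definitions live elsewhere in no-afalg.c. The constructor resolves
 * real_socket with dlsym(RTLD_NEXT, "socket"). */
static int (*real_socket)(int, int, int);
void log_block(const char *call, int domain);

int socket(int domain, int type, int protocol)
{
    if (domain == AF_ALG) {            /* refuse the whole family */
        log_block("socket", domain);
        errno = EPERM;
        return -1;
    }
    if (real_socket)                   /* normal path: the real libc socket() */
        return real_socket(domain, type, protocol);
    /* dlsym failed: fall back to the raw syscall so other sockets still work */
    return (int)syscall(SYS_socket,
                        (long)domain, (long)type, (long)protocol);
}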

Build is one gcc line, no autoconf, no makefile. The shim is deliberately small. The entire C source is around 100 lines including the syslog helper and the architecture guard. That budget is intentional: under /etc/ld.so.preload the .so loads into every dynamic-linked process on the host, and a 30 KB symbol table on every cron job and login shell would be embarrassing.

bash
gcc -shared -fPIC -O2 -Wall -Wextra \
    -o /usr/lib64/no-afalg.so no-afalg.c -ldl

echo /usr/lib64/no-afalg.so > /etc/ld.so.preload
systemctl restart sshd     # new login sessions inherit the preload

# verify
python3 -c 'import socket; socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)'
# expect: PermissionError: [Errno 1] Operation not permitted
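
For fleet checks, the one-liner above can be turned into a probe that distinguishes the possible outcomes instead of relying on a traceback. This is a sketch, not the auditor's implementation; `probe_af_alg` and its return strings are hypothetical names.

```python
import errno
import socket


def probe_af_alg():
    """Classify what happens when this process asks for an AF_ALG socket."""
    family = getattr(socket, "AF_ALG", 38)  # 38 is the domain number cited in the text
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except PermissionError:
        return "blocked"      # EPERM: the errno the shim returns
    except OSError as exc:
        if exc.errno == errno.EAFNOSUPPORT:
            # Kernel without AF_ALG, or a seccomp address-family filter
            # that reports the family as unsupported.
            return "unsupported"
        return "error:%s" % errno.errorcode.get(exc.errno, str(exc.errno))
    sock.close()
    return "reachable"        # nothing stands between this process and AF_ALG


if __name__ == "__main__":
    print(probe_af_alg())
```

Only a process started after the preload entry was written will report `blocked`; a long-running interpreter started earlier still reports `reachable`.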

What the shim deliberately does NOT do

It does not wrap syscall(2). Reading six long varargs unconditionally is undefined behaviour, and a wrapper would catch only syscall(SYS_socket, AF_ALG, ...); an inline-asm syscall instruction cannot be blocked from userspace at all. Pair with seccomp at the systemd or container-runtime level for that surface, and with the kernel patch for the complete close.

The shim also does not auto-enable on RPM install. Wiring /etc/ld.so.preload from a %post would brick the host on any broken upgrade: a missing or wrong-arch .so preloaded into every dynamically linked binary can keep the login shell from starting. Activation is an explicit operator step:

bash
sudo /usr/sbin/copyfail-shim-enable      # smoke-tests, then writes /etc/ld.so.preload
sudo /usr/sbin/copyfail-shim-disable     # reverses it

The enable helper does LD_PRELOAD=$shim /bin/true first; if the .so cannot be loaded (broken build, wrong arch, missing dependency) the helper refuses to update ld.so.preload rather than risk locking the operator out. The corresponding %preun on full erase scrubs ld.so.preload before the .so is removed, to avoid the symmetric brick where rpm deletes the .so out from under a live preload entry and every subsequent dynamic invocation of /bin/sh fails.
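
The smoke-test-then-write ordering described above can be sketched as follows. This is not the source of copyfail-shim-enable; `shim_loads` and `enable_shim` are hypothetical names, and the stderr check assumes the dynamic loader reports an unloadable preload object with a "cannot be preloaded" warning instead of a non-zero exit.

```python
import os
import subprocess
import tempfile


def shim_loads(shim_path, probe="/bin/true"):
    """Smoke test: run a trivial binary with the shim preloaded."""
    if not os.path.isfile(shim_path):
        return False
    env = dict(os.environ, LD_PRELOAD=shim_path)
    proc = subprocess.run([probe], env=env,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Assumption: the loader may only warn on stderr, so check both signals.
    return proc.returncode == 0 and b"cannot be preloaded" not in proc.stderr


def enable_shim(shim_path, preload_path="/etc/ld.so.preload"):
    """Add shim_path to the preload file only after the smoke test passes."""
    if not shim_loads(shim_path):
        return False  # refuse: never write an entry that cannot be loaded
    entries = []
    if os.path.exists(preload_path):
        with open(preload_path) as f:
            entries = [line.strip() for line in f if line.strip()]
    if shim_path not in entries:
        entries.append(shim_path)  # keep any pre-existing preload entries
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(preload_path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(entries) + "\n")
    os.chmod(tmp, 0o644)
    os.replace(tmp, preload_path)  # atomic rename: readers never see a partial file
    return True
```

Writing to a temporary file and renaming it means a crash mid-write cannot leave a truncated preload file behind, which matters for a file the loader reads on every exec.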

Install via RPM

bash
sudo curl -sSL https://rfxn.github.io/copyfail/copyfail.repo \
  -o /etc/yum.repos.d/copyfail.repo
sudo dnf install -y afalg-defense
sudo /usr/sbin/copyfail-shim-enable

# auditor only, no LD_PRELOAD (for hot infrastructure)
sudo dnf install -y afalg-defense-auditor

One repo definition works for EL8, EL9, and EL10: $releasever and $basearch expand per host. Both the RPMs (gpgcheck=1) and the repodata (repo_gpgcheck=1) are signed by the Copyfail Project Signing Key (fingerprint 6001 1CDC EA2F F52D 975A FDEE 6D30 F32C D5E8 0F80); cross-check on first import.

EL7 has no signed RPM

EL7 is past EOL; the repo ships EL8/EL9/EL10 only. EL7 operators should build no-afalg.c from source against EL7's gcc 4.8 / glibc 2.17 and drop the resulting /usr/lib64/no-afalg.so in by hand. The auditor runs unmodified on EL7's Python 3.6.

The Auditor

Patched build numbers are not the whole story. A patched host with a stale daemon process is still vulnerable on the daemon's syscalls. A drop-in file existing in /etc/systemd/system is not the same as a seccomp filter active on the running daemon. The source of truth is /proc/PID/status. The auditor exists because operator posture is not a single number; it spans five attack-chain layers. The categories:

| Category | Question it answers |
| --- | --- |
| ENV | Kernel, distro, glibc, root status. Surfaces skip reasons up front so an unhelpful run does not look like a clean run. |
| KERNEL | Is the primitive actually reachable? AF_ALG socket open, cipher availability, live trigger probe (only check that produces a definitive VULN). |
| MITIGATION | If the kernel is vulnerable, is anything stopping the bug? /etc/ld.so.preload content, shim live-block test, modprobe blacklist, /proc/modules ground truth, systemd RestrictAddressFamilies, drop-in freshness vs running daemon, runtime seccomp. |
| HARDENING | If mitigation fails, what is the blast radius? SUID inventory, page-cache vs O_DIRECT integrity sample, getcap -r for non-SUID privilege. |
| DETECTION | Would we know if someone tried? auditd running with rules covering socket(38), recent IOC signals in auth.priv and audit.log. |
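
The "source of truth is /proc/PID/status" point is easy to illustrate. The sketch below reads the `Seccomp:` field, where 0 means disabled, 1 strict mode, and 2 filter mode. The helper names are hypothetical and this is not the auditor's code.

```python
def seccomp_mode(status_text):
    """Return the Seccomp mode from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Seccomp":  # exact match, so 'Seccomp_filters' is skipped
            return int(value.strip())
    return None


def process_seccomp_mode(pid="self"):
    """Read the live seccomp mode of a process; None if it cannot be read."""
    try:
        with open("/proc/%s/status" % pid) as f:
            return seccomp_mode(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    # A daemon covered by an active address-family restriction should report 2.
    print(process_seccomp_mode())
```

A drop-in on disk combined with a mode of 0 on the running daemon is the stale-process case the paragraph above warns about.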

Running it

bash
sudo copyfail-local-check                 # human-readable, only flags non-OK
sudo copyfail-local-check --json          # SIEM ingestion, posture.verdict
sudo copyfail-local-check --skip-trigger  # skip the live AF_ALG probe
sudo copyfail-local-check --category KERNEL,DETECTION
sudo copyfail-local-check --emit-remediation  # bash script of suggested fixes

Read-only by design: writes only to mkdtemp() sentinels, never modifies /usr/bin or /etc, runs unprivileged where it can (a few checks degrade gracefully without root). The optional trigger probe targets a freshly created sentinel file in a private tempdir; it does not corrupt /usr/bin/su or anything else you would notice. Python 3.6+ stdlib-only, with a ctypes fallback for splice(2) on pre-3.10 Pythons so it runs unmodified from EL7 (Python 3.6) through EL10 (Python 3.12).

JSON for the SIEM

Fleet rollouts should consume posture.verdict, not the human report. The verdict is one of patched, kernel_likely_safe, inconclusive, vulnerable_kernel_userspace_mitigated, or vulnerable. The shape is designed so a fleet console can render a per-host posture row without re-implementing verdict logic over the raw checks.

json
{
  "posture": {
    "verdict": "vulnerable_kernel_userspace_mitigated",
    "layers": {
      "kernel_patched":      "missing",
      "af_alg_unreachable":  "missing",
      "modprobe_blacklist":  "missing",
      "ld_preload_shim":     "ok",
      "systemd_restriction": "missing",
      "user_service_dropin": "missing",
      "seccomp_runtime":     "skipped",
      "auditd_running":      "ok",
      "audit_rule_af_alg":   "ok"
    }
  }
}

Exit codes for automation: 0 clean, 1 tool error, 2 VULNERABLE, 3 vulnerable kernel + at least one userspace mitigation active, 4 hardening recommendations only. Wire JSON into existing monitoring; exit 2 or 3 should page; exit 4 is informational.
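
A fleet consumer for this output can stay very small. The sketch below takes the JSON document and the process exit code and produces a per-host row; `summarize` and the row's field names are hypothetical, while the JSON keys and the paging rule (exit 2 or 3) come from the text above.

```python
import json

PAGE_EXIT_CODES = {2, 3}  # per the text: exit 2 or 3 should page, 4 is informational


def summarize(report_json, exit_code):
    """Reduce one auditor run to a per-host posture row."""
    posture = json.loads(report_json)["posture"]
    missing = sorted(name for name, state in posture["layers"].items()
                     if state == "missing")
    return {
        "verdict": posture["verdict"],
        "missing_layers": missing,
        "page": exit_code in PAGE_EXIT_CODES,
    }


SAMPLE = json.dumps({
    "posture": {
        "verdict": "vulnerable_kernel_userspace_mitigated",
        "layers": {
            "kernel_patched": "missing",
            "ld_preload_shim": "ok",
            "seccomp_runtime": "skipped",
            "audit_rule_af_alg": "ok",
        },
    }
})

if __name__ == "__main__":
    print(summarize(SAMPLE, 3))
```

The row deliberately carries the verdict through untouched, in line with the advice not to re-implement verdict logic over the raw checks.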

Indicators of Compromise

AF_ALG socket creation by a tenant uid on a hosting node is a near-perfect IOC for this exploit family. Normal hosting workloads do not touch AF_ALG. Web servers, mail daemons, PHP-FPM workers, cron jobs, login shells, none of them open socket(38, ...) in normal operation. A single hit warrants investigation; multiple hits in a short window from the same uid is a confirmed exploit attempt. Three audit rules cover the high-signal surface:

text
# /etc/audit/rules.d/90-copyfail.rules
-a always,exit -F arch=b64 -S socket -F a0=38 -k afalg_attempt
-a always,exit -F arch=b64 -S splice -F auid>=1000 -k splice_tenant
-a always,exit -F arch=b64 -S execve -F path=/usr/bin/su -F success=0 -k su_denied

The first rule fires on any AF_ALG socket creation, period. That is what the auditor reads back through audit_rule_af_alg. The second narrows to splice() invocations from non-system uids, which is where the page-cache write actually fires. The third catches the “tried to plant /usr/bin/su but execve failed” signal that follows a botched plant.

Pair the auditd rules with the LD_PRELOAD shim's syslog stream. Each blocked AF_ALG attempt logs a structured line to LOG_AUTHPRIV:

text
no-afalg[12345]: blocked AF_ALG (domain=38) via socket uid=1234 euid=1234 pid=12345
no-afalg[12378]: blocked AF_ALG (domain=38) via socketpair uid=1234 euid=1234 pid=12378

On RHEL these go to /var/log/secure, on Debian to /var/log/auth.log. An audit rule firing without a corresponding shim block in the same window is the high-signal case. It means someone bypassed the userspace shim deliberately, which is itself evidence of an actor who already knew the shim was there.
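
The correlation described above can be sketched as below. The regular expression follows the shim log format shown earlier; everything else is an assumption made for illustration: the function names are hypothetical, events are expected as already-parsed `(timestamp_seconds, uid)` pairs, and matching is done per uid within a fixed window.

```python
import re

SHIM_LINE = re.compile(
    r"no-afalg\[\d+\]: blocked AF_ALG \(domain=38\) via (?P<call>\w+) "
    r"uid=(?P<uid>\d+) euid=(?P<euid>\d+) pid=(?P<pid>\d+)"
)


def parse_shim_line(line):
    """Extract call name, uid, and pid from one shim syslog line, or None."""
    match = SHIM_LINE.search(line)
    if match is None:
        return None
    return {"call": match.group("call"),
            "uid": int(match.group("uid")),
            "pid": int(match.group("pid"))}


def unmatched_audit_hits(audit_hits, shim_blocks, window=2.0):
    """Return audit hits with no shim block from the same uid within `window` seconds."""
    blocks_by_uid = {}
    for ts, uid in shim_blocks:
        blocks_by_uid.setdefault(uid, []).append(ts)
    return [(ts, uid) for ts, uid in audit_hits
            if not any(abs(ts - b) <= window for b in blocks_by_uid.get(uid, []))]


if __name__ == "__main__":
    line = ("no-afalg[12345]: blocked AF_ALG (domain=38) via socket "
            "uid=1234 euid=1234 pid=12345")
    print(parse_shim_line(line))
    print(unmatched_audit_hits([(100.0, 1234), (200.0, 1234)], [(100.2, 1234)]))
```

The window size is a tuning knob, not a value taken from the source; clock skew between auditd and syslog timestamps on the same host is normally small, but it is not zero.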

Page-cache integrity scan

Run a 5-minute cron that hashes a small list of privilege-relevant files via two paths: a normal POSIX read (which sees the page cache) and an O_DIRECT read (which bypasses it). Divergence is high-fidelity: the page cache is lying to disk while an attack is in flight. The auditor ships a sample of this; the full list of files we hash is /etc/passwd, /etc/shadow, /etc/group, /etc/sudoers, /etc/security/access.conf, /etc/pam.d/{su,sshd,login}, /etc/nsswitch.conf, and /etc/ssh/sshd_config. Caveat: the attacker can posix_fadvise(POSIX_FADV_DONTNEED) to evict the corruption and erase the divergence signature, so this is high-signal when it fires but not a definitive all-clear when it does not.
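
A minimal sketch of the two-path comparison follows. It is not the auditor's implementation and the function names are hypothetical. It assumes Linux (`os.O_DIRECT`) and a filesystem that supports direct I/O; where direct I/O is refused, for example on some tmpfs mounts, it returns `None` instead of guessing.

```python
import hashlib
import mmap
import os


def hash_buffered(path):
    """SHA-256 via a normal read, which is served from the page cache."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_direct(path, block=1 << 16):
    """SHA-256 via O_DIRECT, which bypasses the page cache; None if unsupported."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError:
        return None
    buf = mmap.mmap(-1, block)  # anonymous mapping: page-aligned, as O_DIRECT requires
    h = hashlib.sha256()
    try:
        while True:
            n = os.readv(fd, [buf])
            h.update(buf[:n])
            if n < block:       # a short read on a regular file means end of file
                break
    except OSError:
        return None
    finally:
        buf.close()
        os.close(fd)
    return h.hexdigest()


def page_cache_diverges(path):
    """True if cached and on-disk content differ, False if equal, None if unknown."""
    direct = hash_direct(path)
    if direct is None:
        return None
    return direct != hash_buffered(path)
```

As the caveat above says, a `False` result is not an all-clear: an evicted page leaves nothing to compare. A file that is being legitimately rewritten during the scan can also produce a transient `True`, so a second read before paging is a reasonable precaution.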

No-AF_ALG Posture

This is the part of the writeup that outlives the CVE.

AF_ALG exists for a good reason. It lets unprivileged code use the kernel crypto stack without a custom OpenSSL build, and on some embedded and IoT workloads that matters. On a hosting node, it does not. We have grepped years of strace and eBPF data across the fleet and found zero legitimate AF_ALG callers in any shared-hosting workload. Not in Apache, not in nginx, not in cPanel/InterWorx tooling, not in PHP-FPM, not in cron jobs, not in mail. The workloads that need crypto either run their own OpenSSL or call out to a hardware-accelerated path that does not go through algif_aead. The interface is, on hosting nodes, a free attack surface.

The forward standard for hosting nodes under our administration is no AF_ALG, layered: LD_PRELOAD shim system-wide via /etc/ld.so.preload; systemd RestrictAddressFamilies=~AF_ALG drop-ins on user@.service, sshd.service, and the container runtimes; modprobe blacklist as defense-in-depth against future kernel rebuilds; auditd rule on socket(38); and the kernel patch on the standard upgrade cadence. Removing any single rung does not change the security posture meaningfully. Layering them is the point.

Step 1: ship the shim and auditor

bash
# all hosts
sudo curl -sSL https://rfxn.github.io/copyfail/copyfail.repo \
  -o /etc/yum.repos.d/copyfail.repo
sudo dnf install -y afalg-defense
sudo /usr/sbin/copyfail-shim-enable
sudo systemctl restart sshd        # new sessions inherit the preload

# baseline audit
sudo copyfail-local-check --json > /var/log/copyfail-baseline.json

Step 2: systemd drop-ins on the load-bearing units

bash
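# Drop-in: /etc/systemd/system/user@.service.d/no-afalg.conf
# (repeat under sshd.service.d, containerd.service.d, podman.service.d, etc.)
mkdir -p /etc/systemd/system/user@.service.d
cat > /etc/systemd/system/user@.service.d/no-afalg.conf <<'EOF'
[Service]
RestrictAddressFamilies=~AF_ALG
SystemCallArchitectures=native
EOF

systemctl daemon-reload
systemctl restart sshd
# user@.service is a template and cannot be restarted by that name: running
# user@UID.service instances pick up the drop-in the next time they start.
# verify the running daemon picked up the seccomp filter:
grep '^Seccomp:' /proc/$(systemctl show -p MainPID --value sshd)/status   # expect 2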

Step 3: auditd rules and modprobe blacklist

bash
# auditd
cat > /etc/audit/rules.d/90-copyfail.rules <<'EOF'
-a always,exit -F arch=b64 -S socket -F a0=38 -k afalg_attempt
-a always,exit -F arch=b64 -S splice -F auid>=1000 -k splice_tenant
-a always,exit -F arch=b64 -S execve -F path=/usr/bin/su -F success=0 -k su_denied
EOF
augenrules --load
auditctl -l | grep afalg_attempt   # verify loaded

# modprobe blacklist (no-op on builtin kernels; defense-in-depth where these are modular)
cat > /etc/modprobe.d/99-no-afalg.conf <<'EOF'
install algif_aead /bin/false
install algif_skcipher /bin/false
install algif_hash /bin/false
install authenc /bin/false
install authencesn /bin/false
install af_alg /bin/false
EOF

Step 4: kernel patch + reboot when errata lands

Apply on the accelerated cycle when kpatch-patch-5_14_0-* (RHEL 9) or kpatch-patch-6_12_0-* (RHEL 10) ships in errata. Watch errata.redhat.com and TuxCare KernelCare. Once the kernel is patched, the userspace rungs become defense-in-depth rather than load-bearing. Leave them in place; they are cheap, and the next AF_ALG-class bug lands the same way.

Priority Order

Immediate

Install afalg-defense on every EL8/EL9/EL10 host. Run copyfail-shim-enable. Restart sshd so login sessions inherit the preload.
Build no-afalg.c from source on EL7 hosts. The auditor is portable and needs no rebuild.
Run copyfail-local-check --json fleet-wide. Triage by posture.verdict. Exit 2 pages on-call.
Deploy the three auditd rules. Tail afalg_attempt hits into the SIEM; pair with the shim's LOG_AUTHPRIV stream.

Forward

Ship the systemd RestrictAddressFamilies=~AF_ALG drop-in on user@.service, sshd.service, and every container runtime via Ansible. Pair with SystemCallArchitectures=native.
Deploy the modprobe blacklist as belt-and-suspenders against kernel rebuilds where algif_aead is modular.
Apply the kernel patch on accelerated cycle when errata lands. Leave the userspace rungs in place; the next AF_ALG-class bug lands the same way.
Standardize no-AF_ALG as the default posture on hosting nodes. Tenants do not legitimately use AF_ALG, and the interface is a free attack surface.

Detect

Page on every afalg_attempt audit hit on a tenant uid. Hosting workloads do not legitimately open AF_ALG sockets.
Run the page-cache integrity scan every five minutes against /etc/passwd, /etc/sudoers, and PAM configs. Divergence pages immediately.
Watch for audit hits without a corresponding shim block in the same window. That gap means someone deliberately bypassed userspace.
Re-run copyfail-local-check daily; alert on any host whose verdict regresses from vulnerable_kernel_userspace_mitigated to vulnerable.
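
The daily regression alert in the last item can be sketched as a comparison between two stored verdicts. The verdict strings come from the auditor's documented output; the severity ranking below, including where `inconclusive` sits, is an assumption made for illustration, and `regressed` is a hypothetical helper.

```python
# Assumed ordering from best to worst; not taken from the auditor itself.
SEVERITY = {
    "patched": 0,
    "kernel_likely_safe": 1,
    "inconclusive": 2,
    "vulnerable_kernel_userspace_mitigated": 3,
    "vulnerable": 4,
}


def regressed(previous, current):
    """True when a host's verdict moved to a worse rank between two runs."""
    return SEVERITY[current] > SEVERITY[previous]


if __name__ == "__main__":
    # The case called out above: a mitigation was removed or stopped working.
    print(regressed("vulnerable_kernel_userspace_mitigated", "vulnerable"))
```

An unknown verdict string raises `KeyError` on purpose: a fleet job should fail loudly when the auditor's vocabulary changes, not silently treat the host as unchanged.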

The kernel patch is the actual fix. Everything else is buying the window between disclosure and the next maintenance reboot. On hosting nodes that window is sometimes weeks, sometimes longer than the CVE is hot for, and the cost of the layered posture is one .so on ld.so.preload, a few systemd drop-ins, and a cron. We are out of excuses for shipping AF_ALG to tenants by default. If you want to compare posture data, push back on any of the architectural calls, or share what you saw on your fleet, my contact is below.

Source and signed RPMs at rfxn.github.io/copyfail (GPG fingerprint 6001 1CDC EA2F F52D 975A FDEE 6D30 F32C D5E8 0F80); LD_PRELOAD shim and auditor are GPL v2. Additional IOCs, variant samples, or fleet posture data welcome via Keybase or email.