CVE-2026-3006: Linux Kernel Heap Overflow via Race Condition → LPE

A race condition in the Linux kernel's file operation dispatch path allows an out-of-bounds write via unsynchronized size validation, leading to local privilege escalation to root.

// PLAIN ENGLISH VERSION

A newly discovered flaw in Linux could let hackers take over your computer if they already have basic access to it. Think of it like a lock that works fine until two people try to unlock it at exactly the same time — that's when things break down.

Here's what's happening: before the Linux kernel copies a piece of data, it checks that the data will fit in the space set aside for it. The flaw is a timing gap. The kernel checks the size, then looks the size up a second time when it actually makes the copy. If an attacker changes the size in the split second between the check and the copy, the kernel copies far more than fits and overwrites memory it relies on. It's like a bouncer checking your ticket, then glancing away just long enough for someone to swap it before you walk in.

Once inside, the attacker can take control of core system functions. That means they go from being a regular user to having admin-level powers over everything on your machine.

Who should worry? If you share a computer with other users or run a server that multiple people access, you're at higher risk. Single-person computers are safer, but still vulnerable if you accidentally download malware or visit a malicious website that gains initial access. The good news is this requires someone to already be on your system — it's not something a remote hacker can use on a locked-down computer.

What you should do right now: First, keep your Linux system updated. When patches are released, install them promptly — they'll fix this hole. Second, limit who has user accounts on your computer and don't leave machines unlocked or accessible to strangers. Third, if you run a public-facing server, talk to your hosting provider about getting security updates before this gets worse.

This isn't an emergency yet, but it matters because Linux powers everything from web servers to Android phones.


▶ Privilege escalation — CVE-2026-3006

Vulnerability Overview

CVE-2026-3006 is a CVSS 7.0 (HIGH) local privilege escalation vulnerability affecting the Linux kernel. The root cause is a classic TOCTOU (Time-Of-Check to Time-Of-Use) race condition: a size value read from memory shared with userspace is validated, then re-read without a lock before being used as a copy length. An attacker who wins the race can make the kernel copy past the end of a fixed-size kernel stack buffer (reading past the end of the source request object along the way), corrupting the saved frame pointer and return address and ultimately redirecting kernel control flow.

The bug is locally exploitable only; no network surface is exposed. It has not been observed exploited in the wild as of disclosure.

Affected Component

The vulnerable path lives in the kernel's virtual filesystem dispatch layer, specifically in fsp_dispatch_work, the worker function that processes queued I/O requests from a userspace filesystem provider. The design resembles WinFSP, a Windows userspace filesystem framework, adapted to Linux: a kernel worker processes work items (the analogue of Windows IRPs) that a userspace daemon submits over a shared queue. The per-item size field in that shared queue is the attacker-controlled primitive.

The affected structure is fsp_iop_request, a variable-length kernel object whose trailing data length is stored in its own header. The race window lies between the bounds check on that length and the subsequent memcpy out of the trailing flexible array (Payload) into a fixed-size stack buffer.

Root Cause Analysis

Root cause: fsp_dispatch_work reads req->RequestSize twice — once for the bounds check and once for the memcpy length — with no lock held between the two reads, allowing a racing thread to inflate the value after validation but before the copy.

The vulnerable function, reconstructed from crash analysis and the patch delta:

/*
 * fsp_dispatch_work — kernel worker; processes a pending FSP I/O request.
 * Called from the fsp_work_queue kthread. req is a kmalloc'd heap object
 * owned by the kernel; req->RequestSize is also readable and writable by the
 * userspace daemon via a shared-memory ring (mmap'd into both kernel and user VAS).
 */
static void fsp_dispatch_work(struct fsp_iop_request *req)
{
    uint32_t req_size;
    uint8_t  staging[FSP_REQUEST_MAX_SIZE];   // 0x1000 bytes, stack buffer

    /* --- TOCTOU: first read (bounds check) --- */
    req_size = req->RequestSize;              // read #1 from shared mapping

    if (req_size > FSP_REQUEST_MAX_SIZE) {    // validate: must be <= 0x1000
        fsp_request_complete(req, STATUS_INVALID);
        return;
    }

    /*
     * BUG: the memcpy length re-reads req->RequestSize from the shared
     * mapping instead of using the validated local req_size. A userspace
     * thread can race between the check above and this copy, inflating
     * RequestSize past 0x1000 after validation passes.
     *
     * BUG: missing synchronization; req->RequestSize re-read without lock
     */
    memcpy(staging, req->Payload, req->RequestSize);  // read #2 — can be > 0x1000

    fsp_process_staged(staging, req_size);
    fsp_request_complete(req, STATUS_SUCCESS);
}

The critical detail: req is backed by a page shared with userspace (a mmap(MAP_SHARED) of the request ring). The source dereferences req->RequestSize a second time in the memcpy call instead of reusing the validated req_size, and the compiler treats each dereference as a plain memory load. Even code that appears to read the field only once is not safe without READ_ONCE(): the compiler may legally re-load a plain, non-volatile field from memory rather than keep it in a register. A concurrent store from a userspace thread to the shared region between the two kernel reads wins the race and inflates the length.
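The gap between check and use can be reproduced deterministically in userspace, without any real concurrency. The sketch below is a hypothetical model rather than kernel code: `struct shared_req`, `race_hook`, `inflate_size`, and the two `copy_len_*` functions are invented for illustration, the hook stands in for a userspace store landing between the two reads, and `READ_ONCE_SKETCH` is a GNU C approximation of the kernel's READ_ONCE() (one load through a volatile-qualified pointer). Each function returns the length it would hand to memcpy instead of copying, so nothing actually overflows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FSP_REQUEST_MAX_SIZE 0x1000u

/* Hypothetical approximation of the kernel's READ_ONCE(): a single load
 * through a volatile-qualified pointer (GNU C __typeof__ extension). */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the size field in the shared request ring. */
struct shared_req {
    uint32_t RequestSize;
};

/* Callback that simulates a userspace store landing between check and use. */
typedef void (*race_hook)(struct shared_req *);

/* Example hook: inflates the size after validation. */
static void inflate_size(struct shared_req *req)
{
    req->RequestSize = 0x2000;
}

/* Vulnerable pattern: the check and the use read the field separately.
 * Returns the length that would be passed to memcpy, or 0 if rejected. */
static uint32_t copy_len_double_fetch(struct shared_req *req, race_hook hook)
{
    assert(req != NULL);
    if (req->RequestSize > FSP_REQUEST_MAX_SIZE)  /* load #1: bounds check */
        return 0;
    if (hook)
        hook(req);                                /* simulated racing write */
    return req->RequestSize;                      /* load #2: unchecked value */
}

/* Patched pattern: one snapshot is the sole authority for the length. */
static uint32_t copy_len_snapshot(struct shared_req *req, race_hook hook)
{
    assert(req != NULL);
    uint32_t req_size = READ_ONCE_SKETCH(req->RequestSize);  /* single load */

    if (req_size > FSP_REQUEST_MAX_SIZE)
        return 0;
    if (hook)
        hook(req);                                /* simulated racing write */
    return req_size;                              /* still the validated value */
}
```

With `inflate_size` as the hook and a starting size of 0xFF0, `copy_len_double_fetch` returns 0x2000 while `copy_len_snapshot` returns 0xFF0, even though the shared field reads 0x2000 afterward. The snapshot version is safe because the checked value and the used value are the same variable, not because of any memory-ordering guarantee.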

Exploitation Mechanics

EXPLOIT CHAIN:

1. Map the FSP request ring into userspace:
     fd = open("/dev/fsp0", O_RDWR);
     ring = mmap(NULL, RING_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

2. Craft a well-formed request with RequestSize = 0xFF0 (passes bounds check).
   Populate Payload[0..0xFF0] with controlled data.

3. Submit request to kernel worker via ioctl(fd, FSP_IOC_ENQUEUE, &req_idx).
   Kernel worker thread wakes and begins fsp_dispatch_work().

4. Race thread spins on ring->RequestSize, flipping it from 0xFF0 → 0x2000
   immediately after the bounds check but before the memcpy:
     while (ring->RequestSize != 0xFF0) ;   // busy-wait for check pass
     ring->RequestSize = 0x2000;            // inflate past stack buf limit

5. memcpy(staging, req->Payload, 0x2000) overflows 'staging' (0x1000 bytes)
   by 0x1000 bytes, corrupting adjacent kernel stack frames.

6. Overwrite saved RIP / return address on the kernel stack with a
   kernel ROP gadget pivot (e.g., "push rsp; ret" at a known KASLR offset
   leaked via /proc/kallsyms on non-hardened builds, or via side-channel).

7. ROP chain: commit_creds(prepare_kernel_cred(0)) → return to userspace.

8. execve("/bin/sh") — root shell obtained.

The race window is wide relative to typical TOCTOU bugs. The kernel worker calls fsp_dispatch_work synchronously from a kthread; the attacker has the entire interval between read #1 (the bounds check) and read #2 (the memcpy length argument) to flip the value. On a 4-core system, the win rate is approximately 1-in-20 attempts without CPU affinity tricks, and near-deterministic when the racing thread is pinned to the same core as the worker (forcing a preemption window via sched_setaffinity + usleep(1)).

Memory Layout

/*
 * fsp_iop_request — variable-length request object, kmalloc'd to
 * (sizeof header + RequestSize) bytes on the kernel heap.
 */
struct fsp_iop_request {
    /* +0x00 */ uint32_t  Magic;          // 0x46535052 "FSPR"
    /* +0x04 */ uint32_t  RequestSize;    // attacker-influenced via shared map
    /* +0x08 */ uint32_t  RequestKind;    // opcode
    /* +0x0C */ uint32_t  Status;         // written on completion
    /* +0x10 */ uint64_t  RequestHint;    // per-request cookie
    /* +0x18 */ uint64_t  Padding[2];
    /* +0x28 */ uint8_t   Payload[];      // flexible array; data starts here
};
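The offset comments can be checked at compile time with a userspace transcription of the structure. This is a sketch assuming a C11 compiler on a common 32- or 64-bit target; `fsp_req_alloc` is a hypothetical helper, and calloc stands in for kmalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace copy of the layout above, used only to check the offsets. */
struct fsp_iop_request {
    uint32_t Magic;
    uint32_t RequestSize;
    uint32_t RequestKind;
    uint32_t Status;
    uint64_t RequestHint;
    uint64_t Padding[2];
    uint8_t  Payload[];
};

/* Compile-time checks that the commented offsets hold. */
_Static_assert(offsetof(struct fsp_iop_request, RequestSize) == 0x04, "RequestSize offset");
_Static_assert(offsetof(struct fsp_iop_request, RequestHint) == 0x10, "RequestHint offset");
_Static_assert(offsetof(struct fsp_iop_request, Payload) == 0x28, "Payload offset");
_Static_assert(sizeof(struct fsp_iop_request) == 0x28, "header size");

/* Hypothetical helper: allocates header + payload_len bytes, mirroring the
 * (sizeof header + RequestSize) sizing above. Returns NULL on size overflow
 * or allocation failure. */
static struct fsp_iop_request *fsp_req_alloc(uint32_t payload_len)
{
    if (payload_len > SIZE_MAX - sizeof(struct fsp_iop_request))
        return NULL;
    struct fsp_iop_request *req = calloc(1, sizeof(*req) + payload_len);
    if (req) {
        req->Magic = 0x46535052u;   /* "FSPR" when read as a big-endian word */
        req->RequestSize = payload_len;
    }
    return req;
}
```

The overflow guard in `fsp_req_alloc` only matters where size_t is 32 bits; on 64-bit targets the header plus any uint32_t length always fits.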

KERNEL STACK STATE BEFORE OVERFLOW (RequestSize = 0xFF0, safe):

  [rsp+0x000]  staging[0]          <- memcpy destination
  [rsp+0xFFF]  staging[0xFFF]      <- last byte of staging buffer
  [rsp+0x1000] saved_rbp           <- frame pointer
  [rsp+0x1008] saved_rip           <- return address to fsp_work_loop+0x8C

KERNEL STACK STATE AFTER OVERFLOW (RequestSize = 0x2000, race won):

  [rsp+0x000]  staging[0]          <- memcpy destination (controlled)
  [rsp+0xFFF]  staging[0xFFF]
  [rsp+0x1000] 0x4141414141414141  <- CORRUPTED: saved_rbp (attacker data)
  [rsp+0x1008] 0xffffffff81234567  <- CORRUPTED: saved_rip → ROP pivot gadget
  [rsp+0x1010] rop_chain[0]        <- commit_creds gadget
  [rsp+0x1018] rop_chain[1]        <- prepare_kernel_cred(0)
  [rsp+0x1020] rop_chain[2]        <- swapgs_restore / iretq trampoline

Patch Analysis

The fix takes a single READ_ONCE() snapshot of RequestSize into a local variable and uses that local for the bounds check, the memcpy length, and the call to fsp_process_staged. Because nothing reads req->RequestSize after the snapshot, the second memory access disappears entirely.

// BEFORE (vulnerable): three separate loads of req->RequestSize
static void fsp_dispatch_work(struct fsp_iop_request *req)
{
    uint8_t staging[FSP_REQUEST_MAX_SIZE];

    if (req->RequestSize > FSP_REQUEST_MAX_SIZE) {      // load #1
        fsp_request_complete(req, STATUS_INVALID);
        return;
    }
    memcpy(staging, req->Payload, req->RequestSize);    // load #2: TOCTOU
    fsp_process_staged(staging, req->RequestSize);      // load #3: also unchecked
    fsp_request_complete(req, STATUS_SUCCESS);
}

// AFTER (patched): single READ_ONCE() snapshot, local copy used throughout
static void fsp_dispatch_work(struct fsp_iop_request *req)
{
    uint8_t  staging[FSP_REQUEST_MAX_SIZE];
    uint32_t req_size;

    req_size = READ_ONCE(req->RequestSize);             // single, non-torn load

    if (req_size > FSP_REQUEST_MAX_SIZE) {              // check local copy
        fsp_request_complete(req, STATUS_INVALID);
        return;
    }
    memcpy(staging, req->Payload, req_size);            // use local copy
    fsp_process_staged(staging, req_size);
    fsp_request_complete(req, STATUS_SUCCESS);
}

READ_ONCE() performs the access through a volatile-qualified pointer, which forbids the compiler from re-issuing, merging, or tearing the load. The fix does not depend on x86's TSO memory model or any other ordering guarantee: because req_size is now the sole authority for both the check and the copy length, a racing write after the snapshot is irrelevant on any architecture. That is also why the patch does not need a spinlock around the access. The Payload bytes are still read from shared memory that userspace may be modifying, but only the staged copy is processed afterward.

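A secondary hardening change replaces the fixed-size stack buffer with a kmalloc(req_size, GFP_KERNEL) allocation. This removes the stack overflow vector and shrinks the worker's stack frame by 4 KiB. It does not make a reintroduced double-load harmless: a buffer sized from the snapshot would then overflow on the heap instead, so the single snapshot remains the primary fix.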
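The heap-staged variant can be sketched in userspace as follows. This is an illustration under stated assumptions, not the vendor patch: malloc/free stand in for kmalloc(req_size, GFP_KERNEL)/kfree, `struct fsp_req_view`, `process_staged`, and the `FSP_STATUS_*` codes are hypothetical, and rejecting zero-length requests is an added assumption so the allocation size is never zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FSP_REQUEST_MAX_SIZE 0x1000u

/* Hypothetical status codes for this sketch only. */
enum fsp_status { FSP_STATUS_SUCCESS = 0, FSP_STATUS_INVALID = 1, FSP_STATUS_NOMEM = 2 };

/* Hypothetical view of a request: size field plus trailing payload. */
struct fsp_req_view {
    uint32_t RequestSize;     /* shared with userspace in the real design */
    const uint8_t *Payload;   /* the request's trailing data */
};

/* Hypothetical consumer: sums the staged bytes. */
static uint32_t process_staged(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Snapshot once, validate, then stage into an exactly sized heap buffer. */
static enum fsp_status dispatch_heap_staged(const struct fsp_req_view *req,
                                            uint32_t *out_sum)
{
    assert(req != NULL && out_sum != NULL);
    /* Single volatile load, in the spirit of READ_ONCE(). */
    uint32_t req_size = *(const volatile uint32_t *)&req->RequestSize;

    if (req_size == 0 || req_size > FSP_REQUEST_MAX_SIZE)
        return FSP_STATUS_INVALID;

    uint8_t *staging = malloc(req_size);      /* stands in for kmalloc */
    if (!staging)
        return FSP_STATUS_NOMEM;

    memcpy(staging, req->Payload, req_size);  /* length comes from the snapshot */
    *out_sum = process_staged(staging, req_size);
    free(staging);                            /* stands in for kfree */
    return FSP_STATUS_SUCCESS;
}
```

Keeping the FSP_REQUEST_MAX_SIZE cap even with a heap buffer bounds how much memory a single unprivileged request can make the kernel allocate.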

Detection and Indicators

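Exploitation attempts leave kernel stack corruption artifacts visible to KASAN when stack instrumentation is enabled. KFENCE samples only heap (slab) allocations, so it will not flag the stack overflow itself, but it adds low-overhead heap coverage. Enable both on affected kernels:

CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y
CONFIG_KFENCE=y
CONFIG_KFENCE_SAMPLE_INTERVAL=100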

On a KASAN kernel, an exploitation attempt is caught at the overflowing write and produces a stack-out-of-bounds report pointing to fsp_dispatch_work+0x?? in the memcpy call. Look for:

[ 1234.567890] BUG: KASAN: stack-out-of-bounds in fsp_dispatch_work+0x94/0x130
[ 1234.567891] Write of size 8192 at addr ffff888012a3be00 by task kworker/2:1/289
[ 1234.567892] 
[ 1234.567893] Call Trace:
[ 1234.567894]  memcpy+0x1f/0x30
[ 1234.567895]  fsp_dispatch_work+0x94/0x130
[ 1234.567896]  process_one_work+0x1c7/0x3a0

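On production systems without KASAN, note that commit_creds is a kernel function rather than a syscall, so auditd cannot log calls to it directly. Instead, watch for processes that gain an effective UID of 0 without a setuid execve or other legitimate privilege transition (for example, auditd execve rules keyed on euid=0 for sessions whose login UID is unprivileged), trace commit_creds with kprobes or eBPF where that is permitted, and treat repeated kernel oopses or stack-protector panics attributed to fsp_dispatch_work as likely failed exploitation attempts.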
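A minimal triage filter for kernel logs, assuming they are piped in line by line (for example from dmesg). It flags lines that name fsp_dispatch_work together with either the KASAN prefix from the sample report above or the kernel's stack-protector panic text ("stack-protector: Kernel stack is corrupted in:"). The function names are hypothetical, and lines longer than the 1024-byte buffer are split by fgets and may be missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a kernel log line looks like a caught overflow in
 * fsp_dispatch_work: a KASAN stack OOB report or a stack-protector panic. */
static bool is_fsp_overflow_indicator(const char *line)
{
    assert(line != NULL);
    if (!strstr(line, "fsp_dispatch_work"))
        return false;
    return strstr(line, "KASAN: stack-out-of-bounds") != NULL ||
           strstr(line, "stack-protector: Kernel stack is corrupted") != NULL;
}

/* Reads log lines from a stream, echoes matches to stdout,
 * and returns how many lines matched. */
static unsigned scan_log(FILE *in)
{
    char line[1024];
    unsigned hits = 0;

    while (fgets(line, sizeof line, in)) {
        if (is_fsp_overflow_indicator(line)) {
            fputs(line, stdout);
            hits++;
        }
    }
    return hits;
}
```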

Remediation

Apply the vendor patch immediately. If patching is not immediately possible:

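Enable CONFIG_STACKPROTECTOR_STRONG=y (most distribution kernels already do; otherwise this requires a kernel rebuild). The stack canary sits between staging and the saved return address, so the overflow is detected when fsp_dispatch_work returns, before the corrupted return address is used. Unless the attacker can also leak the canary value, this converts an exploitable LPE into a kernel panic (denial of service, not code execution).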
Restrict access to /dev/fsp* device nodes via udev rules, e.g. KERNEL=="fsp[0-9]*", MODE="0600", OWNER="root" (root-only; use a dedicated group with MODE="0660" if specific trusted users need access).
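For sandboxed or containerized workloads, deploy seccomp profiles that deny sched_setaffinity and mmap(MAP_SHARED) to processes that do not need the shared request ring. This only narrows the exploitation surface: an unconfined local user is not bound by such a profile, and the race can still be won without affinity control.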
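Set kernel.kptr_restrict=2 so /proc/kallsyms does not reveal kernel addresses, and consider the lockdown=confidentiality kernel parameter to block other interfaces (such as /proc/kcore) that can read kernel memory; either kind of leak would otherwise trivially defeat KASLR.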
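Once the udev rule is in place, a short audit can confirm that no FSP node is left group- or world-accessible. This sketch assumes a POSIX system; `audit_fsp_nodes` and `mode_too_permissive` are hypothetical helpers, and the policy they check (root-owned, no group or other permission bits) mirrors the MODE="0600", OWNER="root" rule above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* True if any group or other permission bit is set. */
static bool mode_too_permissive(mode_t mode)
{
    return (mode & (S_IRWXG | S_IRWXO)) != 0;
}

/* Scans dir for entries named "fsp*" and reports any that are not
 * root-owned or carry group/other bits. Returns the number of findings
 * (0 if the directory cannot be opened). */
static unsigned audit_fsp_nodes(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return 0;

    unsigned findings = 0;
    struct dirent *ent;
    char path[512];
    struct stat st;

    while ((ent = readdir(d)) != NULL) {
        if (strncmp(ent->d_name, "fsp", 3) != 0)
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (stat(path, &st) != 0)
            continue;
        if (st.st_uid != 0 || mode_too_permissive(st.st_mode)) {
            printf("%s: uid=%u mode=%04o\n", path, (unsigned)st.st_uid,
                   (unsigned)(st.st_mode & 07777));
            findings++;
        }
    }
    closedir(d);
    return findings;
}
```

Calling `audit_fsp_nodes("/dev")` prints each offending node and returns the number found; a return of 0 means every /dev/fsp* node matches the rule.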
