CVE Analysis 2026-03-02 · 8 min read

CVE-2026-0029: pKVM __pkvm_init_vm Logic Error Enables Local EoP

A logic error in __pkvm_init_vm of pkvm.c allows memory corruption in Android's protected KVM hypervisor layer, enabling local privilege escalation with no additional permissions required.

#memory-corruption #logic-error #privilege-escalation #virtual-machine #kernel-vulnerability

Vulnerability Overview

CVE-2026-0029 is a CVSS 8.4 (HIGH) memory corruption vulnerability in the Android protected KVM (pKVM) hypervisor component, patched in the March 2026 Android Security Bulletin. The bug lives in __pkvm_init_vm() inside pkvm.c, part of the KVM hypervisor extension that underpins the hardware-isolated virtual machine infrastructure introduced in Android 13. A logic error in the VM initialization path allows an attacker operating at the guest or host userspace level to corrupt hypervisor-managed memory and ultimately achieve local escalation of privilege. Exploitation requires no additional execution privileges and no user interaction.

pKVM runs at EL2 on ARMv8/ARMv9 hardware. Because the hypervisor operates at a higher exception level than even the host Linux kernel (EL1), memory corruption within it is particularly severe: a successful write primitive here can subvert host kernel integrity protections, bypass memory isolation between VMs, or overwrite stage-2 page table entries governing physical memory access.

Affected Component

  • File: arch/arm64/kvm/hyp/nvhe/pkvm.c
  • Function: __pkvm_init_vm()
  • Privilege level: EL2 (hypervisor, pKVM nVHE path)
  • Affected versions: See NVD / Android Security Bulletin March 2026
  • Exploited in the wild: No

Root Cause Analysis

__pkvm_init_vm() is the EL2-side handler responsible for allocating and initializing per-VM state when a protected VM is created. It takes a caller-supplied num_vcpus count and a pointer to a pkvm_hyp_vm structure. The logic error manifests as a failure to validate the relationship between the number of vCPUs requested and the size of the memory region donated by the host for the VM's hypervisor-side state before performing indexed writes into that region.

/*
 * arch/arm64/kvm/hyp/nvhe/pkvm.c
 * Vulnerable version (pre-March 2026 patch)
 */

struct pkvm_hyp_vm {
    struct kvm kvm;
    /* ... additional fixed fields omitted for brevity ... */
    struct pkvm_hyp_vcpu  *vcpus[];    /* flexible array, EL2-private; must be the last member */
};

static int __pkvm_init_vm(struct pkvm_hyp_vm *hyp_vm,
                          unsigned int       num_vcpus,
                          unsigned long      pgd_hva,
                          unsigned long      last_ran_hva)
{
    struct kvm *kvm = &hyp_vm->kvm;
    size_t   vm_size;
    void    *pgd;
    int      ret;

    /*
     * vm_size is derived solely from the host-supplied num_vcpus.
     * The host is expected to have donated a region of this size
     * to EL2 via __pkvm_donate_memory(), but nothing verifies it.
     */
    vm_size = pkvm_get_hyp_vm_size(num_vcpus);  // sizeof(pkvm_hyp_vm) + num_vcpus * sizeof(ptr)

    if (num_vcpus > KVM_MAX_VCPUS)              // BUG: this guard bounds num_vcpus only
        return -EINVAL;                          //      against a global maximum; it never
                                                 //      compares vm_size with the bytes
                                                 //      actually donated, so the region may
                                                 //      be smaller than vm_size if the host
                                                 //      lies about num_vcpus

    ret = __pkvm_init_pgd(hyp_vm, pgd_hva, vm_size);
    if (ret)
        return ret;

    /* Populate per-vCPU pointers inside hyp_vm->vcpus[].
     * Offset computed from num_vcpus — no cross-check against
     * the actual donated region size stored in the memcache.   */
    for (unsigned int i = 0; i < num_vcpus; i++) {
        // BUG: if donated region < vm_size, writes past the
        //      end of the EL2-mapped page into adjacent
        //      hypervisor memory (stage-2 page tables, etc.)
        hyp_vm->vcpus[i] = pkvm_hyp_vcpu_from_idx(hyp_vm, i);
    }

    hyp_vm->kvm.created_vcpus = num_vcpus;
    return 0;
}
Root cause: __pkvm_init_vm() derives the VM's hypervisor memory region size from the caller-supplied num_vcpus without verifying that the actually donated physical memory region is large enough to hold that many vCPU pointers, allowing out-of-bounds indexed pointer writes into adjacent EL2-managed memory.

The critical discrepancy: pkvm_get_hyp_vm_size(n) returns sizeof(struct pkvm_hyp_vm) + n * sizeof(struct pkvm_hyp_vcpu *). When the host passes a crafted num_vcpus value that is larger than the donated page can accommodate, while still passing the KVM_MAX_VCPUS check, the loop that assigns hyp_vm->vcpus[i] writes vCPU pointers beyond the donated page boundary.
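
The size formula can be checked in isolation. The following is a minimal user-space sketch, not pKVM code: it assumes the 0x6A8-byte fixed-field size this analysis uses for the descriptor and 8-byte pointers, and the helper names hyp_vm_size and max_vcpus_for_region are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions taken from this write-up, not from kernel headers. */
#define HYP_VM_FIXED_SIZE 0x6A8u   /* bytes before vcpus[] */
#define HYP_PAGE_SIZE     0x1000u  /* one 4 KB donated page */

static_assert(HYP_VM_FIXED_SIZE < HYP_PAGE_SIZE,
              "fixed fields are assumed to fit in a single page");

/* Model of the quoted formula: fixed header plus one 64-bit pointer per vCPU. */
size_t hyp_vm_size(unsigned int num_vcpus)
{
    return HYP_VM_FIXED_SIZE + (size_t)num_vcpus * sizeof(uint64_t);
}

/* Largest vCPU count whose pointer array still fits in `donated` bytes. */
unsigned int max_vcpus_for_region(size_t donated)
{
    if (donated < HYP_VM_FIXED_SIZE)
        return 0;
    return (unsigned int)((donated - HYP_VM_FIXED_SIZE) / sizeof(uint64_t));
}
```

Under these assumptions, max_vcpus_for_region(0x1000) evaluates to 299 and hyp_vm_size(513) to 0x16B0, so a single donated page cannot hold the descriptor for any count above 299.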

Memory Layout

/*
 * struct pkvm_hyp_vm — EL2 private VM descriptor
 * Size grows with num_vcpus via flexible array member
 */
struct pkvm_hyp_vm {
    /* +0x000 */ struct kvm          kvm;            // ~0x600 bytes (host kvm mirrored)
    /* +0x600 */ struct kvm_s2_mmu   pgt;            // stage-2 MMU state
    /* +0x680 */ struct hyp_pool     pool;           // EL2 memory pool header
    /* +0x6a0 */ unsigned int        nr_vcpus;
    /* +0x6a4 */ unsigned int        _pad;
    /* +0x6a8 */ struct pkvm_hyp_vcpu *vcpus[];      // flexible — base of overflow
};
EL2 PAGE ALLOCATION — BEFORE CORRUPTION (num_vcpus=4, donated=0x1000):

  EL2 VA 0xFFFF_8800_0000_0000:
  ┌─────────────────────────────────────────────┐ ← donated page start
  │  struct pkvm_hyp_vm (fixed fields)  0x6A8   │
  │  vcpus[0]  0x6A8  ← ptr to hyp_vcpu #0     │
  │  vcpus[1]  0x6B0  ← ptr to hyp_vcpu #1     │
  │  vcpus[2]  0x6B8  ← ptr to hyp_vcpu #2     │
  │  vcpus[3]  0x6C0  ← ptr to hyp_vcpu #3     │
  │  [padding to page end]                      │
  └─────────────────────────────────────────────┘ 0xFFFF_8800_0000_1000
  ┌─────────────────────────────────────────────┐ ← adjacent EL2 allocation
  │  stage-2 pgd (kvm_pgtable root)             │ ← write target
  └─────────────────────────────────────────────┘

EL2 PAGE ALLOCATION — AFTER CORRUPTION (num_vcpus=513, donated still=0x1000):

  EL2 VA 0xFFFF_8800_0000_0000:
  ┌─────────────────────────────────────────────┐
  │  struct pkvm_hyp_vm (fixed fields)  0x6A8   │
  │  vcpus[0..298]  - fit within page           │
  │  vcpus[299..512] - OVERFLOW                 │
  └─────────────────────────────────────────────┘ ← page boundary crossed
  ┌─────────────────────────────────────────────┐
  │  CORRUPTED: stage-2 pgd root pointer        │ ← attacker-controlled value
  │             overwritten by pkvm_hyp_vcpu_   │
  │             from_idx() computed address     │
  └─────────────────────────────────────────────┘
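
To see where the spill begins for a given region size, the layout above can be turned into a small calculation. This is a hedged sketch under the same assumptions as the diagram (vcpus[] at offset 0x6A8, 8-byte slots); struct overflow_span and vcpus_overflow are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VCPUS_OFFSET 0x6A8u  /* assumed offset of vcpus[] in the descriptor */
#define SLOT_SIZE    8u      /* one 64-bit pointer per vCPU */

static_assert(SLOT_SIZE == sizeof(uint64_t), "slots model 64-bit pointers");

struct overflow_span {
    unsigned int first_oob_index; /* first index not fully inside the region */
    size_t       oob_bytes;       /* bytes covered by slots outside the region */
};

struct overflow_span vcpus_overflow(size_t donated, unsigned int num_vcpus)
{
    /* Default: every slot fits, so the first out-of-bounds index is num_vcpus. */
    struct overflow_span span = { num_vcpus, 0 };
    size_t end = VCPUS_OFFSET + (size_t)num_vcpus * SLOT_SIZE;

    if (end <= donated)
        return span;

    /* Number of slots that fit completely is also the first spilling index. */
    span.first_oob_index = donated > VCPUS_OFFSET
        ? (unsigned int)((donated - VCPUS_OFFSET) / SLOT_SIZE)
        : 0;
    span.oob_bytes = end - (VCPUS_OFFSET + (size_t)span.first_oob_index * SLOT_SIZE);
    return span;
}
```

For a 0x1000-byte region and num_vcpus=513 this yields a first out-of-bounds index of 299 and 0x6B0 spilled bytes, so vcpus[299] is the first slot that lands at offset 0x1000.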

Exploitation Mechanics

Exploitation requires the ability to invoke KVM_CREATE_VM and KVM_SET_USER_MEMORY_REGION ioctls, which are available to unprivileged processes on Android when the /dev/kvm node is accessible (as it is for apps targeting virtualization APIs). The technique exploits the page-adjacent EL2 allocation to overwrite a stage-2 page table root pointer with a controlled value.

EXPLOIT CHAIN:

1. Open /dev/kvm, call KVM_CREATE_VM to allocate a host-side struct kvm.

2. Invoke __pkvm_host_share_hyp() to donate exactly one 4 KB page (0x1000)
   to EL2 for the VM descriptor, as a 4-vCPU VM would need.

3. Trigger __pkvm_init_vm() via KVM_ENABLE_CAP(KVM_CAP_ARM_PROTECTED_VM)
   ioctl with num_vcpus=513 (0x201).
   → vm_size = 0x6A8 + 513*8 = 0x6A8 + 0x1008 = 0x16B0 (> 0x1000 donated)
   → KVM_MAX_VCPUS check passes only if the kernel's configured limit is
     at least 513; any count above 299 already overruns the single page

4. The vcpus[] fill loop runs 513 iterations. pkvm_hyp_vcpu_from_idx()
   computes deterministic EL2 VAs based on hyp_vm base + fixed stride.
   Iterations 299-512 write into the adjacent EL2 page.

5. Adjacent page holds the stage-2 pgd root for an existing pKVM VM.
   Overwritten root pointer → attacker controls stage-2 translation for
   that VM's physical memory view.

6. Map a target guest PA to host kernel .text via the corrupted pgd.
   Guest reads now return host kernel memory; guest writes overwrite it.

7. Overwrite a host kernel function pointer (e.g., ops table entry) or
   VBAR_EL1 to redirect execution under EL1.

8. Execute arbitrary code in host kernel context → full device compromise.

The precision point is memory placement ahead of step 3: the attacker must land the pkvm_hyp_vm allocation immediately before a known EL2 allocation (the stage-2 pgd of a victim VM). This is achievable by:

  • Creating a decoy VM to fill the EL2 memory pool's current page.
  • Freeing it to leave a predictably-placed hole.
  • Creating the victim VM so its pgd lands at attacker_vm_page + 0x1000.
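
The bullets above rely on one allocator property: a freed page is handed back predictably. The toy model below illustrates only that property. It is a hypothetical LIFO page pool, not the pKVM allocator, and the allocation order used with it is an assumption chosen to keep the example small.

```c
#include <assert.h>

#define TOY_POOL_PAGES 8
#define TOY_PAGE_SIZE  0x1000ul

struct toy_pool {
    unsigned long base;                 /* address of page 0 */
    int free_stack[TOY_POOL_PAGES];     /* free page indices, LIFO */
    int top;                            /* number of free pages */
};

void toy_pool_init(struct toy_pool *p, unsigned long base)
{
    p->base = base;
    p->top = 0;
    /* Push high indices first so the lowest page is handed out first. */
    for (int i = TOY_POOL_PAGES - 1; i >= 0; i--)
        p->free_stack[p->top++] = i;
}

/* Returns the page address, or 0 when the pool is exhausted. */
unsigned long toy_alloc_page(struct toy_pool *p)
{
    if (p->top == 0)
        return 0;
    return p->base + (unsigned long)p->free_stack[--p->top] * TOY_PAGE_SIZE;
}

void toy_free_page(struct toy_pool *p, unsigned long addr)
{
    assert(p->top < TOY_POOL_PAGES);    /* a double free would overflow the stack */
    p->free_stack[p->top++] = (int)((addr - p->base) / TOY_PAGE_SIZE);
}
```

In this model, allocating a decoy page and then a neighbour, freeing the decoy, and allocating again returns the decoy's address, exactly one page below the neighbour. A real hypervisor allocator has different internals, so the sketch shows why hole placement can be predictable rather than how it behaves on a device.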

Patch Analysis

The March 2026 patch introduces an explicit cross-check between num_vcpus and the size of the memory region actually donated to EL2, performed before any use of num_vcpus as an array bound.

// BEFORE (vulnerable — arch/arm64/kvm/hyp/nvhe/pkvm.c):
static int __pkvm_init_vm(struct pkvm_hyp_vm *hyp_vm,
                          unsigned int       num_vcpus,
                          unsigned long      pgd_hva,
                          unsigned long      last_ran_hva)
{
    size_t vm_size = pkvm_get_hyp_vm_size(num_vcpus);

    if (num_vcpus > KVM_MAX_VCPUS)   /* checked too late; size already trusted */
        return -EINVAL;

    for (unsigned int i = 0; i < num_vcpus; i++)
        hyp_vm->vcpus[i] = pkvm_hyp_vcpu_from_idx(hyp_vm, i);  /* OOB write */

    hyp_vm->kvm.created_vcpus = num_vcpus;
    return 0;
}

// AFTER (patched — Android Security Bulletin March 2026):
static int __pkvm_init_vm(struct pkvm_hyp_vm *hyp_vm,
                          unsigned int       num_vcpus,
                          unsigned long      pgd_hva,
                          unsigned long      last_ran_hva)
{
    size_t vm_size;

    /* Validate num_vcpus FIRST, before computing any sizes */
    if (num_vcpus == 0 || num_vcpus > KVM_MAX_VCPUS)
        return -EINVAL;

    vm_size = pkvm_get_hyp_vm_size(num_vcpus);

    /*
     * Cross-check: the donated region recorded in the hyp_memcache
     * must be >= the size required for num_vcpus.
     * hyp_vm->donated_size is set by __pkvm_donate_memory() from
     * the actual page count transferred — it cannot be forged.
     */
    if (hyp_vm->donated_size < vm_size)   /* NEW: size invariant enforced */
        return -EINVAL;

    for (unsigned int i = 0; i < num_vcpus; i++)
        hyp_vm->vcpus[i] = pkvm_hyp_vcpu_from_idx(hyp_vm, i);  /* now safe */

    hyp_vm->kvm.created_vcpus = num_vcpus;
    return 0;
}

The patch's key insight is that donated_size is set exclusively by EL2-internal code during the donation hypercall — host userspace cannot forge it. By anchoring the bound check to this EL2-controlled field rather than the caller-supplied num_vcpus, the invariant donated_size >= vm_size is sufficient to prevent the out-of-bounds write regardless of what value the host passes for num_vcpus.
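
The order of the two checks can be exercised outside the kernel. The sketch below is a user-space model of the patched validation as described here, not the actual patch: check_init_vm_args is a hypothetical name, MODEL_MAX_VCPUS=512 is an assumed value for KVM_MAX_VCPUS, and the 0x6A8 fixed size comes from this analysis.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_VCPUS  512u    /* assumed stand-in for KVM_MAX_VCPUS */
#define MODEL_FIXED_SIZE 0x6A8u  /* assumed fixed-field size of the descriptor */

/* Bounding num_vcpus first guarantees the size computation cannot wrap. */
static_assert(MODEL_MAX_VCPUS <= (SIZE_MAX - MODEL_FIXED_SIZE) / sizeof(uint64_t),
              "size computation must not overflow size_t");

/* Returns 0 when the arguments are consistent, -EINVAL otherwise. */
int check_init_vm_args(size_t donated_size, unsigned int num_vcpus)
{
    size_t vm_size;

    /* Step 1: range-check the caller-supplied count before using it. */
    if (num_vcpus == 0 || num_vcpus > MODEL_MAX_VCPUS)
        return -EINVAL;

    /* Step 2: derive the required size from the now-bounded count. */
    vm_size = MODEL_FIXED_SIZE + (size_t)num_vcpus * sizeof(uint64_t);

    /* Step 3: compare against the size recorded at donation time. */
    if (donated_size < vm_size)
        return -EINVAL;

    return 0;
}
```

With one donated page (0x1000), the model accepts counts up to 299 and rejects 300 and above, which is the invariant the patched loop depends on.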

Detection and Indicators

Because the out-of-bounds write happens entirely within EL2, standard kernel logging (dmesg, logcat) will not capture it directly. However, the following indicators are observable:

  • Hypervisor fault logs: An unsuccessful exploitation attempt triggering an EL2 data abort will appear as a kvm [X]: Hyp panic entry in the kernel log with a faulting address outside expected EL2 VA range.
  • Anomalous /dev/kvm usage: Processes calling KVM_ENABLE_CAP(KVM_CAP_ARM_PROTECTED_VM) with unusually large vCPU counts from non-system UIDs.
  • SELinux denials: Unexpected access to kvm_device from application contexts should be alerted on in strict policy configurations.
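  • Kernel integrity checks: Post-exploit, KASLR slide inconsistencies or unexpected mutations of ro_after_init data may be detectable via runtime integrity monitors such as RKP; dm-verity covers on-disk partitions only and will not flag in-memory changes.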
// Crash signature from failed exploit attempt (EL2 data abort):
[  142.881204] kvm [1]: Hyp panic:
[  142.881301] HYP pc:0xffffffc008a31c44 lr:0xffffffc008a31b08
[  142.881388] HYP ESR: 0x0000000096000046  (Data Abort - EL2)
[  142.881401] HYP FAR: 0xffff880000001008  <- write past donated page
[  142.881509] HYP ISS: write, translation fault, level 2
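
When triaging such a panic, the ESR value can be decoded by hand or with a few lines of code. The sketch below is a minimal decoder, assuming the standard AArch64 ESR layout (EC in bits [31:26], WnR in ISS bit 6, DFSC in ISS bits [5:0]); the function and macro names are illustrative and are not taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT    26
#define ESR_EC_MASK     0x3fu
#define ESR_EC_DABT_CUR 0x25u      /* data abort taken without a change in EL */
#define ESR_ISS_WNR     (1u << 6)  /* set when the faulting access was a write */
#define ESR_DFSC_MASK   0x3fu

static_assert(ESR_EC_DABT_CUR <= ESR_EC_MASK, "EC is a 6-bit field");

/* Exception class: why the exception was taken. */
unsigned int esr_ec(uint64_t esr)
{
    return (unsigned int)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

bool esr_is_write(uint64_t esr)
{
    return (esr & ESR_ISS_WNR) != 0;
}

/* Returns the level (0-3) for a translation fault, or -1 for other DFSCs. */
int esr_translation_fault_level(uint64_t esr)
{
    unsigned int dfsc = (unsigned int)(esr & ESR_DFSC_MASK);

    /* DFSC 0b0001LL encodes a translation fault at level LL. */
    if ((dfsc & 0x3cu) != 0x04u)
        return -1;
    return (int)(dfsc & 0x3u);
}
```

Applied to 0x96000046 from the signature above, esr_ec returns 0x25 (a data abort taken at the current exception level), esr_is_write returns true, and esr_translation_fault_level returns 2.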

Remediation

  • Apply the March 2026 Android Security Patch Level (SPL: 2026-03-01) or later. This is the definitive fix.
  • Verify patch application: the fixed kernel will expose donated_size validation in __pkvm_init_vm(); check with grep -r "donated_size" arch/arm64/kvm/hyp/nvhe/pkvm.c against your build tree.
  • Restrict /dev/kvm access via SELinux policy to system-only contexts on devices that do not ship consumer virtualization APIs.
  • Enable CONFIG_PROTECTED_KVM integrity checks where available — hardened pKVM builds perform additional EL2 memory accounting that raises the exploitation bar even on unpatched kernels.
  • Monitor for Hyp panic kernel log entries as a crash-loop indicator of in-the-wild probe activity.
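
For fleet checks, the patch-level comparison in the first bullet can be automated. The sketch below assumes the patch level is available as a YYYY-MM-DD string (on Android this is typically read from the ro.build.version.security_patch property); spl_is_well_formed and spl_includes_fix are hypothetical helper names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FIXED_SPL "2026-03-01"  /* patch level named in this advisory */

static_assert(sizeof(FIXED_SPL) == 11, "SPL strings are 10 characters plus NUL");

/* Accept only strings shaped like YYYY-MM-DD. */
bool spl_is_well_formed(const char *spl)
{
    if (spl == NULL || strlen(spl) != 10)
        return false;
    for (int i = 0; i < 10; i++) {
        if (i == 4 || i == 7) {
            if (spl[i] != '-')
                return false;
        } else if (!isdigit((unsigned char)spl[i])) {
            return false;
        }
    }
    return true;
}

/* Zero-padded ISO dates sort correctly under plain string comparison. */
bool spl_includes_fix(const char *spl)
{
    return spl_is_well_formed(spl) && strcmp(spl, FIXED_SPL) >= 0;
}
```

A reported level of 2026-02-05 is flagged as unpatched, while 2026-03-01 or anything later passes. Malformed strings are treated as unpatched so that parsing problems fail safe.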
CypherByte Research
Mobile security intelligence · cypherbyte.io