Your AI Server Is Leaking API Keys and Private Conversations to Anyone Who Asks
A critical flaw in Ollama lets attackers trick the server into spilling secrets from memory — including API keys, system prompts, and other users' chats.
This article is written for general audiences — no security background needed. For the full technical analysis with CVE details, affected versions, and code-level breakdown, visit Intel Reports.
CRITICAL 9.1 CVE-2026-7482 · Memory Corruption · Cross-Platform
The Threat in Plain Terms
Imagine you run a self-hosted AI assistant for your company — the thing your team uses to draft emails, analyze contracts, or answer internal questions. Your API keys for connecting to other services live on that server. So do the secret instructions that define how your AI behaves. And so do the live conversations your colleagues are having with it right now, in memory, waiting to be processed.
A newly disclosed vulnerability in Ollama means an attacker who can reach your server — even over the public internet if you've exposed it — can potentially read all of that. Not by breaking any passwords or bypassing any login screen. By uploading a single, carefully crafted fake AI model file.
Here's how it works in plain terms: when Ollama's server receives a new AI model file to load, it trusts what the file says about its own contents. A malicious file can lie — claiming its data is much bigger than it actually is. When Ollama goes to process that file, it reaches past the end of the data it was given and starts reading from wherever its memory happens to land next. That "wherever" can include environment variables holding database passwords, the API keys your server loaded at startup, the system prompt you paid a consultant to craft, and fragments of conversations other users are actively having on the same server. The attacker then quietly exports this stolen memory by pushing it out as a new "model" artifact to a server they control. It's a smash-and-grab on your server's brain, and it leaves almost nothing behind.
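The over-read described above can be shown with a toy model. This is a minimal sketch, not Ollama's code: a single byte string stands in for server memory, and the names `uploaded_file`, `nearby_secret`, and `naive_read` are invented for illustration.

```python
# Toy illustration only: this is not Ollama's code. A single bytes object
# stands in for server memory, with the uploaded file's data sitting right
# next to an unrelated (made-up) secret.
uploaded_file = b"TENSORDATA"                  # the 10 bytes the attacker really sent
nearby_secret = b"|API_KEY=sk-demo-not-real|"  # hypothetical neighbour in memory
memory = uploaded_file + nearby_secret


def naive_read(memory: bytes, start: int, declared_size: int) -> bytes:
    """Read as many bytes as the file claims to contain, without checking."""
    return memory[start:start + declared_size]


# An honest file declares its real size and gets back only its own data.
honest = naive_read(memory, 0, declared_size=len(uploaded_file))

# A malicious file claims to be larger, so the read spills into the secret.
leaked = naive_read(memory, 0, declared_size=30)

print(honest)
print(leaked)
```

The only difference between the two reads is the size the file declares for itself, which is exactly the value the attacker controls.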
The Technical Detail That Matters
Root cause: In fs/ggml/gguf.go and server/quantization.go, the WriteTo() function during GGUF model quantization performs no bounds validation between attacker-supplied tensor offset/size fields and the actual underlying file length. A malformed GGUF file with tensor_data_offset + tensor_size > file_length causes the server to read past the allocated heap buffer into adjacent heap memory.
Attack surface: The /api/create endpoint accepts the malicious GGUF payload; exfiltration is completed via /api/push, making the full exploit chain unauthenticated by default on most Ollama deployments.
CVSS 3.1 score: 9.1 (CRITICAL). Network-adjacent, no privileges required, no user interaction.
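The missing check amounts to comparing each declared tensor extent against the real file length before reading anything. The sketch below illustrates that comparison in Python; it is not the actual Ollama patch, and the `TensorInfo` class, its field names, and `out_of_bounds_tensors` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical tensor descriptor, as declared inside an untrusted file."""
    name: str
    data_offset: int  # byte offset at which the file says the tensor starts
    size: int         # byte length the file claims for the tensor


def out_of_bounds_tensors(tensors, file_length):
    """Return names of tensors whose declared extent runs past the file."""
    bad = []
    for t in tensors:
        # Reject negative fields and any extent ending beyond the real file.
        if t.data_offset < 0 or t.size < 0 or t.data_offset + t.size > file_length:
            bad.append(t.name)
    return bad


# Assumed example: a 4096-byte file with one honest and one lying descriptor.
declared = [
    TensorInfo("honest", data_offset=0, size=1024),
    TensorInfo("lying", data_offset=1024, size=1_000_000),
]
flagged = out_of_bounds_tensors(declared, file_length=4096)
print(flagged)
```

A parser that runs a check like this can refuse the whole file as soon as any descriptor is flagged, rather than discovering the mismatch partway through a read.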
Who Is Actually at Risk Here?
Ollama has been downloaded over 50 million times and sits at the center of the boom in self-hosted AI. Developers use it to run models like Llama, Mistral, and Gemma on their own machines or company servers. Startups build entire products on top of it. IT teams deploy it internally to keep sensitive data off of cloud AI services — often for compliance reasons. The bitter irony of this vulnerability is that the very people trying to keep their data private by running AI locally may be the ones most exposed.
By default, Ollama binds to localhost, meaning only software already running on the same machine can reach it. But a significant portion of real-world Ollama deployments are intentionally or accidentally exposed to the internet, and security researchers have historically found thousands of such instances with a basic internet scan. If you've configured Ollama with a public-facing port, your server has been exposed for as long as that configuration has been in place.
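One quick way to reason about exposure is to classify the address your server is bound to. The sketch below assumes you already know the bare bind host (on many setups it comes from the OLLAMA_HOST environment variable, but check your own deployment); the function name `is_loopback_only` is invented for illustration, and it deliberately does not parse ports or URLs.

```python
import ipaddress


def is_loopback_only(bind_host: str) -> bool:
    """True if the bind host only accepts connections from the same machine."""
    if bind_host.strip().lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host.strip()).is_loopback
    except ValueError:
        # Unrecognised hostnames are treated as potentially exposed.
        return False


for host in ["127.0.0.1", "::1", "0.0.0.0", "192.168.1.10"]:
    status = "local only" if is_loopback_only(host) else "reachable from other machines"
    print(f"{host}: {status}")
```

An address such as 0.0.0.0 means "listen on every network interface", so whether the server is reachable from the internet then depends entirely on your firewall and network layout.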
"The attack chain here is elegant in the worst possible way — you upload a file, the server hands you back secrets it didn't know it was giving away, and you push them somewhere else. Two API calls."
Exploited in the Wild? What We Know.
As of publication, no active exploitation has been confirmed. There are no known victim organizations or campaigns tied to CVE-2026-7482 at this time. However, security teams should not treat the absence of confirmed exploitation as an absence of risk. The attack surface is massive, the exploit concept is straightforward for any researcher who reads this advisory, and Ollama deployments frequently run with elevated system privileges — making leaked memory especially rich.
The vulnerability was responsibly disclosed to the Ollama security team, which issued a patch in version 0.17.1. The disclosure timeline and the name of the discovering researcher have not been made public at the time of writing. We will update this article as attribution details emerge.
What To Do Right Now — 3 Steps
- Update Ollama to version 0.17.1 or later, immediately.
  Run ollama --version to check what you have. If you installed via the official installer on macOS or Linux, run the installer again or use curl -fsSL https://ollama.com/install.sh | sh to pull the latest version. Docker users should pull ollama/ollama:0.17.1 and redeploy their containers.
- Audit who can reach your Ollama server.
  Run ss -tlnp | grep 11434 (Linux) or check your firewall rules to confirm Ollama is not exposed to the public internet. If it is, take it offline or place it behind a VPN or authenticated reverse proxy immediately. Do not rely on "no one knows the URL" as a security control.
- Rotate any secrets that were accessible on the server.
  If your Ollama server ran with API keys, database credentials, or other sensitive environment variables set for any period before patching, treat those secrets as potentially compromised. Rotate them now, check your service logs for unexpected /api/push calls to external endpoints, and audit whether any GGUF files were uploaded by unexpected sources via /api/create.
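The log review in the last step can be partly automated. The sketch below is one possible approach under a stated assumption: that your server or reverse-proxy logs record the HTTP method and request path on each line. The sample lines are invented, and the names `SUSPECT` and `suspicious_requests` are hypothetical; adjust the pattern to your real log format.

```python
import re

# Matches POST requests to the two endpoints involved in the attack chain.
SUSPECT = re.compile(r"\bPOST\b.*?(/api/(?:create|push))\b")


def suspicious_requests(log_lines):
    """Return (line_number, endpoint) pairs for requests worth reviewing."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = SUSPECT.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Invented sample lines, standing in for a real access log.
sample_log = [
    '10.0.0.5 POST "/api/chat" 200',
    '203.0.113.7 POST "/api/create" 200',
    '10.0.0.5 GET "/api/tags" 200',
    '203.0.113.7 POST "/api/push" 200',
]
hits = suspicious_requests(sample_log)
for line_number, endpoint in hits:
    print(f"line {line_number}: {endpoint}")
```

A match is not proof of compromise, since both endpoints have legitimate uses. Treat each hit as a prompt to check who made the request and, for pushes, where the data went.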
CVE-2026-7482 · CVSS 9.1 (Critical) · Fixed in Ollama 0.17.1 · Tags: heap-buffer-overflow, out-of-bounds-read, gguf-parser, information-disclosure
The technical analysis covers the exact vulnerability mechanism, affected code paths, attack chain, detection methods, and full remediation guide.