all posts

ai agents

EchoLeak, CamoLeak, and the GPT-5 7-vuln chain — prompt injection is shipping in named products

The 2025–2026 prompt-injection CVEs in Microsoft 365 Copilot, GitHub Copilot Chat, and ChatGPT. What changed, why "we'll fix it later" is no longer an answer, and what telemetry actually catches it.

By Akshay Sarode· January 22, 2026· 13 min readllmai-agentsobservabilitysecurity

EchoLeak, CamoLeak, and the GPT-5 7-vuln chain — prompt injection is shipping in named products

TL;DR. In 2025–2026 prompt injection moved from research demos to assigned CVEs in shipping enterprise products. EchoLeak (CVE-2025-32711, CVSS 9.3) is a zero-click exfiltration in Microsoft 365 Copilot — an attacker sends an email and Copilot leaks Outlook, OneDrive, and Teams content to an attacker-controlled URL when the user later asks for an inbox summary. CamoLeak (CVE-2025-59145, CVSS 9.6) is the same class against GitHub Copilot Chat via hidden HTML in PR descriptions. Tenable's 5 Nov 2025 disclosure (covered in The Hacker News) chained seven vulnerabilities across GPT-4o and GPT-5 to the same effect. The lead researcher's quote is the line every CISO should print: "Prompt injection is a known issue with the way that LLMs work, and, unfortunately, it will probably not be fixed systematically in the near future." If the model vendors aren't fixing it systematically, you need detection and containment in your own stack. This post is the long version: what each CVE actually does, the Blockchain Council survey showing 73% of LLM apps have prompt-injection exposure, the architectural answer (telemetry + redaction + budget caps), and what we ship in Sutrace. For the broader use-case see AI agent observability.

The named CVEs of 2025–2026

EchoLeak — CVE-2025-32711, CVSS 9.3, Microsoft 365 Copilot

Disclosed by Aim Security in June 2025. The clearest writeups are HackTheBox's, Checkmarx's, and the academic paper on arXiv.

The mechanism:

  1. Attacker sends an email to a target user. The email contains a hidden prompt — usually CSS-hidden white text or HTML metadata — instructing Copilot to do something on the user's behalf.
  2. Days or weeks later, the user asks Copilot to "summarise my inbox" or "what important emails came in this week."
  3. Copilot, in the course of summarising, reads the attacker's email, treats the hidden prompt as user instruction, and follows it.
  4. The instruction is to exfiltrate the user's recent Outlook content, OneDrive files, and Teams chat history to an attacker-controlled URL. This is done via a Markdown image reference, which Copilot renders, which forces the browser to make a GET request with the data in the URL.

Zero-click. The user never reads the malicious email. They never click anything. The attack triggers when the user uses Copilot for an unrelated, normal task.

The patches Microsoft shipped were two-fold: (1) restrict outbound URL rendering from Copilot output, (2) better classifier for ignoring instructions in retrieved content. But the underlying class — instructions in retrieved content getting executed — is unfixed.

CamoLeak — CVE-2025-59145, CVSS 9.6, GitHub Copilot Chat

Disclosed in late 2025 by Legit Security (a different vendor with a similar name pattern). Same class, different surface.

The mechanism:

  1. Attacker opens a pull request on a repo, or comments on an existing one. The PR description contains hidden HTML — CSS-hidden text, <details> blocks with the prompt, or HTML comments with carefully-formatted instructions.
  2. A developer asks Copilot Chat to "explain this PR" or "review the changes."
  3. Copilot Chat reads the PR description as context, treats the hidden HTML as instructions, and follows them.
  4. The instructions exfiltrate repository contents, environment variables visible to Copilot Chat, or confidential context the developer pasted into the chat earlier.

CVSS 9.6 — higher than EchoLeak because the leaked content is often source code or secrets, not just inbox summaries.

Tenable's 7-vuln ChatGPT chain (5 Nov 2025)

Tenable's disclosure went after the entire ChatGPT product surface, not just the API. Seven distinct issues, ranging from web-search content injection to memory poisoning to image-rendering side channels. The Hacker News coverage has the full chain. Both GPT-4o and GPT-5 affected.

The Tenable researcher's quote is the line every CISO should pin to the wall:

"Prompt injection is a known issue with the way that LLMs work, and, unfortunately, it will probably not be fixed systematically in the near future."

This is the most honest thing a security researcher has said about the LLM threat model in 2026. The model vendors will continue to ship mitigations — better instruction classifiers, retrieval sanitisation, output-rendering restrictions — but the underlying behaviour (LLMs treat instructions in any context as instructions) is a property of the architecture, not a bug to be fixed.

The 73% number

Two months before Tenable's disclosure, Blockchain Council published a survey-style review of production LLM applications. The headline: 73% of audited LLM apps had a prompt-injection vector in their context-loading path.

The number is plausible. The reason: most LLM apps load context from sources the developer doesn't fully control — search results, retrieved documents, tool outputs, user-uploaded PDFs, web scrapes. Each is an attacker-controllable channel into the prompt. Without instruction-content separation (which the model architecture does not enforce), every channel is a vector.

NeuralTrust's prompt-injection detection stack is a useful primer if you want to roll detection yourself.

What "shipping" actually means here

The three CVEs above all shipped in products that are running on hundreds of millions of seats. Microsoft 365 Copilot is in every M365 E5 SKU. GitHub Copilot Chat is in default GitHub. ChatGPT has 500M+ weekly users. The class of bug is not gated to research labs and university demos. It's in the production path of three of the biggest software products on earth.

The implication for your product: if Microsoft, GitHub, and OpenAI couldn't catch this in QA — with the largest red-teams and security budgets in the industry — you're not catching it either.

The defence is not "don't ship LLM features." The defence is assume the prompt can be hostile, and detect when it is.

What detection looks like

Prompt-injection detection is pattern recognition. It's not perfect — adversarial inputs can evade detectors — but it changes the asymmetry. Without detection, the attacker's payload is invisible until something catastrophic happens. With detection, you have a signal you can rate-limit, alert on, and route to human review.

The patterns that catch most real-world injection:

  1. Instruction-override phrasing. "Ignore previous instructions," "you are now…," "as a developer at OpenAI…" — these are classic and still common.
  2. Role confusion. Inputs that pretend to be system messages, or that try to close the user role and open a new system role.
  3. Hidden HTML/Markdown. White-on-white text, CSS-hidden divs, HTML comments containing instructions, <details> blocks with hidden content. EchoLeak and CamoLeak both used this.
  4. Encoded payloads. Base64-encoded instructions, ROT13, leetspeak. Cheap to detect, common to see.
  5. Jailbreak chains. Known patterns from the public jailbreak corpus — DAN, AIM, Maximum, etc.
  6. Context-boundary attacks. Attempts to exfiltrate the system prompt, often via "what are your instructions" or "repeat the words above."

A detector running on every prompt produces a score per vector. Spans get tagged. Alerts route on threshold. Your CISO sees a dashboard, not an invoice from an exfiltrated database.

The architectural answer

There is no single fix. The honest answer is layered defence with three architectural primitives:

1. On-host prompt redaction

Strip PII, secrets, and customer-defined patterns from the prompt before it leaves your network. The redactor runs in your VPC. Even if a prompt-injection succeeds and the model is instructed to exfiltrate, what gets exfiltrated is the redacted version — pattern-matched placeholders, not the original data.

This is a containment primitive, not a prevention primitive. It assumes the model can be convinced to leak; it ensures what's leakable is minimal.

2. Prompt-injection detection telemetry

Every span carries an injection-detection score. You don't have to act on it programmatically — but you have to see it. The pattern that works in production: alert on score above threshold, rate-limit the source, route the request through a more-restrictive model variant or to a human reviewer.

3. Hard budget caps

Many real-world injection payloads don't exfiltrate — they hijack the agent into doing expensive computation on the attacker's behalf. A cap that fires synchronously bounds the damage. See Hard budget caps for AI agents.

Where the model vendors fit

Microsoft, GitHub, and OpenAI will keep shipping mitigations. What you should expect from each:

  • Better instruction classifiers — model-side fine-tuning to recognise and ignore instructions inside retrieved content. Helps on average; defeated by novel phrasing.
  • Output sanitisation — restrict URL rendering, link types, image types in model output. The EchoLeak fix was largely this.
  • Sandboxing of tool calls — restrict which tools the model can invoke when context contains untrusted content. This is the most architecturally promising direction.

None of these defences are complete. The Tenable researcher is right. You need defence in your own stack.

What Sutrace ships

We do three things relevant to this thread:

(a) Prompt-injection detection as a span attribute. Every Sutrace-instrumented LLM call gets a sutrace.injection.score in the span attributes. Six detector classes (instruction-override, role-confusion, hidden HTML/Markdown, encoded payload, jailbreak-chain, context-boundary). Alert rules can be set on the score. The detection runs on-host or in the SDK middleware, not in our cloud — your prompts don't have to be exported to be scored.

(b) On-host PII redaction. The redactor runs in your VPC and strips PII, secrets, and custom regex patterns before the OTel span is exported. Your dashboard sees the redacted version. Even if an injection succeeds, the model never had access to the original PII.

(c) Hard budget caps. Synchronous, in front of every provider call. An injection that tries to hijack your agent into expensive computation hits the cap first. See Hard budget caps for AI agents.

These three together don't make injection impossible. They make exploitation expensive and detectable, which is the realistic security posture for 2026.

For the broader picture see the AI agent observability use case and the 4-way comparison of LLM observability tools — none of the four (LangSmith, Helicone, Langfuse, Phoenix) ship injection detection as a default span attribute. The category needs to catch up.

Citations and further reading

A note on what NOT to ship

Three architecture choices that look like security and aren't:

1. "We added an LLM filter that rejects malicious prompts." Filter LLMs are themselves vulnerable to prompt injection. The filter prompt can be overridden by content in the input. You've added a layer of latency without changing the threat model.

2. "Our system prompt tells the model to ignore instructions in retrieved content." This is the default mitigation that vendors ship and it does not work reliably. Adversarial content rephrases instructions in ways that don't pattern-match the system prompt's warnings. Necessary, not sufficient.

3. "We red-team our prompts during QA." Important. But your QA red-team can't anticipate the inputs that will exist in production three months from now. Continuous detection in production telemetry is the only durable answer.

The honest framing for 2026: you cannot prevent prompt injection from succeeding occasionally. You can make sure that when it succeeds, the blast radius is small (containment) and the security team knows about it within minutes (detection). That's the realistic goal. Anything more is theatre.

If you're shipping agents in 2026 and the only thing standing between an injected prompt and your customer data is the model vendor's classifier — fix that this quarter. The CVEs are not slowing down.