The Great Patching Bottleneck: When Discovery Outruns Remediation

For decades, the binding constraint in software security was a knowledge problem: you could not patch what you could not find. The entire vulnerability management industry — coordinated disclosure timelines, patch Tuesday rhythms, bug bounty programs — was built on the implicit assumption that discovery was the scarce resource. That assumption has now been empirically falsified.

In a single month, approximately 50 partners in Anthropic's Project Glasswing collectively found more than 10,000 high- or critical-severity vulnerabilities across the world's most systemically important software. The discovery rate has leapt by an order of magnitude.

The new bottleneck is not finding bugs. It is fixing them.

What the Data Shows

The evidence comes from multiple independent sources, each measuring a different vector of the same shift. They triangulate on a consistent finding: models at the Mythos Preview capability level are no longer experimental curiosities — they are production tools reshaping the security landscape.

Cloudflare pointed Mythos Preview at more than 50 of its own repositories and found 2,000 bugs, of which 400 were rated high- or critical-severity. The company's chief security officer reports a false positive rate their team considers better than human testers — a striking methodological detail, given that hallucinated findings have historically been the Achilles' heel of LLM-based vulnerability detection. Cloudflare published the architecture of their harness — an eight-stage pipeline (Recon, Hunt, Validate, Gapfill, Dedupe, Trace, Feedback, Report) — that deliberately splits the task into narrow, specialized subproblems within a sequential pipeline rather than asking one agent to be exhaustive.

Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview. That is more than ten times the 22 bugs they found in Firefox 148 with Claude Opus 4.6. The Firefox security team noted that they have "found no category or complexity of vulnerability that humans can find that this model can't" — a claim that, if it holds, implies the asymptotic bound on machine-discoverable bugs approaches the bound on human-discoverable ones.

XBOW, an independent security platform, subjected Mythos Preview to its internal web exploit benchmark. The results show a 42% reduction in false negatives compared to Opus 4.6, and a 55% reduction when source code was available. On a token-for-token basis, XBOW reports "absolutely unprecedented precision."

XBOW also published a cost-normalized analysis showing that Mythos Preview is not best-in-class when measured by cost per finding. An agent using GPT-5.5 with a larger token budget can match or exceed Mythos Preview's hit rate for less money, depending on the task. The caveat matters: raw capability does not imply cost-effective deployability.

The UK AI Security Institute reports that Mythos Preview is the first model to solve both of its cyber ranges end to end. Two new academic benchmarks — ExploitBench and ExploitGym — both show Mythos Preview as the strongest performer on exploit development tasks — an independent, reproducible measurement layer the field has lacked.

Palo Alto Networks shipped 26 CVEs (75 issues) in its May 2026 advisory, compared to its typical volume of fewer than 5 CVEs. That is a 5x+ increase, and the first time the majority of Palo Alto's findings originated from frontier AI models. Their security team estimates a 3-to-5-month window before AI-driven exploits become the new norm.

Microsoft reported that Patch Tuesday volumes will "continue trending larger for some time," noting that a greater share of issues in the latest release were discovered internally rather than by external researchers — many surfaced through AI-assisted scanning. Oracle is finding and fixing vulnerabilities "multiple times faster" than before — Anthropic's Glasswing report notes the company is now operating at multiple times faster cycle times — and introduced a monthly Critical Security Patch Update cadence alongside the existing quarterly cycle.

WolfSSL, a cryptography library used by billions of devices, had a certificate-forgery vulnerability found and autonomously exploited by Mythos Preview — now CVE-2026-5194, patched in wolfSSL 5.9.1. The library's maintainers, who already run extensive AI-powered static analysis, noted that "none of them found what Mythos did."

The Open-Source Crunch

Anthropic itself has scanned over 1,000 open-source projects with Mythos Preview, producing an estimated 6,202 high- or critical-severity vulnerability candidates. Of the 1,752 that have been independently triaged by six security research firms, 90.6% (1,587) proved to be valid true positives, and 62.4% (1,094) were confirmed as high- or critical-severity.

Yet only 75 of the 530 high- or critical-severity bugs reported to maintainers have been patched so far. The gap is not due to model error. It reflects the human bottleneck: on average, a high- or critical-severity bug found by Mythos Preview takes two weeks to patch. Several open-source maintainers have explicitly asked Anthropic to slow the disclosure rate because they lack the capacity to design patches.

This is the core finding. The discovery-to-patch ratio has structurally inverted. At current rates, Mythos Preview alone could surface nearly 3,900 confirmed high- or critical-severity vulnerabilities in open-source code — and that is before counting what it finds for Project Glasswing partners.

"The relative ease of finding vulnerabilities compared with the difficulty of fixing them amounts to a major challenge for cybersecurity."

— Anthropic, Project Glasswing Initial Update

The Counter-Narrative: Defenders Are Not Doomed

The dominant reaction to these numbers has been alarm — and not without reason. A world where attackers can find zero-days in minutes seems catastrophic. But the evidence supports a more nuanced reading.

The models currently finding vulnerabilities are, for the most part, in the hands of defenders. Anthropic has not released Mythos Preview publicly. The access model — through Project Glasswing's curated partnerships and Microsoft Foundry's gated research preview — is deliberately calibrated for defensive use. The question is how long this asymmetry lasts.

The limiting factor for defenders is operational, not epistemic. We now know where the bugs are. The problem is organizational: compressing patch-to-deployment timelines, automating triage, and hardening the fundamentals that operate independently of the patch cycle. Cloudflare's harness architecture — an eight-stage sequential pipeline with parallel sub-agents within stages — demonstrates a repeatable pattern for scaling vulnerability discovery without drowning in noise.

XBOW's cost-normalized analysis shows the economics of vulnerability discovery are more complex than a simple "Mythos is magic" narrative. The right question is not "which model finds the most bugs?" but "what is the optimal allocation of compute budget across models for a given codebase and threat model?" That question is still largely unanswered.

What the Data Tells Us About What to Do

The recommendations converging across Anthropic, Microsoft, Palo Alto Networks, and Cloudflare share a common structure. Each is a response to the same underlying measurement: the discovery-patch gap.

Compress patch cycles. The 90-day CVD window was calibrated for a world where discovery was slow. In a world where AI finds bugs in minutes, every unpatched day is an exploit window. Oracle's shift to monthly Critical Security Patch Updates and Palo Alto's accelerated release cadence are templates.
Automate triage with adversarial validation. Cloudflare's harness demonstrates the most important architectural insight: adding a separate validation agent — one with no ability to generate its own findings, tasked specifically with disproving the original agent's results — dramatically reduces noise. This is a replicable pattern, not proprietary magic.
Invest in defense-in-depth. As Microsoft MSRC's Tom Gallagher put it: "The fundamentals have not changed. The pace at which they need to be applied is changing." Multi-factor authentication, network segmentation, least-privilege access, and comprehensive logging operate independently of the patch cycle.
Prepare for Mythos-class models to proliferate. The trajectory of GPT-5.5, which XBOW reports achieves "Mythos-like" performance on its benchmarks, suggests that these capabilities are not locked behind a single unreleased model. Organizations that invest now in patching infrastructure and triage workflows will have a structural advantage.

The Bottom Line

The security bottleneck has flipped. The data from Cloudflare, Mozilla, XBOW, Microsoft, Oracle, Palo Alto Networks, and Anthropic's own open-source scanning all points in the same direction: discovery capability has outpaced remediation capacity by a factor of ten or more. The transition period — while vulnerabilities are rapidly found and slowly patched — is genuinely dangerous.

The data supports a cautiously optimistic reading. The models can find bugs. The harnesses can triage findings. The patch infrastructure can be accelerated.

"Defenders finally have a chance to win, decisively."

— Mozilla Security Team

The data says the old ways no longer suffice. The evidence is in.

The Great Patching Bottleneck: When Discovery Outruns Remediation

What the Data Shows

The Open-Source Crunch

The Counter-Narrative: Defenders Are Not Doomed

What the Data Tells Us About What to Do

The Bottom Line

Further Reading

No comments yet

Continue reading

The Integration Ceiling

The Sandbox War: Cloudflare and Vercel Both Solved the Same Infrastructure Blind Spot

File-Based Planning Is Becoming the Universal Agent Protocol

Track the tools. Lead the shift.