Every security team I talk to is asking the wrong question. They want to know which AI model finds the most vulnerabilities — Mythos Preview vs. GPT-6 vs. their own fine-tuned detector. They compare CVSS scores per hour, benchmark false-positive rates, and argue over whether the next generation of cyber-capable models will put defenders ahead or behind.
These debates are already obsolete. The discovery problem has been solved.
Anthropic’s Project Glasswing partners have collectively found more than 10,000 high- and critical-severity vulnerabilities in the world’s most important software since April. Cloudflare alone identified 2,000 bugs across its critical-path systems in weeks — 400 of them high- or critical-severity — with a false-positive rate the team considers better than human testers. Mozilla found 271 vulnerabilities in a single Firefox release, over ten times what they caught with the previous generation of AI tooling. The UK AI Security Institute reports that Mythos Preview is the first model to solve both of its cyber ranges end to end.
The bottleneck is no longer “can we find the hole.” It’s “can we fix it before someone exploits it.”
Anthropic is now expanding Project Glasswing to 150 new organizations across 15 countries — power grids, water systems, healthcare networks, communications infrastructure, hardware vendors. But buried in the expansion announcement is a far more important signal than the partner count: Anthropic explicitly states that the company’s role is “to steadily shift the support we provide, from finding vulnerabilities to disclosing, fixing, and deploying patched software.”
They recognize that detection has become the easy part. The hard, defensible, value-accruing work is the remediation pipeline.
The numbers that matter
The raw data from the Glasswing initial update tells a stark story. Mythos Preview found an estimated 6,202 high- or critical-severity vulnerabilities across 1,000+ open-source projects. After triage by independent security firms, 90.6% proved to be true positives. Nearly 3,900 of those are confirmed critical bugs in open-source code alone, on top of what the partners found in their proprietary systems.
And yet only 75 of the 530 high- or critical-severity bugs Anthropic has reported to maintainers have been patched. That’s a 14% patch rate.
Anthropic is careful to note that many of these are still within the 90-day coordinated disclosure window, and that some patches land without public advisories, so the true number is likely higher. But the company’s own language is revealing: “the low volume of patches reflects a genuine problem.” Even at a deliberately measured disclosure pace, “Mythos Preview is adding to an already-overloaded security ecosystem.”
The math is straightforward. If a single model from a single company can surface thousands of high-severity vulnerabilities in weeks, and each takes, say, two weeks to patch, the industry faces a compound deficit. The gap between discovery velocity and remediation capacity widens with every model release, and Anthropic warns that within 6 to 12 months, “many other AI companies will have Mythos-class models” that may lack safeguards against misuse.
Why detection is being commoditized
The conventional wisdom in cybersecurity has long been that detection is the moat, whether that means finding the bug in your own code before an attacker does or finding the intruder in your network before they reach the data. The org with the best signal-to-noise ratio, the most advanced SIEM, the most skilled threat hunters wins. AI inverts this assumption.
Frontier models are general-purpose reasoning engines, not specialized detectors. Their ability to find vulnerabilities comes from the same underlying capacity that lets them write code, analyze legal documents, and solve math problems. Cyber capability is a side effect of general intelligence, not a distinct product. As frontier models converge in capability — and they are converging — the detection advantage between them narrows to noise.
This is already visible in the data. XBOW, an independent security platform, reports that Mythos Preview is a “significant step up over all existing models” on its web exploit benchmark, but the company’s real emphasis is on “absolutely unprecedented precision” at the token level — a marginal efficiency gain, not a step-change in what’s possible. The frontier is compressing.
What cannot be compressed is the human-in-the-loop work of remediation: triaging findings, writing patches, testing them in staging, navigating organizational change management, deploying to production, and monitoring for regressions. These are workflows embedded in organizational processes, not model capabilities. They are the bottleneck because they are not AI-solvable in isolation — they require coordination, context, and trust.
What remediation infrastructure looks like
Anthropic is not leaving this gap unaddressed. The launch of Claude Security — which has already patched over 2,100 vulnerabilities in three weeks using Claude Opus 4.7 — is a direct bet on remediation as the product. So is the release of the tools Anthropic built for Glasswing partners: a codebase mapping harness, a subagent-oriented scanning framework, a threat model builder, and a set of deployable skills.
The pattern here matters. Anthropic could have chosen to keep these tools exclusive to the 150 partners. Instead, they’re making them available on request to qualifying security teams. This is a deliberate strategy to seed the remediation infrastructure market — because Anthropic understands that the value in AI cyber defense accrues to whoever owns the deploy loop, not the detection layer.
Palo Alto Networks’ latest release included over five times as many patches as usual. Microsoft reports that Patch Tuesday counts will “continue trending larger for some time.” Oracle is finding and fixing vulnerabilities “multiple times faster” than before. These numbers are typically read as evidence of an escalating threat landscape. But they are better read as evidence of a market in transition — from the detection era to the remediation era.
The practitioner takeaway
For enterprise security teams building with agentic AI, the strategic implication is sharp. Every dollar spent on another detection tool, another SIEM connector, another vulnerability scanner that promises better signal is a dollar that is increasingly fungible. The models are all getting good enough. The differentiation has moved downstream.
The investments that compound are in remediation infrastructure:
- Patch validation pipelines that can test AI-generated fixes in staging environments with automated rollback. Claude Security generates patch suggestions; the bottleneck is verifying they don’t break production.
- Disclosure workflows that can triage findings and route them to the right maintainers or teams with context, not firehoses. The open-source maintainers who asked Anthropic to slow down its disclosures were not being obstructionist — they were drowning.
- Deployment automation that can push verified patches across heterogeneous environments faster than the adversary can weaponize the disclosed vulnerability. Every day between disclosure and deploy is a window for exploitation.
- Rollback infrastructure for when remediation introduces its own failures. The safety of the system depends on the ability to undo changes at speed.
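Here is what the validate, canary, promote-or-roll-back loop from the list above looks like when wired together. This is an added example for this post, not code from Anthropic, Claude Security, or any of the Glasswing tooling. The CI gate (`run_staging_tests`) and the observability hook (`error_rate`) are assumed callbacks you would point at your own pipeline; the canary share, the error budget, the host names and the telemetry in `demo()` are all constructed.

```python
"""Closed-loop remediation controller: verify in staging, canary, promote or roll back."""
from dataclasses import dataclass, field
from typing import Callable

# Policy knobs. Illustrative values for this example, not figures from the article.
CANARY_PERCENT = 10     # share of hosts that receive the candidate build first
ERROR_BUDGET = 0.02     # highest canary error rate tolerated before the change is undone


@dataclass
class Fleet:
    """Which build each host runs. Every change is journalled so it can be undone in order."""
    versions: dict[str, str]
    journal: list[tuple[str, str, str]] = field(default_factory=list)  # (host, old, new)

    def set_version(self, host: str, version: str) -> None:
        self.journal.append((host, self.versions[host], version))
        self.versions[host] = version

    def rollback_to(self, checkpoint: int) -> int:
        """Undo every journalled change after `checkpoint`, newest first; returns hosts reverted."""
        reverted = 0
        while len(self.journal) > checkpoint:
            host, old, _new = self.journal.pop()
            self.versions[host] = old
            reverted += 1
        return reverted

    def hosts_on(self, version: str) -> int:
        return sum(1 for v in self.versions.values() if v == version)


def remediate(
    fleet: Fleet,
    fix_version: str,
    run_staging_tests: Callable[[str], bool],   # assumed CI hook: does the candidate build pass?
    error_rate: Callable[[str], float],         # assumed observability hook: error rate of one host
) -> dict:
    """Drive one candidate fix through the loop and report what happened to the fleet."""
    # Gate 1: the candidate build must pass the staging suite before it touches production.
    if not run_staging_tests(fix_version):
        return {"status": "rejected", "stage": "staging", "hosts_on_fix": fleet.hosts_on(fix_version)}

    # Gate 2: deploy to a canary slice and watch it. The checkpoint lets us undo exactly this change.
    checkpoint = len(fleet.journal)
    hosts = sorted(fleet.versions)
    n_canary = max(1, (len(hosts) * CANARY_PERCENT + 99) // 100)  # integer ceiling
    canaries = hosts[:n_canary]
    for host in canaries:
        fleet.set_version(host, fix_version)
    worst = max(error_rate(h) for h in canaries)
    if worst > ERROR_BUDGET:
        fleet.rollback_to(checkpoint)
        return {"status": "rolled_back", "stage": "canary", "worst_error_rate": worst,
                "hosts_on_fix": fleet.hosts_on(fix_version)}

    # Both gates passed: roll the fix to the rest of the fleet.
    for host in hosts[n_canary:]:
        fleet.set_version(host, fix_version)
    return {"status": "promoted", "stage": "fleet", "worst_error_rate": worst,
            "hosts_on_fix": fleet.hosts_on(fix_version)}


def demo() -> list[str]:
    """Run three constructed candidate fixes for one finding through a constructed 20-host fleet."""
    fleet = Fleet({f"web-{i:02d}": "1.4.2" for i in range(20)})
    # Constructed telemetry: the error rate depends only on the build a host runs.
    rates = {"1.4.2": 0.004, "1.4.3-fix-b": 0.090, "1.4.3-fix-c": 0.005}
    error_rate = lambda host: rates[fleet.versions[host]]
    outcomes = []
    # fix-a: the generated patch fails the staging suite and never leaves CI.
    outcomes.append(remediate(fleet, "1.4.3-fix-a", lambda v: False, error_rate)["status"])
    # fix-b: passes tests, but canaries start failing requests, so the loop undoes it.
    outcomes.append(remediate(fleet, "1.4.3-fix-b", lambda v: True, error_rate)["status"])
    # fix-c: passes tests and stays within the error budget on canaries; promoted fleet-wide.
    outcomes.append(remediate(fleet, "1.4.3-fix-c", lambda v: True, error_rate)["status"])
    assert fleet.hosts_on("1.4.3-fix-c") == 20 and fleet.hosts_on("1.4.3-fix-b") == 0
    return outcomes


if __name__ == "__main__":
    print(demo())  # ['rejected', 'rolled_back', 'promoted']
```

Two things a production version changes: the canary verdict should compare the canary slice against the unpatched baseline hosts rather than an absolute error budget, so a pre-existing fault does not get blamed on the patch, and the journal should live in whatever system actually performs the deploy, so the rollback is a real revert rather than an in-memory bookkeeping entry.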
The org that builds these capabilities will capture the value that detection tooling is rapidly losing. The org that doesn’t will find itself on the wrong side of the widening gap between what models can find and what humans can fix.
Where this is heading
Anthropic’s language about Mythos-class general availability is cautious — “once we’ve developed the far stronger safeguards we need” — but the trajectory is clear. Within a year, models capable of finding and exploiting the vulnerabilities Mythos Preview finds will be broadly accessible. When that happens, the asymmetry between discovery and remediation becomes an active threat: attackers can find and exploit faster than defenders can patch.
The only durable response is infrastructure that compresses the remediation timeline from weeks to hours. That means automated patch generation is table stakes. The real prize is automated patch verification, staged deployment, canary testing, and instant rollback: a closed-loop remediation system that operates at machine speed.
Project Glasswing’s expansion to 150 organizations is not the story. The story is that Anthropic, the company that built the model that found 10,000 vulnerabilities, is telling us the bottleneck has moved. Every security team that ignores this signal is making the same mistake the industry made when it treated detection as the moat — investing in what is being commoditized instead of infrastructure that compounds.
The patch gap is the new attack surface. Build for it.