The Receipts Are In

Walking through Anthropic's Project Glasswing Update

May 25, 2026

For almost the past year, I’ve argued that the economics of vulnerability discovery are collapsing toward zero cost while remediation stays human-bound, that the NVD and the open-source maintainer ecosystem can’t absorb the flood, and that AI cyber capability is doubling on a months-long clock. Anthropic’s Project Glasswing initial update, published May 22, is the first large-scale empirical confirmation of all three.

Roughly 50 partners running Anthropic’s Mythos Preview model found over 10,000 high-and-critical-severity vulnerabilities in approximately one month. Several reported that their rate of bug finding increased by more than a factor of ten, and Anthropic’s own framing states the thesis almost verbatim: “progress on software security used to be limited by how quickly we could find new vulnerabilities” and “is now limited by how quickly we can verify, disclose, and patch” them. In short, remediation is, and has been, the bottleneck, and AI-driven discovery makes that bottleneck more problematic than ever.

This is not a new argument from me. It’s a collection of predictions I already made; the data just got a lot harder to argue with.

What Glasswing Actually Reported

Glasswing is Anthropic’s coordinated effort to apply its Mythos-class vulnerability research model across both commercial partners and open-source projects.

The update covers roughly one month of activity and includes results from partners like Cloudflare, Mozilla, Palo Alto Networks, Microsoft, and Oracle, alongside an independent open-source scanning effort.

The partner-side numbers are striking. Cloudflare found 2,000 bugs, 400 of which were high or critical severity, with a false-positive rate that Cloudflare’s own team considers better than human testers. Mozilla found and fixed 271 vulnerabilities in Firefox 150, over ten times more than they found in Firefox 148 using Claude Opus 4.6. Palo Alto Networks shipped over five times as many patches as usual. Microsoft reported that patch volumes will continue trending larger for some time.

On the open-source side, the model identified 23,019 total vulnerabilities across more than 1,000 projects, with 6,202 estimated as high or critical severity. Of the 1,752 that independent security firms assessed, 90.6% were confirmed as valid true positives and 62.4% confirmed as genuinely high or critical. That puts the project on track to surface approximately 3,900 confirmed high-and-critical vulnerabilities in open-source software from this single scan.
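The extrapolation behind that last figure can be checked in a few lines. This is a minimal sketch, and it rests on one assumption the update does not guarantee: that the 62.4% confirmation rate from the assessed sample carries over to all 6,202 findings estimated as high or critical. The function name is mine, not Anthropic's.

```python
def projected_confirmed(estimated_high_critical: int, confirmed_rate: float) -> int:
    """Scale an estimated count by the rate confirmed in the assessed sample."""
    return round(estimated_high_critical * confirmed_rate)

# Figures quoted from the Glasswing update.
estimated_high_critical = 6_202  # findings estimated as high or critical
confirmed_rate = 0.624           # share of the assessed sample confirmed high or critical

# ASSUMPTION: the sample rate generalizes to every estimated finding.
projection = projected_confirmed(estimated_high_critical, confirmed_rate)
print(projection)
```

Under that assumption the projection lands just under 3,900, which matches the "approximately 3,900" figure. If the sample is not representative of the full set, the real number moves with it.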

The UK AI Safety Institute independently validated that Mythos Preview is the first model to solve both of their cybersecurity evaluation ranges end to end.

One partner bank reported that Mythos Preview helped detect and prevent a fraudulent $1.5 million wire transfer. Separately, the wolfSSL certificate-forgery vulnerability the model discovered, assigned CVE-2026-5194, demonstrated that Mythos can not only find bugs but also construct working exploits against real cryptographic libraries.

Validating Expectations

I want to walk through the specific prior arguments that Glasswing’s data confirms, because the value here isn’t in the novelty of the findings, it’s in the structural pattern they validate.

The Flood

In Vulnpocalypse, I laid out the case that AI-accelerated vulnerability discovery would overwhelm every downstream system the industry depends on for triage, enrichment, and remediation. The argument was structural rather than speculative, because industrialized, AI-driven vulnerability discovery has widespread systemic impacts.

If the cost of finding vulnerabilities drops to near zero while the cost of fixing them stays constant, the backlog doesn’t grow linearly, it compounds.

Glasswing’s 10,000-in-a-month result against commercial targets, combined with the 6,200 high-and-critical findings in open source, is exactly the flood I described. Nicholas Carlini’s research estimating that AI vulnerability research capability doubles roughly every four months makes these numbers a floor, not a ceiling, especially as model improvements continue to compound with each release.
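To make the compounding claim concrete, here is a toy simulation. It is a sketch under loud assumptions: discovery doubles every four months (the estimate cited above), remediation capacity stays flat, and the starting values of 100 findings and 100 fixes per month are hypothetical numbers chosen only so the backlog starts at zero.

```python
def simulate_backlog(months: int, initial_found: float, fixed_per_month: float,
                     doubling_months: float = 4.0) -> list[float]:
    """Return the open backlog at the end of each month.

    Discovery grows exponentially, doubling every `doubling_months`;
    remediation capacity stays constant at `fixed_per_month`.
    """
    backlog = 0.0
    history = []
    for month in range(months):
        found = initial_found * 2 ** (month / doubling_months)
        # The backlog cannot go negative: you cannot fix bugs nobody has found.
        backlog = max(0.0, backlog + found - fixed_per_month)
        history.append(backlog)
    return history

# HYPOTHETICAL inputs: discovery and remediation start out perfectly balanced.
history = simulate_backlog(months=12, initial_found=100, fixed_per_month=100)
for month in (3, 7, 11):
    print(f"end of month {month + 1}: backlog of about {history[month]:.0f}")
```

Even with discovery and remediation balanced on day one, each month adds more to the backlog than the month before. A linear process would add the same amount every month; this one does not.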

The Capability Curve

In The AI Cyber Capability Curve, I argued that offensive AI capability was tracking an exponential improvement curve and that defenders needed to plan for capability levels that didn’t exist yet but were months away from arriving.

The UK AISI’s confirmation that Mythos Preview is the first model to solve both of their cyber ranges end to end, followed by GPT-5.5 shortly after, is exactly the kind of step-function capability jump that curve predicts. These evaluation ranges were designed to be hard. A model cleared them entirely, and the next generation will be better still.

The NVD and Maintainer Collapse

In The NVD Just Threw in the Towel, I documented how NIST reclassified approximately 29,000 backlogged CVEs to “Not Scheduled,” effectively conceding that the system can’t keep pace with the current volume of incoming vulnerabilities, let alone an AI-accelerated one.

Glasswing’s open-source results now put concrete pressure on that already-broken system. The project disclosed 530 high-and-critical bugs to open-source maintainers in roughly a month, and maintainers responded exactly the way you’d expect an already-overloaded system to respond. Some asked Anthropic to slow down.

That last detail is the most telling data point in the entire update. Maintainers aren’t asking for better vulnerability reports. They’re asking for fewer of them.

The system’s constraint isn’t information (findings), it’s human capacity to act on information (remediation).

The Bottleneck Moved and the Crisis Didn’t

Glasswing’s patch-side data confirms the remediation crisis I’ve been tracking across multiple pieces. The average time to patch a high-or-critical bug disclosed through the project was two weeks. Only 75 of the 530 disclosed high-and-critical vulnerabilities have been patched so far, with just 65 receiving public advisories. That means less than 15% of the high-and-critical vulnerabilities Glasswing has helped expose have been remediated.
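The percentages follow directly from the three counts in the update. A quick check, using only those reported figures (the variable names are mine):

```python
disclosed = 530   # high-and-critical bugs disclosed to maintainers
patched = 75      # of those, patched so far
advisories = 65   # of those, with a public advisory

patched_share = patched / disclosed
advisory_share = advisories / disclosed
print(f"patched: {patched_share:.1%}, with advisory: {advisory_share:.1%}")
# prints "patched: 14.2%, with advisory: 12.3%"
```

Put the other way around, roughly 86% of the disclosed high-and-critical findings were still unpatched at the time of the update.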

This matters for two reasons. It highlights the remediation bottleneck, and it leaves a window open while others, including malicious actors, are hunting with alternative models and almost certainly finding vulnerabilities of their own.

Those numbers need context. The industry was already running a remediation deficit before AI-accelerated discovery entered the picture. FIRST projected a median of approximately 59,000 new CVEs for 2025, a roughly 50% increase over the prior year.

The 2026 DBIR showed that only 26% of critical KEV vulnerabilities were fully remediated, with a 43-day median to resolution, and the exploitation timeline has collapsed in the opposite direction. Sergej Epp’s Zero Day Clock research tracked the median time-to-exploit falling from 771 days in 2018 to roughly four hours by 2024.

The math is straightforward and unfavorable. Defenders measure their remediation timelines in weeks and months; attackers measure their exploitation timelines in hours.

Now the rate of vulnerability discovery has jumped by a factor of ten or more for organizations using Mythos-class models or even open source and alternative models coupled with effective harness engineering. The backlog doesn’t just grow, it grows faster than any organization’s ability to process it, which means the effective window of exploitability for every discovered vulnerability is expanding rather than shrinking.

Anthropic’s own Claude Security illustrates the asymmetry. Enterprise users patched 2,100 vulnerabilities in three weeks using the tool, a pace that looks fast only because enterprises fix their own code and skip the coordinated disclosure process entirely.

For open-source maintainers who don’t control the deployment environment and who rely on downstream consumers to actually apply the patch, the timeline stretches dramatically, and the 75-of-530 number is the honest measure of where the ecosystem actually stands.

The Skeptic’s Read

I’ve been critical of AI hype cycles in security throughout this newsletter and blog, including in Security’s AI-Driven Dilemma, and I don’t intend to turn off that filter because the data happens to be compelling. There are several things practitioners should hold with appropriate skepticism.

The “too dangerous to release” framing around Mythos serves dual purposes. It is genuinely responsible to withhold a model that can find thousands of critical vulnerabilities per month from general availability. That said, it is also a competitive moat.

Anthropic has first-mover advantage with a capability that it openly states will be available from multiple labs soon, and the window during which it’s the only organization with that capability is a strategic asset. The same applies to security vendors who have been part of the gated release. Both things can be true simultaneously, and the responsible thing for practitioners is to evaluate the data on its merits while recognizing the incentive structure behind the framing.

The 90.6% true-positive rate deserves scrutiny in terms of generalizability. That number comes from a triaged sample of 1,752 vulnerabilities assessed by independent security firms.

Whether that rate holds across the full 23,019 findings, or across different codebases and vulnerability classes, is a question the update doesn’t fully answer. A false-positive rate better than human testers, as Cloudflare reported, is meaningful, but the comparison benchmark matters. Human testers vary enormously in quality, and “better than human testers” at one organization may mean something very different at another.

The enterprise patching speed also needs honest framing. Two thousand vulnerabilities patched in three weeks sounds fast until you recognize that enterprise teams patching their own proprietary code have fundamentally different dynamics than open-source maintainers who are often unpaid, understaffed, and managing projects in their spare time. The speed difference isn’t a technology gap, it’s a resource and incentive gap, and AI-accelerated discovery doesn’t change the resource equation on the maintainer side at all.

And the most uncomfortable admission in the update is that even a throttled, carefully managed disclosure pace is adding load to an already-overloaded ecosystem. If Anthropic is disclosing responsibly and maintainers are still asking for a slowdown, the implication is clear.

The system as currently designed cannot absorb AI-scale vulnerability discovery regardless of how carefully you manage the pipeline.

Coordinated disclosure norms were built for a world where finding vulnerabilities was hard and slow, and that world doesn’t exist anymore.

What Defenders Do Now

The structural reforms I’ve argued for in prior pieces aren’t theoretical anymore. Glasswing puts empirical weight behind each of them.

Patch cycles need to compress dramatically. A two-week median for a high-severity fix was acceptable when the discovery rate was measured in dozens per quarter. At thousands per month, two weeks is too slow for the most critical findings and completely unworkable for the long tail. Organizations that can’t move to continuous deployment for security patches will be running an ever-expanding window of exploitable exposure.

Reachability-based prioritization becomes non-negotiable. When the volume of incoming vulnerabilities jumps by 10x, the only viable triage strategy is to focus on what’s actually reachable and exploitable in your specific environment rather than treating every high-CVSS finding as equally urgent.

As I wrote in Vulnerability Management in the Age of AI-Accelerated Everything, Endor Labs’ research found that 92% of critical open-source vulnerabilities flagged by traditional scanners aren’t actually reachable in the application context. That 92% noise figure becomes the difference between a manageable workload and an impossible one when the denominator increases by an order of magnitude.
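The effect of that noise figure on workload is simple arithmetic. The sketch below assumes the 92% unreachable rate holds as volume grows, which the research does not promise, and uses a hypothetical baseline of 1,000 critical findings purely for illustration.

```python
def reachable_workload(findings: int, unreachable_rate: float) -> int:
    """Findings left to triage after filtering out the unreachable ones."""
    return round(findings * (1 - unreachable_rate))

UNREACHABLE_RATE = 0.92  # share of flagged criticals not reachable, per the Endor Labs figure
baseline = 1_000         # HYPOTHETICAL count of critical findings before the 10x jump

for volume in (baseline, baseline * 10):
    remaining = reachable_workload(volume, UNREACHABLE_RATE)
    print(f"{volume} flagged -> {remaining} reachable")
```

With these inputs, the 10x volume filters down to 800 reachable findings, which is fewer than the 1,000 unfiltered findings in the baseline. A team that triages by reachability after the jump has less to work through than a team that triaged everything before it.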

Memory-safe language mandates gain urgency with every one of these reports. A significant portion of the vulnerabilities AI models find are memory safety issues in C and C++ codebases, the same vulnerability class that’s been generating CVEs for decades. The structural fix is to stop writing new code in memory-unsafe languages, a position leaders such as Jen Easterly and Bob Lord advocated for during their tenure at CISA, and one that Glasswing’s data makes even harder to argue against.

Others, such as longtime security practitioner and leader Niels Provos, have penned great articles on this. His piece “The Day After the Zero Days” points out that “patch faster” cannot keep up with AI-driven discovery and argues instead for “structural invariants” that make bug classes irrelevant.

Niels states:

“The response is not to find bugs faster. It is to build infrastructure that takes attack classes off the critical path of ongoing human security decisions.”

The broader incentive question remains untouched as well. As Bruce Schneier has argued repeatedly, no industry in history has improved its safety practices without being forced to through regulation or liability.

The software industry’s current incentive structure rewards shipping fast over shipping secure, and nothing about AI-accelerated vulnerability discovery changes that calculus for the vendors producing the vulnerable software in the first place. If anything, it widens the gap between the organizations that can afford to respond and the ones that can’t.

That said, we’re currently in a deregulatory environment with the current U.S. presidential administration, and as others such as Jim Dempsey have pointed out, the topic of software liability is often considered a “third rail” of cybersecurity policy, meaning, no one typically wants to touch it.

Hence, many have considered cybersecurity a market failure and that’s unlikely to change in the current landscape.

The Floor, Not the Ceiling

The pattern I laid out in The AI Cyber Capability Curve predicts that every one of these numbers will look quaint within 12 months.

If AI vulnerability research capability is doubling every four months, the 10,000-in-a-month figure from this update will be roughly two doublings out of date by the time the next Glasswing update ships. Mythos-class models will be available from multiple labs and even among the open-source community, as Anthropic itself acknowledges, which means the discovery rate will multiply across the industry, not just within one company’s partner program.
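The doubling arithmetic is worth spelling out. This sketch treats "capability" as a plain multiplier that doubles every four months, which is a simplifying assumption of mine; how capability translates into actual finding counts is not something the update specifies.

```python
def growth_factor(months_elapsed: float, doubling_months: float = 4.0) -> float:
    """Multiplier after `months_elapsed`, given a fixed doubling period."""
    return 2 ** (months_elapsed / doubling_months)

for months in (4, 8, 12):
    doublings = months / 4
    print(f"{months} months: {doublings:.0f} doubling(s), {growth_factor(months):.0f}x")
```

Two doublings correspond to eight months and a 4x multiplier; a full twelve months is three doublings, or 8x. That is the sense in which today's numbers will look quaint within a year.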

Whether AI will continue to find vulnerabilities faster than humans can fix them is no longer in question. That much is settled.

The question is whether the institutions, incentive structures, and economic models that govern software security can adapt fast enough to absorb what’s coming, or whether the industry will learn the same lesson every other safety-critical domain has already learned, that the forcing function for real structural change is sufficient consequences, not foresight.

Resilient Cyber
