
Vulnerability Research Is Cooked

Simon Willison Blog, by Simon Willison, April 3, 2026

3rd April 2026 - Link Blog

Vulnerability Research Is Cooked. Thomas Ptacek's take on the sudden and enormous impact the latest frontier models are having on the field of vulnerability research.

Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won’t be a slow burn, but rather a step function. Substantial amounts of high-impact vulnerability research (maybe even most of it) will happen simply by pointing an agent at a source tree and typing “find me zero days”.

Why are agents so good at this? A combination of baked-in knowledge, pattern matching ability and brute force:

You can't design a better problem for an LLM agent than exploitation research.

Before you feed it a single token of context, a frontier LLM already encodes supernatural amounts of correlation across vast bodies of source code. Is the Linux KVM hypervisor connected to the hrtimer subsystem, workqueue, or perf_event? The model knows.

Also baked into those model weights: the complete library of documented "bug classes" on which all exploit development builds: stale pointers, integer mishandling, type confusion, allocator grooming, and all the known ways of promoting a wild write to a controlled 64-bit read/write in Firefox.

Vulnerabilities are found by pattern-matching bug classes and constraint-solving for reachability and exploitability. Precisely the implicit search problems that LLMs are most gifted at solving. Exploit outcomes are straightforwardly testable success/failure trials. An agent never gets bored and will search forever if you tell it to.
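The point about exploit outcomes being "straightforwardly testable success/failure trials" can be made concrete with a toy. The sketch below is purely illustrative and assumes a made-up record parser with a 16-bit length field. The names `toy_parse`, `HEADER_SIZE` and `search` are hypothetical and do not come from Ptacek's article. It is not how frontier agents actually work. It only shows why a cheap pass/fail oracle, here for an "integer mishandling" bug, turns the hunt into mechanical search that can run for as long as you let it.

```python
from typing import Callable, Iterable, Optional, Tuple

HEADER_SIZE = 8  # hypothetical fixed header size of the toy record format


def toy_parse(length: int) -> bool:
    """Toy stand-in for a C parser with an integer-mishandling bug.

    Returns True if processing a record with this length field would write
    past the allocated buffer (the "exploit succeeded" condition).
    """
    # Emulate 16-bit unsigned arithmetic: length + HEADER_SIZE can wrap past 0xFFFF.
    alloc_size = (length + HEADER_SIZE) & 0xFFFF
    # The parser then writes the header plus `length` payload bytes, unwrapped.
    bytes_written = HEADER_SIZE + length
    return bytes_written > alloc_size


def search(
    oracle: Callable[[int], bool],
    candidates: Iterable[int],
    budget: int,
) -> Optional[Tuple[int, int]]:
    """Brute-force search: test candidates one by one, stop at the first success.

    Returns (winning_candidate, trials_used), or None if the budget runs out.
    """
    for trials, candidate in enumerate(candidates, start=1):
        if trials > budget:
            break
        if oracle(candidate):  # each trial is a plain success/failure check
            return candidate, trials
    return None


if __name__ == "__main__":
    # Enumerate every possible 16-bit length field value.
    print(search(toy_parse, range(0x10000), budget=100_000))  # (65528, 65529)
    print(search(toy_parse, range(0x10000), budget=1_000))    # None
```

Only the eight values from 0xFFF8 upward make `length + HEADER_SIZE` wrap to a tiny allocation, which is exactly the kind of needle a tireless searcher with a reliable oracle will eventually find. A real agent replaces exhaustive enumeration with learned knowledge of bug classes and reachability, but the success/failure check plays the same role.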

The article was partly inspired by this episode of the Security Cryptography Whatever podcast, where David Adrian, Deirdre Connolly, and Thomas interviewed Anthropic's Nicholas Carlini for 1 hour 16 minutes.

I just started a new tag here for ai-security-research - it's up to 11 posts already.
