How I Hunt Security Bounties with Claude Code (Real Workflow, Real Payouts)
Most security bounty guides tell you to "learn OWASP Top 10" and "practice on HackTheBox." That's fine for learning. But if you want to actually earn money from bounties, you need a workflow that scales — because manually reading source code doesn't.
I've been using Claude Code with custom skills to scan open-source repositories for vulnerabilities. Here's my actual workflow, with real examples from repos I've scanned.
The Problem with Manual Code Auditing
Open-source security bounties are a goldmine if you know where to look. Projects like Anthropic's MCP servers, Microsoft's TypeAgent, and dozens of mid-tier repos on GitHub all have bounty programs or responsible disclosure policies.
But manually auditing a 50,000-line codebase? That's a week of your life for maybe one finding. I needed something faster.
My Workflow: Claude Code + Semgrep + Custom Skills
Here's the stack I use:
- Claude Code as the orchestration layer
- Semgrep for pattern-based static analysis
- A custom security scanner skill that chains these together
The key insight: Claude Code skills let you encode repeatable security analysis workflows into reusable commands. Instead of remembering which Semgrep rules to run and how to interpret the results, you build it once and invoke it with /scan.
Step 1: Clone and Scope
```bash
git clone https://github.com/target/repo
cd repo

# Quick file count and language breakdown
find . -name "*.ts" -o -name "*.py" -o -name "*.js" | wc -l
```
Before scanning, I scope the attack surface. For MCP servers, the interesting files are:
- Request handlers (where user input enters)
- Tool definitions (where commands get executed)
- Authentication middleware (where trust decisions happen)
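To make scoping repeatable, I script it rather than eyeballing the file tree. Here's a minimal sketch; the keyword list is my own heuristic for MCP-style servers, not part of any tool:

```python
from pathlib import Path

# Heuristic: filenames containing these words tend to sit on a trust
# boundary in MCP-style servers (handlers, tools, auth middleware).
KEYWORDS = ("handler", "tool", "auth", "middleware", "route")

def scope_attack_surface(repo: str) -> list[Path]:
    """Return source files whose names suggest they accept or act on user input."""
    root = Path(repo)
    hits = []
    for pattern in ("*.ts", "*.py", "*.js"):
        for f in root.rglob(pattern):
            if any(k in f.name.lower() for k in KEYWORDS):
                hits.append(f)
    return sorted(hits)
```

It's crude, but it turns "where do I start?" into a ranked shortlist in seconds.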
Step 2: Run Semgrep with Custom Rules
Generic Semgrep rulesets catch the obvious stuff. The real value is custom rules tuned to the framework you're auditing.
For MCP servers, I wrote rules that catch:
```yaml
rules:
  - id: mcp-unvalidated-tool-input
    languages: [typescript]
    patterns:
      - pattern: |
          $HANDLER($REQUEST, ...) {
            ...
            $CMD = $REQUEST.params.$FIELD
            ...
          }
      - pattern-not: |
          $HANDLER($REQUEST, ...) {
            ...
            validate($REQUEST.params.$FIELD)
            ...
          }
    message: "Tool input used without validation"
    severity: WARNING
```
This catches the pattern I found in 20+ MCP server implementations: user-supplied tool parameters flowing directly into file paths, shell commands, or database queries without sanitization.
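To make the pattern concrete, here's the shape of the bug in Python terms. `ROOT` and both function names are illustrative, not code from any audited repo (the containment check uses `Path.is_relative_to`, Python 3.9+):

```python
from pathlib import Path

ROOT = Path("/srv/mcp/files")  # hypothetical allowed root for the file tool

def read_file_vulnerable(user_path: str) -> str:
    # "../../../../etc/passwd" joins cleanly and escapes ROOT.
    return (ROOT / user_path).read_text()

def read_file_validated(user_path: str) -> str:
    # Resolve first, then check containment against the resolved root.
    resolved = (ROOT / user_path).resolve()
    if not resolved.is_relative_to(ROOT.resolve()):
        raise PermissionError("path escapes the allowed root")
    return resolved.read_text()
```

The fix is one resolve-then-check, which is exactly why it's so commonly missing: the vulnerable version looks perfectly reasonable in review.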
Step 3: Claude Code Analyzes Context
This is where it gets powerful. Semgrep gives you pattern matches, but Claude Code understands context. When I feed Semgrep results into Claude Code:
```bash
/scan --repo . --rules mcp-security
```
It does things Semgrep can't:
- Traces data flow across multiple files
- Understands whether a "validation" function actually validates anything
- Identifies business logic flaws (like TOCTOU races in file operations)
- Writes a proof-of-concept to verify the finding
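The TOCTOU case deserves a sketch, because single-pattern matching rarely catches it. Both functions below are illustrative; the safer variant relies on POSIX `O_NOFOLLOW`, so it won't run on Windows:

```python
import os

def read_vulnerable(path: str) -> bytes:
    # Time of check: the symlink test passes...
    if os.path.islink(path):
        raise ValueError("symlinks not allowed")
    # ...time of use: an attacker can swap in a symlink between the
    # check and this open(), redirecting the read anywhere on disk.
    with open(path, "rb") as f:
        return f.read()

def read_safer(path: str) -> bytes:
    # Open first with O_NOFOLLOW, then use the descriptor: the open
    # itself rejects symlinks, so there is no check/use gap to race.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as f:
        return f.read()
```

A static rule sees both versions "validate" the path; only context analysis notices that the first validation happens on a different filesystem state than the read.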
Step 4: Verify and Report
Every finding gets a PoC. No PoC = no bounty. Claude Code generates these automatically:
```python
# PoC: path traversal in the file-read tool
import requests

payload = {
    "tool": "read_file",
    "params": {"path": "../../../../etc/passwd"}
}

response = requests.post(
    "http://localhost:3000/mcp/tool",
    json=payload,
)

# If the status is 200 and the response contains "root:",
# the path traversal is confirmed.
print(response.text)
```
Real Results
Using this workflow, I've found:
- Path traversal in 3 MCP server implementations
- Command injection via unsanitized tool parameters in 2 repos
- SSRF through URL-type tool inputs in 1 server
- Information disclosure via verbose error messages in 5+ servers
My article "I Audited Microsoft's MCP Servers and Found 20 Vulnerabilities" covers some of these findings in detail.
Building Your Own Security Scanner Skill
The workflow above is exactly what my Security Scanner Skill for Claude Code automates. It bundles:
- Pre-built Semgrep rules for MCP, FastAPI, Express, and Django
- Automated triage that ranks findings by exploitability, not just severity
- PoC generation for confirmed vulnerabilities
- Report templates formatted for responsible disclosure or bounty submission
You invoke it with a single command and get a prioritized list of findings with proofs of concept. It's the difference between spending a week on manual review and getting actionable results in 20 minutes.
If you're building your own, here's what matters:
1. Layer your analysis
```
Static patterns (Semgrep) → Context analysis (LLM) → Dynamic verification (PoC)
```
Each layer filters out false positives from the previous one. Semgrep catches 200 matches, Claude Code narrows it to 15 likely vulns, PoC verification confirms 3-5 real issues.
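The first filter layer is easy to script yourself. This sketch parses `semgrep --json` output, keeping de-duplicated findings above a severity floor; the field names follow Semgrep's JSON schema (`results`, `check_id`, `path`, `extra.severity`), but the ranking scheme is my own:

```python
import json

SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}

def triage(semgrep_json: str, min_severity: str = "WARNING") -> list[dict]:
    """Filter raw Semgrep JSON down to unique findings worth LLM review."""
    results = json.loads(semgrep_json)["results"]
    floor = SEVERITY_RANK[min_severity]
    seen, keep = set(), []
    for r in results:
        key = (r["check_id"], r["path"])  # de-duplicate per rule+file
        sev = SEVERITY_RANK.get(r["extra"]["severity"], 0)
        if key in seen or sev < floor:
            continue
        seen.add(key)
        keep.append(r)
    # Highest severity first, ready to feed into context analysis.
    return sorted(keep, key=lambda r: -SEVERITY_RANK.get(r["extra"]["severity"], 0))
```

Everything this layer drops is noise the LLM never has to read, which keeps the context window for the findings that matter.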
2. Encode framework-specific knowledge
Generic "find SQL injection" rules produce noise. Rules that understand how Express middleware chains work, or how MCP tool definitions pass parameters, produce signal.
3. Automate the boring parts
Report writing, PoC scaffolding, CVSS scoring — these are mechanical tasks. Let the skill handle them so you can focus on the creative part: understanding the application's trust boundaries.
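A report scaffold is a good example of what's worth automating. This is a minimal illustrative template, not the one my skill ships with; the field names are assumptions:

```python
def render_report(finding: dict) -> str:
    """Render a confirmed finding as a responsible-disclosure markdown stub."""
    return (
        f"# {finding['title']}\n\n"
        f"**Severity:** {finding['severity']}\n"
        f"**Component:** `{finding['file']}`\n\n"
        f"## Description\n\n{finding['description']}\n\n"
        # PoC goes in as an indented code block.
        f"## Proof of Concept\n\n    {finding['poc']}\n\n"
        f"## Suggested Fix\n\n{finding.get('fix', 'TODO')}\n"
    )
```

Ten lines of templating means every submission has the same structure, which maintainers notice and appreciate.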
What Bounties Actually Pay
Let's be honest about the money:
| Severity | Typical Payout (OSS) | Time to Find |
| --- | --- | --- |
| Critical (RCE) | $500-$5,000 | Rare |
| High (SQLi, Path Traversal) | $200-$1,000 | 1-3 per large audit |
| Medium (SSRF, Info Disclosure) | $50-$500 | 3-5 per audit |
| Low (Missing Headers) | $0-$50 | Many, but not worth reporting |
The math works when you can audit 3-4 repos per week instead of 1 per month. That's what automation gives you.
Getting Started
- Set up Claude Code with Semgrep installed locally
- Pick a target: start with repos that have a SECURITY.md or a bug bounty program listed
- Focus on input boundaries — every place user data enters the application
- Build your rule library incrementally — every finding teaches you a new pattern
If you want to skip the setup and start scanning immediately, the Security Scanner Skill is $10 and comes with everything pre-configured. I also built an API Connector Skill that's useful for testing API-heavy targets — it handles auth flows, rate limiting, and response parsing so you can focus on the security analysis.
The best part about security bounties: the supply of vulnerable code is infinite and growing. Every new framework, every new MCP server, every new API — it's all attack surface waiting to be audited.
Have questions about security scanning with Claude Code? Drop a comment or check out my other articles on MCP security research.