Building a Real-Time Dota 2 Draft Prediction System with Machine Learning
I built an AI system that watches live Dota 2 pro matches and predicts which team will win based purely on the draft. Here's how it works under the hood.
The Problem

Dota 2 has 127 heroes. A Captain's Mode draft produces roughly 10^15 possible combinations. Analysts spend years building intuition about which drafts work — I wanted to see if a model could learn those patterns from data.
Architecture
Live Match → Draft Detection → Feature Engineering → XGBoost + DraftNet → Prediction + SHAP Explanation
The system runs 24/7 on Railway (Python/FastAPI). When a professional draft completes, it detects the picks within seconds, runs them through two models in parallel, and publishes the prediction to a Telegram channel and website.
The Models
XGBoost

The workhorse: gradient-boosted trees trained on 28,000+ pro matches with:
- Hero one-hots (240 features) — which heroes are on which team
- Player hero pool depth — how many games each player has on their hero
- Team form — rolling win rate over the last 20 matches
- Lane matchup ratings — predicted lane outcomes based on hero positions
- Replay-parsed stats — gold/XP differentials, tower damage, fight participation from parsed replays

Calibrated with isotonic regression, so "70% confidence" actually means ~70% win rate.
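The calibration step can be sketched with scikit-learn's `IsotonicRegression`. The scores and outcomes below are made up for illustration; the real system fits on a held-out validation split:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical raw model scores and actual outcomes (1 = win).
raw_scores = np.array([0.30, 0.45, 0.55, 0.62, 0.70, 0.80, 0.85, 0.90])
outcomes = np.array([0, 0, 1, 0, 1, 1, 1, 1])

# Fit a monotone mapping from raw score to empirical win probability.
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(raw_scores, outcomes)

# A raw "0.70" is remapped to whatever win rate 0.70-scored
# matches actually achieved on the validation data.
calibrated = calibrator.predict(np.array([0.70]))
```

The mapping is monotone by construction, so ordering between predictions is preserved; only the stated probabilities move.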
DraftNet v4 (91 expert features)

A custom PyTorch neural network that captures what an analyst would notice about a draft:
- Hero synergies — 7,500+ pair interactions (Magnus + Ember = +8% win rate)
- Counter matchups — how well each hero handles the opposing draft
- BKB-pierce control — drafts with 3+ piercing disables win significantly more
- Damage balance — 80%+ single damage type is easy to itemize against
- Lane projections — predicted laning advantage before the game starts
- Timing windows — when each draft hits its power spike
- Team composition — push/teamfight/pickoff/split strategy classification

The neural network uses self-attention plus cross-team attention layers, so each hero "sees" both its teammates and its opponents.
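The attention idea can be sketched in a few lines of NumPy. This is not the actual DraftNet architecture, just a minimal illustration of self-attention over all 10 hero embeddings, followed by a cross-team pass where one side's queries attend to the other side's keys and values (embedding width and values are arbitrary here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: each query row attends
    # to every key row and mixes the corresponding values.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 16                             # embedding width (assumed)
heroes = rng.normal(size=(10, d))  # 10 drafted heroes, 5 per team

# Self-attention over the full draft: each hero "sees" all 9 others.
mixed = attention(heroes, heroes, heroes)

# Cross-team attention: Radiant queries attend only to Dire heroes.
radiant, dire = heroes[:5], heroes[5:]
cross = attention(radiant, dire, dire)
```

A real implementation would add learned query/key/value projections, multiple heads, and residual connections; the shapes and the teammate/opponent split are the part that carries over.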
Simplified expert feature example:

```python
def compute_bkb_pierce(self, team_heroes, enemy_heroes):
    # Count our heroes whose disables pierce BKB, then weight by how
    # much the enemy draft depends on BKB (both lookups precomputed).
    pierce_count = sum(1 for h in team_heroes if h in BKB_PIERCE_HEROES)
    enemy_bkb_dependence = sum(HERO_BKB_NEED[h] for h in enemy_heroes)
    return pierce_count * enemy_bkb_dependence / 5.0
```
Feature Engineering: What Actually Matters
After training, I ran SHAP analysis to see which features the model values most. Some surprises:
Hero combos beat hero tier lists. The top 200 hero pairs contribute more to predictions than individual hero strength. Picking "the best hero" matters less than picking the best hero for your draft.
Lane matchups dominate early draft. At the 4-pick stage, lane advantage is 2x more predictive than team synergy. Synergy only takes over once all 10 heroes are locked.
Late-game scaling is overrated. Drafts built around "survive until 40 minutes" lose more than timing-focused drafts. Pro teams don't let you farm peacefully.
BKB-pierce is the most undervalued concept. Stacking Roar, Grip, Chrono, Duel consistently outperforms what the heroes' individual stats suggest.
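The SHAP analysis above isn't reproducible without the training data, but the underlying question of which features drive predictions can be illustrated with scikit-learn's permutation importance on synthetic stand-ins. The data below is invented so that a synergy-like feature dominates by construction:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for two draft features: "synergy" drives
# the outcome, individual hero "tier" is mostly noise here.
synergy = rng.normal(size=n)
tier = rng.normal(size=n)
y = (synergy + 0.1 * tier + rng.normal(scale=0.5, size=n) > 0).astype(int)
X = np.column_stack([synergy, tier])

model = GradientBoostingClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
# result.importances_mean ranks features by how much shuffling
# each one hurts accuracy; synergy dominates as constructed.
```

SHAP gives per-prediction attributions rather than a global ranking, which is what makes it useful for explaining individual drafts, but both methods answer "what does the model actually lean on".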
The Prediction Pipeline
1. Draft detected (15-30s delay)
2. Hero IDs mapped to feature vectors
3. Player/team enrichment data fetched (hero pool, form, H2H)
4. XGBoost prediction + DraftNet prediction
5. Confidence calibration (temperature scaling, T=1.8)
6. Dampening stack: market gap, standin penalty, league tier adjustment
7. Value bet detection: model confidence vs bookmaker odds
8. SHAP explanation generated
9. Published to Telegram + website
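Steps 4 and 5 can be sketched in plain Python. The 50/50 ensemble weighting is an assumption (the post doesn't say how the two model outputs are combined); only the temperature T = 1.8 comes from the pipeline above:

```python
import math

def temperature_scale(p, T=1.8):
    # Shrink an overconfident probability toward 0.5 by dividing
    # its logit by T (T > 1 softens, T = 1 is a no-op).
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit / T))

def ensemble(p_xgb, p_draftnet, w=0.5):
    # Weighted average of the two model outputs; the weight
    # is a placeholder, not the system's actual scheme.
    return w * p_xgb + (1 - w) * p_draftnet

raw = ensemble(0.82, 0.74)           # step 4: two models in parallel
calibrated = temperature_scale(raw)  # step 5: T = 1.8
```

With these inputs, a raw 0.78 softens to roughly 0.67, which is the whole point: the stated edge shrinks before any money-related logic sees it.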
The dampening stack is crucial. Raw model output is often overconfident on low-data matches. Six modifiers adjust the prediction:
- Calibration — isotonic regression maps raw scores to true probabilities
- H2H — head-to-head history between the two teams
- Variance — how stable the prediction is across model variants
- Standin detection — confidence penalty when substitute players are detected
- Market gap — if our prediction disagrees with bookmakers by >20%, compress the edge
- League tier — unknown/amateur leagues get dampened toward 50%
Results
| Metric | Value |
| --- | --- |
| Test accuracy (held-out) | 77% |
| Tier-1 production accuracy | ~67% |
| Brier score | 0.21 |
| Training matches | 28,000+ |
| DraftNet parameters | 304K |
| Prediction latency | <2 seconds |

The production gap is mostly explained by missing enrichment data — 73% of live predictions don't have replay-parsed features available.
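For reference, the Brier score in the table is just the mean squared error between stated probabilities and 0/1 outcomes (lower is better; always answering 50% scores exactly 0.25). The predictions below are made up:

```python
def brier_score(probs, outcomes):
    # Mean squared gap between each stated probability
    # and the actual 0/1 result.
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical predictions vs. results:
score = brier_score([0.7, 0.6, 0.8, 0.5], [1, 0, 1, 1])  # -> 0.185
```

Unlike accuracy, the Brier score punishes confident misses harder than hesitant ones, which is why it's the right headline metric for a system whose output is a probability.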
Draft Simulator
I also built a Captain's Mode simulator where you draft against the AI and watch the win probability update in real-time. It's at draft.britbets.xyz — useful for testing draft theories before ranked games.
The AI opponent uses the same DraftNet model to evaluate its picks. It's not perfect (it loves Magnus a suspicious amount) but it catches composition mistakes that humans miss.
What I'd Do Differently
Start with more data. 28K matches sounds like a lot, but after filtering for quality (tier-1/2 only, no standins, replay available), it's closer to 8K clean samples.
Calibration matters more than accuracy. A model that says "65%" and is right 65% of the time is more useful than one that says "80%" and is right 75%.
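One narrow way to quantify the cost of miscalibration with the Brier score: at a fixed hit rate, overstating confidence is a pure penalty, so a model that is right 75% of the time scores strictly better by saying 75% than by saying 80%:

```python
def expected_brier(stated_p, true_rate):
    # Expected Brier score when you always state `stated_p` for the
    # favorite and the favorite actually wins `true_rate` of the time.
    return true_rate * (1 - stated_p) ** 2 + (1 - true_rate) * stated_p ** 2

honest = expected_brier(0.75, 0.75)    # says 75%, right 75% of the time
overconf = expected_brier(0.80, 0.75)  # says 80%, right 75% of the time
```

Proper scoring rules like Brier are minimized exactly when the stated probability equals the true rate, which is the formal version of the point above.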
Don't trust your model on data it hasn't seen. Unknown teams, new patches, standin players — the model defaults to ~50% confidence and that's honest.
Stack

- ML: Python, XGBoost, PyTorch, SHAP, scikit-learn
- API: FastAPI on Railway
- Website: Next.js 16 on Vercel, Supabase
- Draft Simulator: React + Vite on Vercel
- Bot: python-telegram-bot
All predictions are logged before the match starts and published transparently — including misses. Stats are public at britbets.xyz/track-record.