Decide which coding model to use, in seconds. Ranked by your priorities, source contradictions surfaced, local options labeled with the exact quant that fits your GPU. Available in English and Turkish in the live UI.
A new coding LLM ships every two weeks — Opus 4.7, Kimi K2.6, Qwen3.6-27B, DeepSeek V4 in just the last month. When you actually need to pick one, today's trackers leave you stuck:
- artificialanalysis.ai / llm-stats / BenchLM force you into hours of review — no opinion, just conflicting scores side by side that you must interpret yourself.
- aider.chat has not been updated since November 2025 (5 months stale). You are deciding from outdated numbers.
- No Turkish coverage anywhere. Translating each global benchmark page is a separate time tax.
AICoderMap answers the questions that actually shape the decision:
| Question | AICoderMap's answer |
|---|---|
| "Which model fits my workflow?" | Slide the weights to your priorities (SWE-focused, agentic-focused, balanced, or custom) — ranking updates instantly. Four built-in presets plus your own custom mix. |
| "SWE-Verified 87 vs SWE-Pro 64 — which is real?" | Every score carries a ⚠ / 🚨 flag when sources disagree. The tooltip lists each source (Anthropic / Scale SEAL / community) with its tier (S = self-reported, I = independent, C = community). Decide on raw evidence, not inflated headlines. |
| "What runs on my RTX 3070?" | WebGPU auto-detects your hardware on page load. Every local model gets a label like "Fits (10 GB · UD-IQ2_XXS)" — exact quant name plus GB. For models that overflow, you also see "+3 GB RAM" offload recommendations. |
| "How fresh is the data?" | Each row shows its last-updated date. Refreshed every 14 days at most — none of Aider's 5-month staleness. |
| "I want to share this on Twitter/Discord." | One click → PNG export for a card, the comparison table, or the full page. No screenshot fiddling. |
| "I want to read it in Turkish." | Every UI label and every benchmark description is written in TR — no translation tax to use the tool. |
Pre-launch — implementation in progress. 5-week solo part-time plan (M1 Foundation → M5 Launch).
| Document | What it covers |
|---|---|
| PRD | Product Requirements (users, features, metrics) |
| TechSpec | Technical Specification (architecture, data, API, security) |
| ImplGuide | ⭐ Coding-ready implementation guide |
| Tasks | 23-task / 5-milestone breakdown |
| Workflow | Update workflow (14 happy-path + 5 exception steps) |
| Pitch | Short pitch — for sharing |
Single external service: GitHub Pages. Ongoing cost: $0.
- Vanilla HTML/CSS/JS (no build step, no framework)
- Static JSON data files (the skill regenerates them)
- WebGPU API (browser-native GPU detect)
- html2canvas (vendored, PNG export)
- Local Claude Code skill + research agent (manual update workflow)
A fresh clone needs only Python 3.10+, Node 18+ (for the linter), and Git Bash on Windows (or any POSIX shell). No package manager, no build step.
Prerequisites:
- Python 3.10+ (uses `re.fullmatch`, the walrus operator, modern type hints) — standard library only, no `pip install` needed.
- Node 18+ — only for `scripts/regex-lint.js` (validates the regex corpus). Skip it if you don't touch the regex library.
- Git Bash / WSL / Linux / macOS for `auto/bench.sh`. On native Windows cmd.exe, use `auto\bench.bat` instead.
- Claude Code (any recent version) for `/aicodermap` skill triggers — the skill + agent are project-scoped and load automatically when you open this directory.
Local commands (no install required, all stdlib / vendored):
# Run the ds-tune evaluation (slug correctness + coverage proxy):
bash auto/bench.sh # POSIX
auto\bench.bat # Windows native
# Update verification map from the latest agent artifact:
python scripts/verification-map.py update
# Bootstrap verification map from accumulated sources.json (one-shot):
python scripts/verification-map.py bootstrap
# Merge an agent run into data/* (called by the skill, but also runnable standalone):
python scripts/merge.py
# Lint the regex corpus (Node):
node scripts/regex-lint.js
# Migrate schema (rare, one-shot):
python scripts/migrate-schema.py
# OCR images embedded in vendor blog posts:
python scripts/extract-images.py <url>
# Regenerate sources-whitelist.json `format` keys from the schema:
node scripts/whitelist-format-migration.js

Live preview: open index.html in any modern browser, no server needed. The site reads data/*.json over file://. For deploy verification, use https://sungurerdim.github.io/aicodermap/.
What is NOT in the repo (gitignored, regenerated on demand):
- `.aicodermap-agent-out.json` — last research-agent return (overwritten every cycle)
- `.aicodermap-verification-map.json` — cross-cycle confirmed-cell cache (run `python scripts/verification-map.py bootstrap` to rebuild from `data/sources.json`)
- `.aicodermap-images/` — temporary PNG downloads for OCR
- `*.bak`, `*.bak2`, `*.bak3` — rotated backups created by `merge.py`
- `auto/run.log` — eval output (regenerated by `bench.sh` / `bench.bat`)
- `.ruff_cache/`, `__pycache__/` — linter / Python bytecode caches
Everything else (skill, agent, scripts, data, vendor JS, docs, i18n, auto/ folder including fixtures + program.md + results.tsv) is tracked. A fresh clone is reproducible end-to-end.
- M1 Foundation (Week 1) — Repo + 4 JSON schemas + research agent
- M2 Core (Week 2) — Live tracker static render, TR/EN toggle
- M3 Integration (Week 3) — 13 must-have features (weights editor + GPU VRAM + contradiction flags + PNG). Bench cross-source coverage continues to climb each refresh — current M4 floor is 30 %, target ≥ 95 %.
- M4 Polish (Week 4) — SEO (sitemap, robots, JSON-LD, hreflang, OG/Twitter), a11y skip-link + focus rings, mobile card-stack, doc drift sweep, smoke harness.
- M5 Launch (Week 5) — Simultaneous TR + Global soft launch + 2-week validation
Every page state is reflected in the URL — copy the address bar (or click Copy share link in the Export section) and you've shared the exact ranking the other person will see. The state schema is human-readable and stable; CLI consumers can construct URLs by hand.
| Param | Values |
|---|---|
| `lang` | `tr` \| `en` |
| `theme` | `dark` \| `light` |
| `preset` | `balanced` \| `swe-focused` \| `agentic-focused` \| `reasoning-focused` \| `benchmark-only` \| `custom` |
| `w` | comma-separated `benchKey:weight` pairs; honoured only when `preset=custom` |
| `tier` | `frontier` \| `open-flagship` \| `coder-specialized` \| `gemma` \| `ollama-local` \| `all` |
| `deployment` | `all` \| `cloud` \| `local` |
| `provider` | vendor name (URL-encoded) \| `all` |
| `vram` | integer GB (1..256) |
| `gpu` | WebGPU vendor key (e.g. `nvidia.rtx-4090`) \| `auto` |
| `open` | `1` \| `0` (open-license-only filter) |
| `search` | substring (URL-encoded) |
| `sort` | `<columnKey>-<asc\|desc>` (e.g. `swePro-desc`, `composite-asc`) |
Example deep-link — Turkish UI, SWE-focused preset, only models that fit a 16 GB RTX 4080, sorted by SWE-bench Pro descending:
https://sungurerdim.github.io/aicodermap/?lang=tr&preset=swe-focused&deployment=local&vram=16&sort=swePro-desc
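Because the parameter names and values are stable, a script can assemble such links directly. A minimal sketch in Python (standard library only), reproducing the example above:

```python
# Sketch: build a shareable deep-link from the URL-state parameters above.
from urllib.parse import urlencode

BASE = "https://sungurerdim.github.io/aicodermap/"
params = {
    "lang": "tr",
    "preset": "swe-focused",
    "deployment": "local",
    "vram": 16,
    "sort": "swePro-desc",
}
print(BASE + "?" + urlencode(params))
# https://sungurerdim.github.io/aicodermap/?lang=tr&preset=swe-focused&deployment=local&vram=16&sort=swePro-desc
```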
The site is a static GitHub Pages deploy backed by stable JSON files. Any
shell tool that can curl + jq can consume it; no auth, no rate limit, no
WAF. The schemas are documented in docs/TECHSPEC.md §3.
BASE=https://sungurerdim.github.io/aicodermap
# All models, just id + provider + tier + composite-relevant scores:
curl -s "$BASE/data/models.json" | jq '
[.[] | {id, name, provider, tier, open,
swePro: .bench.swePro, sweV: .bench.sweV,
lcb: .bench.lcb, tb2: .bench.tb2,
priceIn: .pricing.api[0].in, priceOut: .pricing.api[0].out}]
'
# Top-10 frontier models by SWE-bench Pro:
curl -s "$BASE/data/models.json" | jq '
[.[] | select(.tier=="frontier" and .bench.swePro != null)
| {id, swePro: .bench.swePro, priceIn: .pricing.api[0].in}]
| sort_by(-.swePro) | .[:10]
'
# Models that fit a 16 GB GPU (vramRequirement <= 16; null means cloud-only and is excluded):
curl -s "$BASE/data/models.json" | jq '
[.[] | select(.vramRequirement != null and .vramRequirement <= 16)
| {id, vram: .vramRequirement, license, swePro: .bench.swePro}]
'
# Cross-source contradictions for a model (≥3pp delta):
curl -s "$BASE/data/sources.json" | jq '
to_entries | map(select(
(.key | startswith("opus-4-7."))
and ([.value[].value] | (max - min)) >= 3
))
'
# Pull provenance for a single (model, bench) cell:
curl -s "$BASE/data/sources.json" | jq '."opus-4-7.swePro"'A consumer that wants the same view a sharable URL produces can fetch the
URL directly and read the <script type="application/ld+json"> block — it
contains the Dataset schema with distribution[] pointing at the three
canonical JSON files.
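A minimal Python sketch of that discovery step (standard library only; it assumes a single JSON-LD block on the page and that each `distribution` entry carries a schema.org-style `contentUrl`):

```python
# Sketch: read the Dataset JSON-LD block and list the distribution URLs.
import json
import re
from urllib.request import urlopen

BASE = "https://sungurerdim.github.io/aicodermap"
html = urlopen(BASE + "/").read().decode("utf-8")

block = re.search(
    r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL
).group(1)
dataset = json.loads(block)

# Each distribution entry should point at one of the canonical JSON files.
for dist in dataset.get("distribution", []):
    print(dist.get("contentUrl"))
```

The shell one-liner below does the same extraction with grep.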
# Discover the dataset distribution from JSON-LD:
curl -s "$BASE/" | grep -oP 'application/ld\+json[^<]*<[^>]*>([\s\S]+?)</script>' | head -200Every value the tracker shows comes from one of the sources below. Each value carries a trustScore based on its tier (I > S > C > U), the number of confirming sources, and recency. Independent sources outweigh vendor self-reports; community sources are used only when no independent or official source exists; forum/social signals are never written into the data.
| Source | URL | Authority for |
|---|---|---|
| Scale SEAL | labs.scale.com/leaderboard | SWE-bench Pro (1865 tasks), HLE |
| SWE-bench (canonical) | swebench.com · github.com/SWE-bench/experiments | SWE-bench Verified, full SWE-bench |
| LiveCodeBench | livecodebench.github.io · livecodebench.com | LCB v6 (contamination-free) |
| Terminal-Bench | tbench.ai · terminal-bench.io | TB2 agentic execution |
| tau-bench | tau-bench.dev | tau2 agentic API-use |
| Aider Polyglot | aider.chat/docs/leaderboards | aider (warn: stale since Nov 2025) |
| MCP-Atlas | mcp-atlas.dev | mcpA tool-chain quality |
| Artificial Analysis | artificialanalysis.ai/leaderboards | aaIdx, aaCoding, aaAgentic, throughput, pricing |
| Vellum Leaderboard | vellum.ai/llm-leaderboard | independent SWE-V, GPQA, cost+latency |
| llm-stats | llm-stats.com | broad model catalog |
| LMArena | lmarena.ai | blind human preference (formerly LMSYS) |
| LiveBench | livebench.ai | contamination-resistant rotating evals |
| Berkeley BFCL | gorilla.cs.berkeley.edu | function-calling v3/v4 |
| BigCodeBench | bigcode-bench.github.io · HF leaderboard | code generation gold standard |
| EvalPlus | evalplus.github.io | HumanEval+ / MBPP+ rigorous |
| HF Open LLM Leaderboard | huggingface.co/spaces/open-llm-leaderboard | open-weight canonical aggregation |
| Klu.ai | klu.ai/llm-leaderboard | broader benchmark aggregator |
| Papers with Code | paperswithcode.com/area/code-generation | peer-reviewed leaderboards |
| arXiv | arxiv.org | original benchmark papers |
| BenchLM | benchlm.ai | verified vs provisional transparency; ProgramBench tracker |
| ProgramBench | programbench.com · arXiv 2605.03546 | cleanroom program reconstruction (Meta + Stanford + Harvard, 2026-05-05) |
| AgentBench | agentbench.ai | multi-domain agentic |
| MathArena | matharena.ai | AIME math reasoning (auxiliary) |
| Vals.ai | vals.ai/benchmarks | enterprise-gated benchmark sets |
| LMMarketCap | lmmarketcap.com | hourly market table |
Models are often hosted on multiple providers at different prices. The tracker shows per-provider pricing in each card and a price range in the comparison table — these are the sources surveyed (a minimal price-range sketch follows the table).
| Source | URL | Extracts |
|---|---|---|
| OpenRouter | openrouter.ai | provider count, uptime%, alt pricing, throughput |
| Together AI | together.ai/models | quant variants, $/1M, batch tier |
| Fireworks AI | fireworks.ai/models | tier, throughput, batch pricing |
| DeepInfra | deepinfra.com/models | $/1M, throughput |
| Groq | console.groq.com/docs/models · groq.com/pricing | extreme-fast inference rates |
| Cerebras | inference-docs.cerebras.ai · cerebras.ai/inference | ultra-fast inference |
| SambaNova Cloud | cloud.sambanova.ai/models | catalog, throughput |
| Replicate | replicate.com | open-weight hosting, $/sec |
| Lepton AI | lepton.ai/pricing | enterprise pricing |
| Novita AI | novita.ai/model-api | catalog + pricing |
| SiliconFlow | siliconflow.cn/models | Chinese providers — Qwen / DeepSeek / MiMo |
| Anyscale | anyscale.com/endpoints | enterprise endpoints |
| Cloudflare Workers AI | developers.cloudflare.com/workers-ai/models | edge regions, free tier |
| AWS Bedrock | aws.amazon.com/bedrock | enterprise + region matrix |
| Azure AI Foundry | ai.azure.com/explore/models | enterprise + region |
| HuggingFace Inference Endpoints | huggingface.co | author canonical card |
| OpenCode Zen / Go | opencode.ai | edge endpoints, latency |
| Lambda Cloud | lambda.ai/inference | enterprise throughput |
| Tensorix | tensorix.ai | infrastructure / niche frontier hosting |
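A minimal sketch of the per-model price-range computation in Python (standard library only; it assumes `pricing.api` can contain several provider entries, each with the `in`/`out` $-per-1M fields used in the jq examples above):

```python
# Sketch: per-model input-price range across hosted providers, from models.json.
import json
from urllib.request import urlopen

BASE = "https://sungurerdim.github.io/aicodermap"
models = json.loads(urlopen(BASE + "/data/models.json").read())

for m in models:
    entries = (m.get("pricing") or {}).get("api") or []
    prices_in = [e["in"] for e in entries if e.get("in") is not None]
    if len(prices_in) > 1:
        print(f"{m['id']}: ${min(prices_in)}–${max(prices_in)} per 1M input tokens")
```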
| Source | URL | Extracts |
|---|---|---|
| Ollama Library | ollama.com/library | tags, pullCount, architecture, parameters, license, releasedISO |
| HuggingFace Unsloth | huggingface.co/unsloth | UD dynamic quants (UD-IQ1_S → UD-Q4_K_XL) |
| HuggingFace bartowski | huggingface.co/bartowski | most-active quant maintainer |
| HuggingFace mradermacher | huggingface.co/mradermacher | high-quality quant set |
| HuggingFace lmstudio-community | huggingface.co/lmstudio-community | LM Studio-curated GGUFs |
| LM Studio | lmstudio.ai/models | desktop catalog |
| llama.cpp | github.com/ggerganov/llama.cpp/discussions | empirical VRAM/throughput data |
| MLX (Apple Silicon) | huggingface.co/mlx-community | mlx-quantized variants |
| vLLM | docs.vllm.ai/en/latest/models/supported_models | server-side support matrix |
| sglang | github.com/sgl-project/sglang | structured-output throughput |
| llmfit | github.com/AlexsJones/llmfit | 148-model HF curated DB (mirrored locally) |
These are the canonical announcements, model cards, pricing pages, and API docs from each model's maker. Used as primary source for release dates, license, context window, and API pricing — and as cross-check against independent benchmarks.
| Vendor | Sources |
|---|---|
| Anthropic | anthropic.com/news · docs.claude.com · pricing |
| OpenAI | openai.com/blog · platform.openai.com/docs/models |
| Google DeepMind | deepmind.google/discover · ai.google.dev/gemini-api/docs/models |
| Mistral | mistral.ai/news · docs.mistral.ai |
| DeepSeek | deepseek.com/news · api-docs.deepseek.com |
| xAI | x.ai/news · docs.x.ai/docs/models |
| Alibaba (Qwen) | qwenlm.github.io/blog · qwen-lm.github.io |
| Moonshot (Kimi) | kimi.com/blog · platform.moonshot.cn |
| Z.ai (GLM) | z.ai/news · docs.z.ai |
| Xiaomi (MiMo) | mimo.xiaomi.com · xiaomimimo.github.io |
| MiniMax | minimaxi.com/news · platform.minimaxi.com |
| Nvidia | build.nvidia.com · blogs.nvidia.com |
| Meta (Llama) | huggingface.co/meta-llama · ai.meta.com/blog |
| Google (Gemma) | huggingface.co/google · ai.google.dev/gemma |
| StepFun | stepfun.com |
| All Hands AI (Devstral) | all-hands.dev |
ApiDog Blog · The Decoder · DataCamp Blog · Build Fast With AI · Simon Willison · Latent Space · Swyx · Awesome-LLM · Awesome-Efficient-LLM · r/LocalLLaMA (community VRAM reports only) · Design Arena (UI auxiliary)
trustScore = tierWeight × min(verifications, 3)/3 × recencyDecay(date)
I-tier (independent) weight 1.0
S-tier (vendor) weight 0.7
C-tier (community) weight 0.4
U-tier (forum/social) weight 0.1 ← never written, cross-check only
recency: <30d=1.0 · <90d=0.85 · <180d=0.70 · <365d=0.50 · ≥365d=0.30
When two sources disagree on a value, the one with the higher trustScore wins. The losing value still appears in data/sources.json with its tier and score, so you can audit every decision.
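A minimal sketch of that resolution rule in Python (standard library only; the tier weights and decay thresholds are taken from the formula above, while the two candidate entries are hypothetical):

```python
# Sketch: trustScore computation and conflict resolution, per the formula above.
from datetime import date

TIER_WEIGHT = {"I": 1.0, "S": 0.7, "C": 0.4, "U": 0.1}

def recency_decay(measured: date, today: date) -> float:
    age = (today - measured).days
    if age < 30:
        return 1.0
    if age < 90:
        return 0.85
    if age < 180:
        return 0.70
    if age < 365:
        return 0.50
    return 0.30

def trust_score(tier: str, verifications: int, measured: date, today: date) -> float:
    return TIER_WEIGHT[tier] * min(verifications, 3) / 3 * recency_decay(measured, today)

# Two hypothetical source entries for the same (model, bench) cell.
today = date(2026, 1, 15)
candidates = [
    {"value": 87.0, "tier": "S", "verifications": 1, "date": date(2025, 12, 1)},
    {"value": 82.5, "tier": "I", "verifications": 2, "date": date(2025, 11, 20)},
]
# Highest trustScore wins; the loser is still kept in data/sources.json for auditing.
winner = max(candidates, key=lambda c: trust_score(c["tier"], c["verifications"], c["date"], today))
print(winner["value"])  # the independent, twice-confirmed 82.5 beats the vendor-reported 87.0
```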
Currently pre-launch / solo development. Issues and discussions will open in Phase 2. For now:
- ⭐ Star the repo to follow progress
- 🐛 Open an issue for benchmark-data corrections (after launch)
- 💡 Open a discussion for feature requests
MIT — see LICENSE. Code and data are public; attribution appreciated; no takedown power (public benchmark data).
A reusable Claude Code skill + research-agent template — domain-agnostic, cloneable to other tracker projects.
Author: Sungur Erdim · sungurerdim@gmail.com