Best variant per model
Quality vs the VRAM to run it
Each point is a model at its best-scoring quant. Higher means a better score; further left means it fits a smaller card. The dotted line is the point-estimate efficiency frontier: a model sits on it when no measured model is both higher-scoring and smaller on current point estimates.
No models have been measured so far, so the efficiency frontier is preliminary; it firms up as more variants land.
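The frontier rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the board's actual implementation: the `Point` type, its field names, and the sample values are hypothetical, and the code applies the stated rule literally (a point is dropped only if another point is both strictly higher-scoring and strictly smaller).

```python
from typing import NamedTuple


class Point(NamedTuple):
    # Hypothetical record for one model at its best-scoring quant.
    name: str
    vram_gb: float
    score: float


def efficiency_frontier(points: list[Point]) -> list[Point]:
    """Keep points that no other point beats on both score and VRAM."""
    frontier = [
        p
        for p in points
        if not any(q.score > p.score and q.vram_gb < p.vram_gb for q in points)
    ]
    # Sort left to right so the result can be drawn as a line.
    return sorted(frontier, key=lambda p: p.vram_gb)


# Made-up values for illustration only; these are not measured results.
sample = [
    Point("model-a", 4.0, 30.0),
    Point("model-b", 8.0, 50.0),
    Point("model-c", 12.0, 45.0),  # beaten by model-b: lower score, more VRAM
    Point("model-d", 24.0, 70.0),
]
frontier_names = [p.name for p in efficiency_frontier(sample)]
```

With no measured models the function returns an empty list, which matches the preliminary state described above.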
Leaderboard summary
No ranked variants yet
Partial benchmark profiles are available on model pages, but the Local Intelligence Index ranks only rows with Agentic, Knowledge, Instruction, Tool calling, and Coding all measured under the standard capped-thinking lane.
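The ranking rule above amounts to a completeness filter. The sketch below shows one way to express it, assuming a hypothetical row layout (a `lane` string and a `scores` mapping keyed by category); the real data model is not shown on this page and may differ.

```python
# Categories that must all be measured, per the rule above.
# The key spellings are an assumption made for this sketch.
REQUIRED_CATEGORIES = ("agentic", "knowledge", "instruction", "tool_calling", "coding")
RANKED_LANE = "capped-thinking"


def is_rankable(row: dict) -> bool:
    """Return True if the row has every required score in the ranked lane."""
    scores = row.get("scores", {})
    has_all_scores = all(scores.get(c) is not None for c in REQUIRED_CATEGORIES)
    return row.get("lane") == RANKED_LANE and has_all_scores


# Made-up rows for illustration only.
complete_row = {
    "lane": "capped-thinking",
    "scores": {"agentic": 0.4, "knowledge": 0.6, "instruction": 0.7,
               "tool_calling": 0.5, "coding": 0.3},
}
partial_row = {
    "lane": "capped-thinking",
    "scores": {"agentic": 0.4, "knowledge": 0.6},  # three categories missing
}
```

A partial row like the second one would still appear on a model page but would be left out of the ranking.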
Benchmark a model · preview
Get the recipe to benchmark a model
Local Intelligence Index · v2.1 | 50/15/15/10/10
Pick your VRAM and a model to get the exact benchmark command. The board ranks the Qwen3 and Gemma families today. The v1 suite that the recipe needs and one-step submission both ship with v2.
Most-downloaded board-rankable models that fit · catalog snapshot, not an endorsement.
localbench does not download or run the model. Start a local server first; localbench then sends the benchmark to that endpoint.
Board-comparable · capped-thinking · qwen3 · suite/v1
llama-server -hf MaziyarPanahi/Qwen3-0.6B-GGUF:Q8_0 --port 8080
localbench run --endpoint http://localhost:8080/v1 --model MaziyarPanahi/Qwen3-0.6B-GGUF:Q8_0 --hf-model-id Qwen/Qwen3-0.6B --suite-dir suite/v1 --lane capped-thinking --reasoning-activation qwen3 --tier standard --out my-run.json
Do not change sampling, context, or prompt-template settings unless the recipe says so. VRAM tiers are recommendations, not guaranteed fits; close other GPU workloads before running.
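Because localbench only talks to an endpoint that is already running, it can help to confirm the server is reachable before starting a run. The sketch below is an optional preflight check and not part of localbench. It assumes the server answers `GET /v1/models` in the OpenAI-compatible style; the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def models_url(endpoint: str) -> str:
    """Build the model-listing URL from the --endpoint value."""
    return endpoint.rstrip("/") + "/models"


def server_is_up(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers the model listing with HTTP 200."""
    try:
        with urllib.request.urlopen(models_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    endpoint = "http://localhost:8080/v1"  # same value as in the recipe above
    print("server reachable" if server_is_up(endpoint) else "server not reachable")
```

If the check fails, start the server with the `llama-server` command above and wait for the model to finish loading before retrying.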