How We Index 342 Million Chess Positions for Millisecond Lookups
How a chess-specific insight about material changes gave us 93% cache hit rates - validated on 873,000 positions from Magnus Carlsen's games.
TL;DR: Material rarely changes in chess (captures are rare), so we shard by material configuration. Consecutive positions hit the same shard, giving 93% cache hits vs 2% for hash-based sharding.

Key Takeaways
- Material changes every ~8.7 positions on average - this domain insight drives the entire architecture.
- 93% cache hit rate with material-based sharding vs 2% with hash-based (verified on 873K Magnus Carlsen positions).
- We accepted extreme shard imbalance (4 shards > 1GB, 7000+ shards < 10KB) because locality matters more than uniformity.
- Simple beats clever: ~400 lines of Go, sorted files, binary search, LRU cache.
- Open source: github.com/discochess/stockpile
The Problem
Disco Chess scans your games from Lichess and Chess.com to find positions where you missed a tactic. We feed those positions into your Mistake Review queue so you can train on your actual weaknesses.
To find missed tactics, we need Stockfish evaluations for every position. At depth 36, that's seconds per position. A 40-move game has 80+ positions. Scanning a user's recent games could mean thousands of positions. The compute costs were adding up fast.
The obvious optimization: Lichess publishes their entire evaluation database monthly - 342 million positions already analyzed by Stockfish 16+ at depth 36+. If we could look up positions in that database fast enough, we'd only need to run Stockfish on positions that aren't already there.
But how do you query 342 million positions in milliseconds?
The Insight: Material Rarely Changes
Here's what makes chess positions special: in a typical game, captures are rare. You might go 8-10 moves between captures. That means 8-10 consecutive positions with identical material (same pieces on the board).
This is the key insight. If we group positions by their material configuration, consecutive positions in a game will almost always land in the same group. Load that group once, and you've got a cache hit for the next several lookups.
We call these groups "shards." Each shard contains all positions with a specific material signature - encoded as a 19-bit key covering queens, rooks, minor pieces, and side to move. This gives us 8,359 shards containing 342 million positions.
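Here's a sketch of what such a key can look like. The exact bit layout stockpile uses isn't spelled out here, so treat the packing below (3 bits each for queens, rooks, and combined minors per side, plus a side-to-move bit, for 19 bits total) as illustrative rather than the real encoding:

```go
package main

import (
	"fmt"
	"strings"
)

// materialKey packs capped queen, rook, and minor-piece counts for both
// sides plus the side to move into one integer: 3 bits per count and 1 bit
// for the side to move, 19 bits in total. This packing is a guess for
// illustration; stockpile's actual encoding may order the fields differently.
func materialKey(fen string) uint32 {
	fields := strings.Fields(fen)
	board, sideToMove := fields[0], fields[1]

	count := func(pieces string) uint32 {
		var n uint32
		for _, r := range board {
			if strings.ContainsRune(pieces, r) {
				n++
			}
		}
		if n > 7 { // cap so each count fits in 3 bits
			n = 7
		}
		return n
	}

	key := count("Q")<<16 | count("R")<<13 | count("NB")<<10 | // white
		count("q")<<7 | count("r")<<4 | count("nb")<<1 // black
	if sideToMove == "w" {
		key |= 1
	}
	return key
}

func main() {
	// Every position in a capture-free stretch of a game shares a key
	// (modulo the side-to-move bit), so it maps to the same shard.
	fmt.Println(materialKey("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -"))
}
```

The point is that any capture-free sequence of moves maps to at most two keys, one per side to move, which is exactly the locality the cache exploits.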
Proof: 10,000 Magnus Carlsen Games
Theory is nice. We wanted proof.
We ran 10,268 Magnus Carlsen games (873,418 positions) through a cache simulator, comparing our material-based sharding against hash-based sharding (random distribution):
| Strategy | Cache Hit Rate (LRU-100) |
|---|---|
| Material-based | 93.42% |
| Hash-based | 2.00% |
Material-based sharding wins by 91 percentage points. It's not even close.
The numbers confirmed our hypothesis: material changes on average every 8.7 positions. With a modest 100-shard LRU cache, we hit the cache 93% of the time. Hash-based sharding, by contrast, scatters consecutive positions across random shards - destroying locality entirely.
One subtlety worth noting: the side-to-move bit flips every position, so you need at least LRU-2 to cache both the white-to-move and black-to-move variants of the same material configuration.
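The simulator itself doesn't need to be fancy. Here's a minimal sketch of the idea (assumed shape, not our actual benchmark code): replay each game's positions in order, map every position to a shard key, and count how often that key is already in a small LRU of recently used shards:

```go
package main

import "fmt"

// simulate replays a stream of shard keys (one per position, in game order)
// against an LRU of `capacity` shards and returns the cache hit rate.
func simulate(keys []uint32, capacity int) float64 {
	var recent []uint32 // most recently used first
	hits := 0
	for _, k := range keys {
		idx := -1
		for i, r := range recent {
			if r == k {
				idx = i
				break
			}
		}
		if idx >= 0 {
			hits++
			recent = append(recent[:idx], recent[idx+1:]...) // drop old entry
		} else if len(recent) == capacity {
			recent = recent[:capacity-1] // evict the least recently used
		}
		recent = append([]uint32{k}, recent...) // move (or insert) to front
	}
	return float64(hits) / float64(len(keys))
}

func main() {
	// Material-based keys from a capture-free stretch just alternate between
	// the white-to-move and black-to-move variants of one configuration.
	keys := []uint32{5, 4, 5, 4, 5, 4, 9, 8, 9, 8}
	fmt.Printf("hit rate: %.0f%%\n", simulate(keys, 100)*100)
}
```

Feed it material-based keys and consecutive positions keep repeating the same one or two keys; feed it hash-based keys and the stream looks random, which is why the hash-based hit rate collapses.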
The Tradeoff We Accepted
Material-based sharding has a cost: extreme size imbalance.
| Top N Shards | % of Total Data |
|---|---|
| Top 1 | 15.3% |
| Top 10 | 66.5% |
| Top 100 | 98.6% |
| Bottom 7,000 | ~0% |
93% of shards are under 10KB. Four shards exceed 1GB. The largest (shard 13449, positions where one side is up a full set of minor pieces) weighs in at 2.8GB.
Why so skewed? Lichess's database is biased toward "interesting" positions, and winning positions get analyzed more than equal ones. Winning usually means a material edge, so shards with asymmetric material collect far more data than balanced ones.
This is fine for our use case. Sequential game analysis keeps hitting the same cached shards. The long tail of tiny shards barely matters. But if your access pattern is random lookups across unrelated positions, hash-based sharding would serve you better.
We chose locality over uniformity. The Magnus data proved it was the right call.
Making It Work
Two engineering challenges nearly killed us:
Memory during builds. Our initial implementation loaded all records for a shard into memory before writing. Fine for small shards. But shard 13449 has 18 million records. At ~200 bytes each, that's 3.6GB - except Go's memory overhead ballooned it past 14GB, and our build nodes OOM'd.
The fix: streaming writes with external merge sort. Write records to temporary sorted chunks during ingestion, merge-sort while streaming directly to output, never hold more than a buffer in memory. Peak memory dropped from 16GB+ to under 8GB.
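Here's a condensed sketch of that build path, simplified from the real builder (the chunk handling, record format, and file naming are assumptions): sort records in bounded chunks, spill each chunk to a temporary file, then stream a k-way merge of the chunks into the output so only one pending line per chunk sits in memory at merge time.

```go
package main

import (
	"bufio"
	"os"
	"sort"
)

// spillChunk sorts one bounded chunk of records and writes it to a temp file.
func spillChunk(records []string) (string, error) {
	sort.Strings(records)
	f, err := os.CreateTemp("", "chunk-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, r := range records {
		w.WriteString(r + "\n")
	}
	return f.Name(), w.Flush()
}

// mergeChunks streams a k-way merge of the sorted chunk files into out,
// holding only one pending line per chunk in memory.
func mergeChunks(paths []string, out *bufio.Writer) error {
	scanners := make([]*bufio.Scanner, len(paths))
	heads := make([]string, len(paths))
	live := make([]bool, len(paths))
	for i, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		scanners[i] = bufio.NewScanner(f)
		if live[i] = scanners[i].Scan(); live[i] {
			heads[i] = scanners[i].Text()
		}
	}
	for {
		best := -1 // index of the smallest pending record
		for i := range heads {
			if live[i] && (best < 0 || heads[i] < heads[best]) {
				best = i
			}
		}
		if best < 0 {
			return out.Flush()
		}
		out.WriteString(heads[best] + "\n")
		if live[best] = scanners[best].Scan(); live[best] {
			heads[best] = scanners[best].Text()
		}
	}
}

func main() {
	a, _ := spillChunk([]string{"fen-b\t+0.4", "fen-a\t+0.1"})
	b, _ := spillChunk([]string{"fen-c\t-0.3", "fen-a\t+0.1"})
	mergeChunks([]string{a, b}, bufio.NewWriter(os.Stdout))
}
```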
Cold starts. First lookup in a 2.8GB shard means downloading and decompressing 2.8GB. We accepted this tradeoff - cold starts are slow, but they're rare. Once a shard is cached, it stays cached. For sequential game analysis, you pay the cold start cost once and then cruise on cache hits.
The Stack
Each shard is a simplified SSTable - sorted FEN strings with binary search, compressed with zstd (6-8x compression ratio). The format is described in Google's Bigtable paper, though we stripped it down: no bloom filters, no block indexes, no compaction. Just sorted records and compression.
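Lookups inside a shard are just as plain. Assuming a shard decompresses to newline-delimited, tab-separated "FEN, eval" records sorted by FEN (the record layout and the evaluations below are illustrative, not the exact stockpile format), the search is a stock binary search:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lookup binary-searches sorted "FEN\teval" records for an exact FEN match.
func lookup(records []string, fen string) (string, bool) {
	i := sort.Search(len(records), func(i int) bool {
		key, _, _ := strings.Cut(records[i], "\t")
		return key >= fen
	})
	if i < len(records) {
		if key, eval, _ := strings.Cut(records[i], "\t"); key == fen {
			return eval, true
		}
	}
	return "", false
}

func main() {
	// A tiny in-memory "shard", already sorted by FEN; the evals are made up.
	shard := []string{
		"r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq -\t+0.3",
		"rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -\t+0.2",
	}
	if eval, ok := lookup(shard, "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -"); ok {
		fmt.Println("eval:", eval)
	}
}
```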
Shards live in Google Cloud Storage. Any client can fetch only the shards it needs - no central database server, no ops overhead, works anywhere with network access. The full dataset is 18.3GB compressed across 8,359 files.
In production, stockpile sits in a three-tier pipeline:
```
Redis (hot positions, sub-ms)
  ↓ miss
Stockpile (342M pre-computed, ~100µs cached)
  ↓ miss
Stockfish (novel positions, seconds)
```

Most requests never reach Stockfish.
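The glue between the tiers is a straight fall-through. Here's a sketch with hypothetical function types standing in for the real Redis client, stockpile client, and Stockfish worker (names and error handling are illustrative):

```go
package pipeline

import "errors"

// ErrMiss is the sentinel a tier returns in this sketch when it doesn't
// know the position (for stockpile that role is played by ErrNotFound).
var ErrMiss = errors.New("position not cached")

// lookupFn is the shape each tier exposes in this sketch.
type lookupFn func(fen string) (eval string, err error)

// Evaluate falls through the tiers in order: hot cache, pre-computed
// shards, then the engine for genuinely novel positions.
func Evaluate(fen string, redis, stockpile, stockfish lookupFn) (string, error) {
	for _, tier := range []lookupFn{redis, stockpile, stockfish} {
		eval, err := tier(fen)
		if err == nil {
			return eval, nil
		}
		if !errors.Is(err, ErrMiss) {
			return "", err // a real failure, not just a miss
		}
	}
	return "", ErrMiss
}
```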
Use It Yourself
Stockpile is open source (MIT license). If you're building something that needs chess position evaluations:
import "github.com/discochess/stockpile"
client, _ := stockpile.NewGCSClient(ctx, "bucket", "prefix", 100)
eval, err := client.Lookup(ctx, "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -")
if err == stockpile.ErrNotFound {
// Fall back to Stockfish
}Or use the CLI:
```
stockpile lookup --data gs://bucket/prefix "r1bqkb1r/pppp1ppp/2n2n2/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq -"
```

The library handles shard computation, caching, and GCS fetching. You pass in FENs and get evaluations back.
What We Learned
Domain knowledge beats generic solutions. A chess-specific insight (material rarely changes) gave us 93% cache hits. A generic hash-based approach would have given us 2%.
Measure with real data. The Magnus Carlsen experiment is what turned a plausible hunch into a confirmed hypothesis: 873,418 positions across 10,268 games. Real workload, real numbers.
Accept tradeoffs explicitly. Our shard distribution is wildly imbalanced. We decided that was fine for sequential access patterns and moved on. Trying to "fix" it would have destroyed the locality that makes the system work.
Simple beats clever. The core lookup path is ~400 lines of Go. No distributed consensus, no cache invalidation complexity. Sorted files, binary search, LRU cache. It works.
Try It
If you're a Disco Chess user, you're already benefiting from this - game analysis is faster because we're not running Stockfish on positions that have already been analyzed.
If you're an engineer building chess tools, stockpile is on GitHub. The Lichess evaluation data is freely available under CC0.
And if you just want to improve at chess, start training.