Caching Strategies in 2024: Cache-Aside, Write-Through, and the Thundering Herd
Digital Engineering
Cache-aside, write-through, write-behind: each has a workload it fits and one it ruins. TTL choices, invalidation patterns, and how to avoid the thundering herd in a multi-layer cache.
By Arjun Raghavan, Security & Systems Lead, BIPI · July 19, 2024 · 7 min read
Most caching problems are not problems with caching. They are problems with not deciding which caching pattern fits the workload, then layering patterns on top of each other until nobody can reason about the system. We have audited caching layers where the same key was cached in three places with different TTLs and the application code was racing all three. The fix was not better tooling. The fix was choosing one pattern per data type and committing to it.
Cache-aside is the safe default
Application reads cache, misses, reads database, writes cache, returns. Application writes invalidate or update the cache. Simple, debuggable, fits 80 percent of read-heavy workloads.
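That read-miss-fill loop can be sketched in a few lines. This is a minimal illustration, with plain dicts standing in for Redis and the database, and an illustrative 5-minute TTL; none of these names come from a real client system.

```python
import time

db = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache = {}                          # stand-in for Redis: key -> (value, expires_at)
TTL_S = 300                         # illustrative 5-minute TTL

def get(key):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]             # cache hit
    value = db.get(key)             # miss: read the database
    if value is not None:           # fill on the way out
        cache[key] = (value, time.monotonic() + TTL_S)
    return value

def put(key, value):
    db[key] = value                 # write the source of truth first...
    cache.pop(key, None)            # ...then invalidate; next read repopulates
```

The miss penalty is exactly one database read, which is what makes the pattern easy to reason about under load.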
When it is right: read-heavy data with infrequent writes, where stale-by-TTL is acceptable. Product catalog, user profile, configuration. The cache miss penalty is a database read, which you would have paid without the cache anyway.
When it breaks down: write-heavy workloads where invalidation traffic dominates. Workflows where reading stale data has correctness consequences. High-cardinality keys where cache fill costs are non-trivial.
Write-through pays a write tax for read consistency
Application writes go to cache and database in the same operation. Reads always hit the cache (or miss and fill, same as cache-aside). The cache is always consistent with the database, modulo failure modes.
When it is right: workloads where cache miss latency is unacceptable and you can pay the synchronous write cost. Session stores, recently-viewed lists, anything where a stale read is worse than a slightly slower write.
The failure mode: the cache write succeeds and the database write fails, or vice versa. Distributed transactions are not viable here. The pattern that works: write to the database first, then invalidate or update the cache; if the cache update fails, log it and let the next read repopulate. That is cache-aside with write-time invalidation, which is honestly what most teams should be using.
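The database-first ordering is short enough to sketch. This is an illustration of the recommended sequence, not production code; `db_write` and `cache_delete` are hypothetical stand-ins for real database and Redis clients.

```python
import logging

log = logging.getLogger("cache")
db = {}
cache = {}

def db_write(key, value):
    db[key] = value                 # stand-in for the real database write

def cache_delete(key):
    cache.pop(key, None)            # stand-in for a Redis DEL

def write(key, value):
    db_write(key, value)            # source of truth first; if this fails, abort
    try:
        cache_delete(key)           # best-effort invalidation
    except Exception:
        # a failed invalidation costs one stale-read window;
        # the next miss (or TTL expiry) repopulates the key
        log.warning("cache invalidation failed for %s", key)
```

The key property: no failure ordering leaves the cache claiming something the database never accepted.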
Write-behind is for specific workloads only
Application writes go to cache. Cache asynchronously batches writes to the database. High throughput at the cost of durability and write-read consistency.
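A minimal sketch of the buffering makes the durability trade visible. Here the flush is called explicitly so the batching is easy to see; in a real system it would run on a timer or size threshold, and everything still sitting in the buffer is lost if the cache node dies.

```python
cache = {}      # serves reads immediately
pending = {}    # dirty keys awaiting persistence (last-write-wins coalescing)
db = {}

def write(key, value):
    cache[key] = value
    pending[key] = value    # acknowledged before the database has seen it

def flush():
    batch = dict(pending)
    pending.clear()
    db.update(batch)        # one batched write instead of N point writes
```

Note that repeated writes to the same key coalesce, which is exactly why the pattern is so fast for counters, and exactly why it loses data on a crash.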
When it is right: counters, metrics, hot-path writes that can tolerate seconds of delay before persistence. Page view counts, session activity, real-time leaderboards.
When it ruins your day: anything where 'we lost the last 30 seconds of writes when the Redis node died' is unacceptable. Most production systems. We have seen one client try to use write-behind for order events and lose 4 minutes of orders in a Redis failover. The blame eventually landed on the architecture decision, not the failover.
TTL choices are workload-specific, not religious
There is no correct default TTL. There are correct TTLs for specific workloads. The decision factors:
- How often does the underlying data change?
- How bad is a stale read?
- What is the cache fill cost (database read latency)?
- What is the cache hit rate trend over different TTLs?
For most product-catalog-style workloads, TTLs of 5 to 60 minutes work well. For session-style data, TTLs match session length (24 hours, sliding). For configuration that changes rarely but must be fresh after a deploy, use explicit invalidation on deploy plus a TTL of 1 hour as the safety net.
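One way to keep those per-workload decisions explicit and reviewable is a policy table rather than a scattered default. The classes and numbers below restate the examples above; the structure itself is an illustrative sketch, not a prescribed schema.

```python
# Per-data-class TTL policy: one place to review, one pattern per data type.
TTL_POLICY = {
    "catalog": {"ttl_s": 15 * 60,   "invalidate_on_write": False},  # stale-by-TTL ok
    "session": {"ttl_s": 24 * 3600, "sliding": True},               # matches session length
    "config":  {"ttl_s": 3600,      "invalidate_on_deploy": True},  # TTL is the safety net
}

def ttl_for(data_class):
    return TTL_POLICY[data_class]["ttl_s"]
```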
The thundering herd is the problem you forget until it happens
Hot key, TTL expires, 1000 concurrent requests all see a miss, all hit the database simultaneously, database cannot handle the load, latency spikes, requests queue, more requests pile in. We have seen this take down clients twice this year.
The mitigations:
- Probabilistic early refresh: refresh the cache when 90 percent of the TTL has elapsed, randomized across requests
- Single-flight: the first request to miss locks the key, computes the value, and other concurrent requests wait for that result (Go's singleflight package; similar libraries exist in most languages)
- Stale-while-revalidate: serve stale data while async-refreshing in the background
- Negative caching: cache the absence of data with a short TTL to absorb queries for non-existent keys
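The single-flight mitigation is the one most worth internalizing, so here is a sketch modeled on Go's singleflight, using a per-key event. `load_from_db` is a hypothetical loader; a production version would also need a timeout on the wait and a decision about propagating leader errors to followers.

```python
import threading

_calls = {}              # key -> (done event, shared result box)
_mu = threading.Lock()

def single_flight(key, load_from_db):
    """Concurrent misses for the same key share one database read."""
    with _mu:
        entry = _calls.get(key)
        if entry is None:
            entry = (threading.Event(), {})
            _calls[key] = entry
            leader = True            # this request does the work
        else:
            leader = False           # everyone else waits for it
    event, box = entry
    if leader:
        try:
            box["value"] = load_from_db(key)
        finally:
            with _mu:
                _calls.pop(key, None)
            event.set()              # release the waiters
    else:
        event.wait()
    return box.get("value")          # None if the leader's load raised
```

With this in front of the database, a hot-key expiry costs one fill instead of a thousand.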
Multi-layer caching: CDN, application, database
Three layers in most real systems. The mistake is treating them independently. The right model is a hierarchy: CDN serves the most popular requests with the longest TTL, the application cache (Redis) serves the next tier with shorter TTL, the database cache (Postgres buffer cache, query plan cache) handles the rest.
Invalidation propagates downward. A write invalidates the application cache and pushes a CDN purge for the affected URL. The CDN purge is best-effort; the TTL is the safety net. We treat CDN cache as authoritative-but-eventually-consistent and design URLs so that important content can be purged surgically (not 'purge everything').
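The downward propagation is a short, fixed sequence, sketched here with hypothetical stand-ins for the three backends (a dict for the database, a dict for Redis, a list for the CDN purge queue) and an assumed per-key URL scheme.

```python
db, app_cache, cdn_purge_queue = {}, {}, []

def url_for(key):
    return f"/api/{key}"            # assumed URL scheme that allows surgical purges

def write(key, value):
    db[key] = value                             # 1. source of truth
    app_cache.pop(key, None)                    # 2. application cache (Redis stand-in)
    try:
        cdn_purge_queue.append(url_for(key))    # 3. best-effort CDN purge
    except Exception:
        pass                                    # CDN TTL will expire the stale copy
```

The ordering matters: each layer is invalidated only after the layer beneath it is already correct, so a read racing the write can never re-fill a cache from stale data.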
Caching is one of those areas where the simple answer is right more often than people think. Pick a pattern that fits the workload, set a TTL that reflects the data's actual update rate, design for the thundering herd before it happens, and stop layering caches without thinking. The clients with the cleanest cache architecture have one pattern per data type and stick to it.
Read more field notes, explore our services, or get in touch at info@bipi.in.