Caching strategies

One of the easiest and most popular ways to increase system performance is to use caching. When we introduce caching, we automatically duplicate our data. It's very important to keep your cache and data source in sync (more or less, depends on the requirements of your system) whenever changes occur in the system.
In this article, we will go through the most common cache synchronization strategies, their advantages, and disadvantages, and also popular use cases.

Cache-Aside

Based on my experience, this one is perhaps the most commonly used caching strategy. The idea is that the cache sits on the side and the application talks to both the cache and the data source. The application logic inspects the cache before hitting the data source.

Here's what's happening on request:

The application determines whether the item is currently held in the cache.
If the item is in the cache it's called cache hit. The item is read from the cache and returned to the client.
If the item is NOT found in the cache it's called cache miss. The application reads the item from the data store, stores a copy of the item in the cache, and returns it back to the client.

Use Cases

A cache-aside strategy is usually a general purpose and works best for read-heavy workloads.

Pros

Systems using cache-aside strategy are resilient to cache failures. If the cache goes down, the system can still operate by going directly to the data source.

Cons

Implementing the cache-aside pattern doesn't guarantee consistency between the data store and the cache. Do guarantee this we need to use other strategies to update or invalidate caches.
When the data is requested the first time, it always results in the cache miss and requires the extra time of loading data the cache. To deal with this developers use 'warming' or 'pre-heating' the cache by issuing queries manually.

Read-Through

Instead of managing both the data source and the cache, we can simply delegate the data source synchronization to the cache provider. All data interaction is done through the cache abstraction layer.
Both cache-aside and read-through load data lazily only on the first read.

Use Cases

Read-through caches work best for read-heavy workloads when the same data is requested many times.
It's very similar to cache-aside strategy, but there is difference:
In cache-aside, the application is responsible for fetching data from the data store and populating the cache. In read-through, this logic is supported by the library or cache provider.

Pros

Decrease load on the data source when there are a lot of reads, because cache provider can sync access to cache keys, so in the end we will have only one cache miss.
Systems using read-through strategy can also be resilient to cache failures. If the cache goes down, the cache provider can still operate by going directly to the data source.

Cons

When the data is requested the first time, it always results in the cache miss and requires the extra time of loading data the cache. To deal with this developers use 'warming' or 'pre-heating' the cache by issuing queries manually.
Just like cache-aside, data can also become inconsistent between cache and the data source.

Write-Through

Analogous to the read-through data fetching strategy, the cache provider can update the underlying data source every and cache entry on update request. The cache sits in-line with the data source and writes always go through the cache to the data source (or vice versa).

Use Cases

On its own, write-through doesn't seem to do much. In fact, it introduces extra write latency because data is written to the cache and to the data source. But when paired with the read-through, we get all the benefits of read-through and we also get data consistency guarantee, freeing us from using cache invalidation techniques.

Pros

Advanced data consistency guarantee.

Cons

Increased write latency.

Write-Behind

If strong consistency is not mandated, we can simply enqueue the cache changes and periodically flush them to the data store.

Use Cases

Write-behind improves the write performance and is good for write-heavy workloads. When combined with read-through, it works well for mixed workloads, where the most recently updated and accessed data is always in cache.

Pros

It's resilient to data source failures and can tolerate some data source downtime.
If batching or coalescing is supported, it can reduce overall writes to the data source, which decreases the load and reduces costs.

Cons

If there's a cache failure, the data may be lost permanently.

Some developers use Redis for both cache-aside and write-back to better absorb spikes during peak load. And also most relational databases storage engines (i.e. InnoDB) have write-back cache enabled by default in their internals. Queries are first written to memory and eventually flushed to the disk.

Blog about software development

Search This Blog