Caching Strategies

One of the easiest and most popular ways to increase system performance is caching. When we introduce caching, we automatically duplicate our data, so it's very important to keep the cache and the data source in sync (more or less, depending on the requirements of your system) whenever changes occur in the system.
In this article, we will go through the most common cache synchronization strategies, their advantages and disadvantages, and popular use cases.

Cache-Aside

Based on my experience, this is perhaps the most commonly used caching strategy. The idea is that the cache sits on the side, and the application talks to both the cache and the data source: the application logic checks the cache before hitting the data source.

Cache-Aside Diagram
Here's what's happening on request:
  1. The application determines whether the item is currently held in the cache.
  2. If the item is in the cache, it's called a cache hit. The item is read from the cache and returned to the client.
  3. If the item is NOT found in the cache, it's called a cache miss. The application reads the item from the data store, stores a copy of the item in the cache, and returns it to the client (a sketch of this flow follows below).
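
As a concrete illustration, here is a minimal Python sketch of the cache-aside flow using the redis-py client. The fetch_user_from_db function is a hypothetical stand-in for your actual data-source query:

```python
import json

import redis  # pip install redis

cache = redis.Redis(host="localhost", port=6379)

def fetch_user_from_db(user_id: int) -> dict:
    # Hypothetical data-source call; replace with your real query.
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)             # cache hit: serve straight from the cache
    user = fetch_user_from_db(user_id)        # cache miss: go to the data store
    cache.set(key, json.dumps(user), ex=300)  # keep a copy for five minutes
    return user
```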

Use Cases

Cache-aside is a general-purpose strategy and works best for read-heavy workloads.

Pros

  1. Systems using the cache-aside strategy are resilient to cache failures. If the cache goes down, the system can still operate by going directly to the data source.

Cons

  1. Implementing the cache-aside pattern doesn't guarantee consistency between the data store and the cache. To guarantee it, we need to use other strategies to update or invalidate the cache.
  2. When data is requested for the first time, it always results in a cache miss and requires the extra time of loading the data into the cache. To deal with this, developers 'warm' (or 'pre-heat') the cache by issuing the queries in advance (a small sketch follows this list).
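
For illustration, warming can be as simple as pre-loading the known-hot keys at startup. This sketch reuses the hypothetical cache and fetch_user_from_db helpers from the example above:

```python
def warm_cache(hot_user_ids: list[int]) -> None:
    # Pre-load known-hot items so the first real requests are cache hits.
    for user_id in hot_user_ids:
        user = fetch_user_from_db(user_id)
        cache.set(f"user:{user_id}", json.dumps(user), ex=300)

warm_cache([1, 2, 3])  # run once at application startup
```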

Read-Through 

Instead of managing both the data source and the cache, we can simply delegate the data source synchronization to the cache provider. All data interaction is done through the cache abstraction layer.
Both cache-aside and read-through load data lazily only on the first read.

Read-Through Diagram

Use Cases

Read-through caches work best for read-heavy workloads when the same data is requested many times.
It's very similar to the cache-aside strategy, but there is a difference:
in cache-aside, the application is responsible for fetching data from the data store and populating the cache; in read-through, this logic is handled by the library or the cache provider.
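
To make the difference concrete, here is an illustrative sketch of a read-through layer (the class and its names are invented for this example, not a real library's API). The application registers a loader once and then only talks to the cache object; the per-key lock shows one way a provider can serialize concurrent misses so the data source sees a single read:

```python
import threading
from collections import defaultdict
from typing import Callable

class ReadThroughCache:
    """Illustrative read-through layer: the app never touches the data source directly."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader                 # registered once; owned by the cache layer
        self._store: dict[str, object] = {}
        self._locks = defaultdict(threading.Lock)

    def get(self, key: str) -> object:
        value = self._store.get(key)
        if value is not None:
            return value                      # cache hit
        with self._locks[key]:                # serialize concurrent misses per key
            value = self._store.get(key)      # re-check: another thread may have loaded it
            if value is None:
                value = self._loader(key)     # single trip to the data source
                self._store[key] = value
            return value

# Usage: plug in the loader once; reads are transparent afterwards.
users = ReadThroughCache(loader=lambda user_id: {"id": user_id, "name": "example"})
print(users.get("42"))  # first call loads from the "data source"; later calls hit the cache
```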

Pros 

  1. Decreased load on the data source when there are many reads: the cache provider can synchronize access to cache keys, so concurrent requests for the same missing item result in only one cache miss (the per-key lock in the sketch above shows one way to do this).
  2. Systems using the read-through strategy can also be resilient to cache failures: if the cache goes down, the cache provider can still operate by going directly to the data source.

Cons 

  1. When data is requested for the first time, it always results in a cache miss and requires the extra time of loading the data into the cache. As with cache-aside, developers deal with this by warming the cache.
  2. Just like cache-aside, data can become inconsistent between the cache and the data source.

Write-Through

Analogous to the read-through fetching strategy, the cache provider can update both the underlying data source and the cache entry on every update request. The cache sits in line with the data source, and writes always go through the cache to the data source (or vice versa).


Write-Through Diagram

Use Cases

On its own, write-through doesn't seem to do much; in fact, it introduces extra write latency because data is written both to the cache and to the data source. But when paired with read-through, we get all the benefits of read-through plus a data-consistency guarantee, freeing us from separate cache invalidation techniques.
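
Sketching this on top of the illustrative ReadThroughCache from the previous section, a write-through update path might look like the following; the writer callback is a hypothetical stand-in for the actual data-source update:

```python
from typing import Callable

class WriteThroughCache(ReadThroughCache):
    """Illustrative: every write goes through the cache layer to the data source synchronously."""

    def __init__(self, loader: Callable[[str], object],
                 writer: Callable[[str, object], None]):
        super().__init__(loader)
        self._writer = writer         # e.g. an UPDATE against the database

    def put(self, key: str, value: object) -> None:
        self._writer(key, value)      # write to the data source first...
        self._store[key] = value      # ...then update the cache, keeping them in sync
```

Because reads and writes both pass through the same layer, the cache never holds stale data for keys written this way.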

Pros

  1. A strong data-consistency guarantee between the cache and the data source.

Cons

  1. Increased write latency.

Write-Behind 

If strong consistency is not mandated, we can simply enqueue the cache changes and periodically flush them to the data store.

Write-Behind Diagram

Use Cases

Write-behind improves write performance and is good for write-heavy workloads. When combined with read-through, it works well for mixed workloads, where the most recently updated and accessed data is always in the cache.
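
As an illustration, here is a minimal sketch of the write-behind idea: writes land in the cache immediately, get queued, and a background thread periodically flushes them to the data source in a batch. The writer callback and all names are invented for this example:

```python
import queue
import threading
import time
from typing import Callable

class WriteBehindCache:
    """Illustrative: writes hit the cache immediately and reach the data source later."""

    def __init__(self, writer: Callable[[dict], None], flush_interval: float = 1.0):
        self._writer = writer                 # hypothetical bulk write to the data source
        self._store: dict[str, object] = {}
        self._pending: queue.Queue = queue.Queue()
        flusher = threading.Thread(
            target=self._flush_loop, args=(flush_interval,), daemon=True)
        flusher.start()

    def put(self, key: str, value: object) -> None:
        self._store[key] = value              # fast path: cache only
        self._pending.put((key, value))       # defer the data-source write

    def _flush_loop(self, interval: float) -> None:
        while True:
            time.sleep(interval)
            batch: dict[str, object] = {}
            while not self._pending.empty():  # safe: single consumer thread
                key, value = self._pending.get()
                batch[key] = value            # coalesce: keep only the latest value per key
            if batch:
                self._writer(batch)           # one bulk write instead of many small ones
```

Note how coalescing in the flush loop keeps only the latest value per key, which is exactly how batching reduces the overall number of writes to the data source.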

Pros

  1. It's resilient to data source failures and can tolerate some data source downtime.
  2. If batching or coalescing is supported, it can reduce overall writes to the data source, which decreases the load and reduces costs.

Cons

  1. If there's a cache failure, the data may be lost permanently.

Some developers use Redis for both cache-aside and write-behind to better absorb spikes during peak load. Most relational database storage engines (e.g. InnoDB) also have a write-behind (write-back) cache enabled by default in their internals: changes are first written to memory and eventually flushed to disk.
