The first thing I see developers reach for when they want to improve the performance of their API is caching. Caching is a tried-and-true method of storing information in memory so that it can be served on subsequent requests instead of being computed or fetched again. A common use case is a microservice that needs to request information from another microservice. Rather than making that request every single time, the information can be cached in memory after the first request and then served directly from memory on subsequent requests.

Implementing an in-memory cache is quite simple, and in some cases may be just what you need. An in-memory cache can be a hash map, or even an off-the-shelf library such as node-cache or cachetools. To illustrate, I'll refer to a simple microservice that stores and retrieves values by key.
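Below is a minimal sketch of its implementation in Node.js; the Express app and the db module (standing in for the real persistence layer) are assumptions on my part, and the Map is the naive in-memory cache.

```js
const express = require('express');
const db = require('./db'); // hypothetical client for our database

const app = express();
const cache = new Map(); // naive in-memory cache

app.get('/values/:key', async (req, res) => {
  const { key } = req.params;

  // Serve straight from memory when we can...
  if (cache.has(key)) {
    return res.json({ key, value: cache.get(key) });
  }

  // ...otherwise pay for the round trip to the database and remember the result.
  const value = await db.get(key);
  cache.set(key, value);
  res.json({ key, value });
});

app.put('/values/:key', express.json(), async (req, res) => {
  const { key } = req.params;
  await db.set(key, req.body.value);
  cache.set(key, req.body.value); // keep this node's copy in step with the write
  res.sendStatus(204);
});

app.listen(3000);
```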

As shown above, the request to obtain information by key requires communication with our database, which is going to be the most expensive part of this request. A simple hash map that stores the information in memory is a great way to circumvent the expensive database operation; however, there are a few problems with it:
  • The cache won't improve performance if you're running this service in a cluster and the next request lands on another node.
  • A user could read stale data if a value is updated in a request handled by one node and then read from another node that still holds the old cached copy.
  • Memory limits and load on the service may pose problems, depending on the size of the data being cached and the amount of memory available.



Many of the aforementioned caching libraries can handle some of these issues with automated cache size control, TTL (time to live) settings, etc.; node-cache, for example, exposes these as constructor options (see the sketch below). The harder problems with simple caching come to light when you start scaling your services horizontally and running dozens of instances of them. One easy way to solve those problems is with a Multi-layer Cache.
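A minimal sketch of those policies with node-cache (the values here are arbitrary):

```js
const NodeCache = require('node-cache');

// stdTTL: default time-to-live per entry, in seconds.
// checkperiod: how often, in seconds, expired entries are swept out.
// maxKeys: hard cap on the cache size; set() errors once it's reached.
const cache = new NodeCache({ stdTTL: 60, checkperiod: 120, maxKeys: 10000 });

cache.set('user:42', { name: 'Ada' });
const user = cache.get('user:42'); // undefined once the TTL lapses
```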

Multi-layer Caching

Multi-layer Caching is when you have more than one cache for the same set of items. The idea is that if an item isn't found in the 1st cache, it's looked for in the 2nd cache, and so on. If an item is added, updated, or deleted in the 1st cache, the 2nd layer is updated as well, and so on. The benefit comes when the different cache layers have different storage mediums and policies. This concept is definitely not new; it's been used in processors, disks, and many database applications.
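In code, the read path is a waterfall over the layers, with a backfill on the way out. A minimal sketch, assuming each layer object exposes async get and set methods:

```js
// Layers are ordered fastest first (e.g. in-memory, then something shared).
async function multiGet(layers, key) {
  for (let i = 0; i < layers.length; i++) {
    const value = await layers[i].get(key);
    if (value !== undefined) {
      // Backfill every faster layer we missed on, so the next read is cheap.
      for (let j = 0; j < i; j++) {
        await layers[j].set(key, value);
      }
      return value;
    }
  }
  return undefined; // miss on every layer; the caller falls back to the source of truth
}

async function multiSet(layers, key, value) {
  // Writes go to every layer so the layers stay in step with one another.
  await Promise.all(layers.map((layer) => layer.set(key, value)));
}
```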

In most implementations of this pattern, you'll have an in-memory cache as the 1st layer and an out-of-process storage medium (i.e. a database) as the 2nd. Before you cry foul at using a database as a cache layer when the database holding the real data is also available, let me explain why this is different. A typical database holding your service's data has many requirements imposed on it that a cache database does not, namely around availability and distribution:
  • Your service database must be available or your service may not function. This typically means building database clusters to provide high availability, and those clusters may not even be on the same network segment as your service.
  • Does your service data need to be consistent globally or regionally? If you have services running in another geography, do they have to connect back to the geography where your database cluster is running?
  • Does your database run in a distributed environment?
  • The data for your service is most likely not ephemeral, meaning it is stateful and must be backed up and restored in a disaster recovery situation.
Now suppose I said you could run a simple database that's colocated with your services and doesn't have to be replicated or always available. If this database is down, your service still functions. You don't need to back it up or worry about restoring anything in a disaster. You can run an instance in each geography your service runs in without worrying about replication. The point is, a caching-layer database doesn't have the same operational requirements as a service database.

Data stored in a cache is ephemeral, meaning it doesn't need to stick around if the service or database is rebooted. This also means we can treat cached items as opaque and not be concerned with the structure or schema of the data, making a database cache layer easier to implement than you might think.

Benefits

Multi-layer caches provide most of their benefit when you're scaling your services and running dozens or hundreds of instances of the same service. Each instance maintains its own in-memory 1st-layer cache for optimal read performance, but it also stores the same information in an out-of-process 2nd layer (i.e. a database). If all of the instances use the same database as their 2nd layer, a cached item stored by one is beneficial to all. When another node in the cluster attempts to read data from the multi-layer cache, it won't find the data in its 1st layer, but it will find it in its 2nd layer. It then updates its 1st-layer in-memory cache with the information for subsequent requests.

So what happens when data is written on one service instance, updated on another, and then read back on the first? The data read back is stale in that situation, so multi-layer caching didn't solve that for us. It can help a little, though. Each layer in the cache can have its own policies on cache size and TTL, which lets you tune the caching policies to minimize the issue: set a short TTL on your in-memory cache and a longer one on your 2nd layer, and you've bounded the maximum amount of time stale data can stick around. Of course, this only applies in situations where eventual consistency is acceptable, meaning you only guarantee that any update made will eventually be reflected everywhere.

Taking this a step further, depending on the technology chosen for the 2nd layer, changes made to the 2nd layer could notify all connected instances so that they can update their 1st layers in turn.
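With Redis as the 2nd layer, for example, its pub/sub channels can carry those notifications. A sketch, assuming the node-redis v4 client (the channel name and helpers are mine):

```js
const { createClient } = require('redis');

const localCache = new Map(); // this instance's 1st-layer cache
const publisher = createClient();
const subscriber = publisher.duplicate(); // pub/sub needs its own connection

async function start() {
  await publisher.connect();
  await subscriber.connect();

  // Every instance listens on a shared channel and evicts its local copy
  // of a key whenever any instance announces a change to it.
  await subscriber.subscribe('cache:invalidate', (key) => {
    localCache.delete(key);
  });
}

async function updateValue(key, value) {
  // ...persist the new value to the 2nd layer / database here...
  // Then tell every instance (including this one) to drop its 1st-layer copy;
  // the next read repopulates it from the 2nd layer.
  await publisher.publish('cache:invalidate', key);
}
```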

Implementation

Implementing a multi-layer cache can be as easy as finding a library that supports it. In my example above, I've chosen to replace my simple cache with cache-manager. cache-manager ships with an in-memory store and supports Redis through a companion store package.
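Below is a sketch of the updated code, assuming a v3-era cache-manager API with the cache-manager-redis-store package (the host, sizes, and TTLs are illustrative, and db is the hypothetical database client from earlier):

```js
const cacheManager = require('cache-manager');
const redisStore = require('cache-manager-redis-store');
const db = require('./db'); // hypothetical database client

// 1st layer: in-process memory, kept small and short-lived to bound staleness.
const memoryCache = cacheManager.caching({ store: 'memory', max: 1000, ttl: 30 /* seconds */ });

// 2nd layer: Redis shared by every instance, kept around longer.
const redisCache = cacheManager.caching({ store: redisStore, host: 'localhost', port: 6379, ttl: 600 });

// Reads try memory first, then Redis; a hit on a lower layer backfills the layers above.
const multiCache = cacheManager.multiCaching([memoryCache, redisCache]);

async function getValue(key) {
  // wrap() returns the cached value if any layer has it; otherwise it runs the
  // loader, stores the result in every layer, and returns it.
  return multiCache.wrap(key, () => db.get(key));
}
```

Note the asymmetric TTLs: the short in-memory TTL is what bounds how long a node can serve stale data, while the longer Redis TTL preserves the cross-instance hit rate.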



If you're running in a containerized environment with something like Kubernetes, running a Redis container with ephemeral storage is quite simple. Remember, there's no need to worry about keeping the data around between reboots or replicating it to other geographies.

It Doesn't End Here

Unfortunately, the journey to getting the best performance out of your microservices doesn't end here. Multi-layer caching can make a world of difference in your services, but there are still areas to look into, such as sticky sessions at the load balancer, code optimization, and more.
