Caching data with DataLoader

Gilad Tidhar

Caching

Caching is an essential part of designing a scalable and performant GraphQL API.

But what is caching? Caching is the technique of avoiding repeated compute-intensive work by keeping its results temporarily in a cache. A cache is a place in memory that holds frequently needed data, so it can be delivered to the client faster than recomputing or refetching it.

For example, for a GraphQL API, we might want to cache a field value that comes from a slow external API. The goal of caching is to improve performance where it’s needed the most.

How Is Caching Faster?

Using a cache for compute-intensive data is faster for a few reasons:

  • A cache usually lives in memory rather than on disk, which is much faster to read from.
  • A cache is smaller than a database, so there are far fewer values to search through.
  • Data that is queried many times is fetched from the database only once, so no time is lost fetching it repeatedly.
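To make this concrete, here is a minimal sketch of the idea in TypeScript, before we bring in dataloader at all. The Book type and fetchBookFromDb are placeholders standing in for your own slow data source:

type Book = { id: string; name: string }

// Placeholder for a slow database or API call.
declare function fetchBookFromDb(id: string): Promise<Book>

const cache = new Map<string, Book>()

async function getBook(id: string): Promise<Book> {
  const cached = cache.get(id)
  if (cached) return cached // served from memory, no database hit
  const book = await fetchBookFromDb(id) // slow path, runs only once per id
  cache.set(id, book)
  return book
}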

How Do We Use Caching?

Caching with dataloader

My preferred library for caching is dataloader, and in this tutorial I will explain how to use dataloader with Node.js and TypeScript to add caching to your server.

💡

Tip: dataloader outside of GraphQL

The dataloader package was created specifically for GraphQL, but it can work with other Node.js programs.

Using DataLoader

1. Install DataLoader in Your Project

Install dataloader by running the following:

npm install --save dataloader or yarn add dataloader

2. Import dataloader

To start, import dataloader and LRUMap (LRU is a cache eviction strategy that keeps only the X most recently used items in the cache).

import DataLoader from 'dataloader'
import { LRUMap } from 'lru_map'

3. Define the Types That Will Be Saved to Our Cache

The first type is the key that identifies the data, like a string id, and the other is the data itself. For example, let’s say we’re trying to cache books. Our key (PackageId) will be the book’s id, so we can tell which book is which. Our data (PackageData) will be the book’s length, author, price, and so on.

So in this case, our GraphQL schema would look like this:

type Book {
  id: ID!
  name: String!
  author: Author!
  price: Int!
  description: String!
  length: Int!
}

type Author {
  id: ID!
  name: String!
}

and the corresponding data type would look like this:

type PackageId = string

type PackageData = {
  // This is the shape of the data we cache for each book.
  id: string
  name: string
  author: {
    id: string
    name: string
  }
  price: number
  description: string
  length: number
}

The reason we write types is type safety. Let’s say we have books and toys, which have different shapes; we wouldn’t want toys to somehow end up in our books cache.
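As a rough sketch of what that buys us (Toy, batchLoadBooks, and batchLoadToys are made-up placeholders, not part of dataloader), two separately typed loaders can never hand back each other’s values:

type Toy = { id: string; name: string }

// Placeholder batch functions; each returns values in the same order as its keys.
declare function batchLoadBooks(ids: readonly PackageId[]): Promise<PackageData[]>
declare function batchLoadToys(ids: readonly string[]): Promise<Toy[]>

const bookLoader = new DataLoader<PackageId, PackageData>(batchLoadBooks)
const toyLoader = new DataLoader<string, Toy>(batchLoadToys)

// const oops: Toy = await bookLoader.load('book-1') // type error: PackageData is not a Toy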

4. Create the DataLoader Instance

const dataLoader = new DataLoader<PackageId, PackageData>(
  async (/* (1) */ keys: readonly PackageId[]) =>
    /* (2) */ Promise.all(
      keys.map(packageId => getData(packageId).catch(e => e))
    ),
  {
    cacheMap: new LRUMap(100) /* (3) */
  }
)

As you can see, we give the keys that the user asks for to the dataloader (1).

I recommend making the keys some kind of id, for the following reason: let’s say that in your API the user asks for data by its id, fetching with id:"something". You can then pass that id straight through as the key, instead of transforming it.

The dataloader will first check if it already has the keys in its cache; if so, it returns the data without going through the database, saving you a lot of valuable time.

After that (2), we give the dataloader a function for fetching the data, in case it doesn’t have it in the cache. In this case, I have the function getData, and I’m using the keys to “get data” from my database.

In the end (3) we give it a cacheMap with a size; the size controls how many values dataloader will cache. In this case, after 100 values it will evict the least recently used value (the one that wasn’t queried for the longest time) to make space for the 101st value.
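For reference, getData itself is whatever fetch your server needs; it is not part of dataloader. A hypothetical version (the db object here is an assumption) could look like this:

// Hypothetical example: fetch a single book from some data source by id.
declare const db: { findBookById(id: string): Promise<PackageData | undefined> }

async function getData(packageId: PackageId): Promise<PackageData> {
  const book = await db.findBookById(packageId)
  if (!book) throw new Error(`Book ${packageId} not found`)
  return book
}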

From now on, to query data, you just run

dataLoader.load(key)
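For example, assuming the dataLoader instance above, a couple of loads could look like this:

// Repeated loads with the same key are served from the cache.
const book = await dataLoader.load('book-1')

// loadMany resolves several keys at once, batching the underlying fetches.
const books = await dataLoader.loadMany(['book-1', 'book-2', 'book-3'])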

DataLoader with GraphQL

Dataloader was designed to work with GraphQL, specifically to solve the N+1 problem.

The N+1 problem is a common problem when designing a GraphQL API. Looking at the query below, we can see that the GraphQL API will call the Book.author resolver 20 times:

query BooksWithAuthors {
  books(first: 20) {
    id
    title
    author {
      id
      name
    }
  }
}

Depending on the resolver implementation, this query might trigger 20 SQL queries or API calls to resolve authors who might have written multiple books. Dataloader helps solve this problem by caching, deferring, and grouping similar resolver calls.
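As an illustration (Author, getAuthorsByIds, and book.authorId are assumptions, not part of the dataloader API), a per-author loader would collapse those 20 Book.author calls into a single batched lookup:

type Author = { id: string; name: string }

// Assumed bulk fetch: one query that returns all requested authors.
declare function getAuthorsByIds(ids: readonly string[]): Promise<Author[]>

const authorLoader = new DataLoader<string, Author>(async (authorIds) => {
  const authors = await getAuthorsByIds(authorIds)
  // DataLoader expects results in the same order as the keys it was given.
  return authorIds.map(
    (id) => authors.find((a) => a.id === id) ?? new Error(`Author ${id} not found`)
  )
})

const bookResolvers = {
  Book: {
    // All 20 author lookups from the query above are deferred and grouped
    // into one getAuthorsByIds call; repeated authors come from the cache.
    author: (
      book: { authorId: string },
      _args: unknown,
      context: { authorLoader: DataLoader<string, Author> }
    ) => context.authorLoader.load(book.authorId),
  },
}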

How to Use the DataLoader Package with GraphQL?

To use the dataloader with GraphQL just pass it in the context!

Now, your code should look somewhat like this:

import DataLoader from "dataloader";
import { LRUMap } from "lru_map";
 
type PackageId = string;
type PackageData = Package;

export type GraphQLContext = {
  dataLoader: DataLoader<PackageId, PackageData>;
  // ...
};

const dataLoader = new DataLoader<PackageId, PackageData>(
  async (keys: readonly PackageId[]) => {
    return await Promise.all(
      keys.map((packageId) => getData(packageId).catch((e) => e))
    );
  },
  {
    cacheMap: new LRUMap(100),
  }
);
// ...
export async function contextFactory() {
  return { dataLoader, /* ... */ };
}

Now, in your resolvers, just call context.dataLoader.load(key) and that’s it! You now have caching in your server!

An example of a resolver implementation:

export const resolvers = {
  Query: {
    getBook(parent: unknown, input: { bookId: string }, context: GraphQLContext) {
      // Served from the cache when possible; otherwise fetched through getData.
      return context.dataLoader.load(input.bookId)
    }
  }
}
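A query against that resolver (argument and field names assumed from the examples above) could then look like this:

query GetBook {
  getBook(bookId: "123") {
    name
    author {
      name
    }
    price
  }
}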

For further learning, check these out:

  • The N+1 problem
  • The dataloader GitHub page
  • A great video about dataloader
