Integrate dataloader for ref/collection lazy loading

I’m using mikro to implement a GraphQL server. In GraphQL, queries are not known ahead of time, and so the “endpoints”/resolvers are implemented very piecemeal, i.e. “get this project’s clients”, “okay now get the 1st client’s name”, “okay now get the 2nd client’s name”.

However, this can easily lead to N+1s: if a project has 100 clients and the Client resolver’s name function is called 100 times, each call doing a (await clientRef.load()).name, we’ll end up with 100 individual SQL calls.

The standard way of solving this in JS GraphQL implementations is to use dataloader (https://github.com/graphql/dataloader). It lets the programmer keep implementing the resolvers in the piecemeal fashion above (I’m loading client 1, then client 2, then client 3), but behind the scenes it uses the JS event loop to turn the individual “load client 1”, “load client 2”, “load client 3” promises into a single meta-promise of “load clients 1, 2, 3”. That meta-promise is fulfilled by a batch-friendly “load all three” implementation (typically hand-written by the programmer). dataloader then breaks the list of “client 1, client 2, client 3” back out into the individual promises and hands each result back to its call site, which now thinks “hey, I just got a single client”, but got it in an N+1-safe way.
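To make the event-loop trick concrete, here is a minimal sketch of the batching idea. It is not the dataloader library's actual implementation: TinyLoader, clientLoader, and batchCalls are hypothetical names, and the batch function is an in-memory stand-in for a real "where id in (...)" query.

```typescript
// Minimal sketch of dataloader-style batching; NOT the dataloader library itself.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: { key: K; resolve: (value: V) => void; reject: (err: unknown) => void }[] = [];

  constructor(private readonly batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      // The first load() queued in a tick schedules a single flush (as a microtask)
      // that will cover every key queued synchronously after it.
      if (this.queue.length === 0) {
        Promise.resolve().then(() => this.flush());
      }
      this.queue.push({ key, resolve, reject });
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      // One batched call for all queued keys; values must come back in key order.
      const values = await this.batchFn(batch.map(item => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach(item => item.reject(err));
    }
  }
}

// Hypothetical stand-in for "select ... from client where id in (...ids)".
const batchCalls: number[][] = [];
const clientLoader = new TinyLoader<number, { id: number; name: string }>(async ids => {
  batchCalls.push([...ids]);
  return ids.map(id => ({ id, name: `client ${id}` }));
});
```

Calling clientLoader.load(1), clientLoader.load(2), and clientLoader.load(3) back to back and awaiting them together results in a single entry in batchCalls, while each caller still receives just its own client.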

Granted, the scheme requires the GraphQL implementation to trigger the “load name for all 100 clients” calls in a single tick of the event loop, but this happens almost de facto by just calling the “load name” method from a regular for loop.
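The "single tick" requirement can be illustrated with a small sketch. makeBatcher, sameTick, and awaitedInLoop are hypothetical names, and the batcher here only records which keys were flushed together rather than loading anything.

```typescript
// Hypothetical batcher: collects keys queued in the same tick and reports them as one batch.
function makeBatcher(onFlush: (keys: number[]) => void): (key: number) => Promise<void> {
  let pending: number[] = [];
  let flushed: Promise<void> | null = null;
  return key => {
    pending.push(key);
    if (flushed === null) {
      // Schedule one flush for everything queued before the current tick ends.
      flushed = Promise.resolve().then(() => {
        const keys = pending;
        pending = [];
        flushed = null;
        onFlush(keys);
      });
    }
    return flushed;
  };
}

// A regular for loop that does not await inside the loop: all loads share one batch.
async function sameTick(): Promise<number[][]> {
  const batches: number[][] = [];
  const load = makeBatcher(keys => batches.push(keys));
  const promises: Promise<void>[] = [];
  for (const id of [1, 2, 3]) {
    promises.push(load(id));
  }
  await Promise.all(promises);
  return batches; // [[1, 2, 3]]
}

// Awaiting each load inside the loop yields between calls, so nothing gets batched.
async function awaitedInLoop(): Promise<number[][]> {
  const batches: number[][] = [];
  const load = makeBatcher(keys => batches.push(keys));
  for (const id of [1, 2, 3]) {
    await load(id);
  }
  return batches; // [[1], [2], [3]]
}
```

The difference is only where the await sits: kicking off all loads first and awaiting afterwards keeps them in one tick, which is what a GraphQL executor effectively does when it resolves sibling fields.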

Anyway, b/c I need this “batch load refs” behavior, I’ve hand-written a dataloader that looks like:

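import DataLoader from 'dataloader';
// Assumption: asSequence comes from the sequency package.
import { asSequence } from 'sequency';
// Assumption: package name as of this writing; newer MikroORM versions export these from '@mikro-orm/core'.
import { AnyEntity, EntityManager, Reference } from 'mikro-orm';

export class EntityDataLoader {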
  private loader: DataLoader<Reference<any>, AnyEntity<any, any>>;

  constructor(em: EntityManager) {
    this.loader = new DataLoader<Reference<any>, AnyEntity<any, any>>(async refs => {
      // Group the refs per entity and mapValues to get their primary keys
      const idsPerEntity: Record<string, any[]> = {};
      asSequence(refs)
        .groupBy(ref => ref.__meta.className)
        .forEach((refs, key) => {
          idsPerEntity[key] = refs.map(ref => ref.__primaryKey);
        });
      // Load all the rows for each entity
      const promises = Object.entries(idsPerEntity).map(([entity, ids]) => {
        return em.getRepository(entity).find({ id: ids } as any);
      });
      await Promise.all(promises);
      // Now that the entities are already in our Unit of Work, we can load the refs w/o triggering N SQL calls.
      // Resolve to an array of entities (not an array of promises), in the same order as refs.
      return Promise.all(refs.map(ref => ref.load()));
    });
  }

  load<T>(ref: Reference<T>): Promise<T> {
    // Our implementation promises to turn Ref<T> into a Promise<T> so this cast is okay.
    return this.loader.load(ref) as Promise<T>;
  }
}

So now in my Client resolver, it can look like:

  Client: {
    async email(root) {
      return (await root.load()).email;
    },
    async name(root, args, ctx) {
      const client = await ctx.refDataLoader.load(root);
      return client.name;
    },
  },

And the email resolver will N+1 (because it’s directly calling root.load() where root is an IdentifiedRef) while name will not, b/c it uses ctx.refDataLoader.load to batch load all of the refs accessed within the same tick of the event loop.

It would be great if IdentifiedRef.load() (or more likely whatever internal loading logic it calls) used dataloader natively to just do this for me (i.e. by using a DataLoader instance associated with the Unit of Work).

This is all about refs so far, but it could generalize to any SQL-triggering operation. E.g. if I trigger a collection load on client1.getAddresses() and client2.getAddresses() (or whatnot, using the Collection API), then instead of two SQL queries, ...from addresses where client_id = 1 and ...where client_id = 2, a dataloader-aware Collection implementation would automatically do this as ...where client_id in (1, 2) for me.
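As a rough sketch of what the batch function behind such a collection load might look like: it runs one query for all parent ids and then hands each parent its own slice of the rows. Address, addressTable, findAddressesByClientIds, and batchLoadAddresses are hypothetical names, and the in-memory filter stands in for the real "where client_id in (...)" query.

```typescript
interface Address {
  id: number;
  clientId: number;
  city: string;
}

// Hypothetical in-memory stand-in for the address table.
const addressTable: Address[] = [
  { id: 10, clientId: 1, city: "Oslo" },
  { id: 11, clientId: 2, city: "Lima" },
  { id: 12, clientId: 1, city: "Kyoto" },
];

let queryCount = 0;

// Stand-in for: select * from address where client_id in (...clientIds)
async function findAddressesByClientIds(clientIds: readonly number[]): Promise<Address[]> {
  queryCount++;
  const wanted = new Set(clientIds);
  return addressTable.filter(address => wanted.has(address.clientId));
}

// Batch function for a one-to-many relation: a single query for all parents,
// then one array per requested key, in the same order as the keys.
async function batchLoadAddresses(clientIds: readonly number[]): Promise<Address[][]> {
  const rows = await findAddressesByClientIds(clientIds);
  const byClient = new Map<number, Address[]>();
  for (const row of rows) {
    const bucket = byClient.get(row.clientId) ?? [];
    bucket.push(row);
    byClient.set(row.clientId, bucket);
  }
  // Parents without any rows still get an (empty) array so positions line up with the keys.
  return clientIds.map(id => byClient.get(id) ?? []);
}
```

A function of this shape is what a batching loader would call once per tick; the important detail is that the result array lines up position by position with the requested client ids, including empty arrays for clients with no addresses.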

In a way, this feature request is really a follow-up to my mini-rant on “preloads are not useful”, insofar as 1) I’m personally implementing a GQL server where it’s really hard to know the preloads up front, but I still need N+1 avoidance, so I need dataloader-style batch loading, and 2) even though my case sounds “special” (oh, this is great for GQL servers), I assert it’s actually a generalized pattern that could replace preloads altogether, even for non-GQL endpoints/logic that “knows” (or thinks it knows, until the code changes 😃) the right preloads to use.

I.e. something like:

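// loadProject and loadAll are hypothetical APIs, used only to illustrate the idea.
const project = await loadProject(1);
const clients = await project.clients.loadAll();
// Start every client's address load in the same tick so they can be batched together.
await Promise.all(clients.map(async client => {
  for (const address of await client.addresses.loadAll()) {
    // do something with address
  }
}));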

Could be 3 SQLs: select from project where id = 1, select from client where project_id in (1), and select from address where client_id in (...the id of every client of the project...).

As a disclaimer, I know this is a big ask, and a large departure from the norm of “hint-driven preloads”, but ironically the same “gotcha” of Node/JS ORMs suffering from “sorry, your object graph traversal of project.getClients() has to be a promise that you await” ends up providing the critical infrastructure (“we have to make a bunch of promises and then wait until the end of the event loop to do any I/O”) needed for this sort of auto-batching/auto-flushing approach (“oh you want to do 10 separate I/O operations I’ll combine those for you into 1 batch call”).

Which, again ironically, I think is just pretty amazing in general b/c it’s framework-driven N+1 avoidance vs. programmer-driven N+1 avoidance. Other languages/frameworks/ORMs have tried to provide the same thing, but without the Node/JS event loop trick they generally end up being pretty complex and annoying, e.g. https://github.com/facebook/Haxl or https://github.com/47deg/fetch.

Issue Analytics

State:
Created 4 years ago
Reactions: 18
Comments: 15 (5 by maintainers)

Top GitHub Comments

9 reactions

B4nan commented, Dec 11, 2019

Wow, thanks for the detailed description. I must say I was recently thinking about something similar (as nextras orm, one of my inspirations, is actually doing the same afaik), although I did not know that it is called dataloader and it actually already exists :] I will need to spend more time to read this in more detail and go through the links, but generally I like the idea to support this in the ORM natively (maybe optionally for the beginning).

5 reactions

iammathew commented, Oct 23, 2020

Native support for the dataloader pattern would be really neat. I was looking into mikroorm as a TypeORM alternative and am already blown away by some of the features, but this would be the cherry on top 😃