Allow more aggressive transform caching in multiproject monorepos


🐛 Bug Report

In multiproject configs, each project root gets its own jest transform cache. This leads to duplicate work transforming the same files, even if they use the same configs. If a file is used in n projects, it will be transformed n times.

Locally, this increases cache disk usage. In CI, where the cache will not be warm, it increases runtime as duplicate work is required.

Writing a custom transformer with a simpler cache key implementation does not solve this - each project gets a separate cache folder.

The relevant code is in https://github.com/facebook/jest/blob/c98b22097cb6faa3ed3fabf197cbe4f466620b9f/packages/jest-transform/src/ScriptTransformer.ts#L132-L136, which forces a unique cache path per config.name. If unassigned, config.name is set to a hash based on the path and index.
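To make the effect concrete, here is a minimal sketch of why an identical cache key still misses across projects. It is a simplified model, not Jest's actual code: ProjectConfig, transformCacheDir, cachedFilePath and the "v1" version string are hypothetical names for illustration, and the real HasteMap.getCacheFilePath hashes its extra arguments rather than appending them.

```typescript
// Simplified model (assumption): the transform cache directory is derived
// from the cache root plus the project's config name.
type ProjectConfig = { name: string; cacheDirectory: string };

// Hypothetical stand-in for the directory computation in @jest/transform.
function transformCacheDir(config: ProjectConfig, version: string): string {
  return `${config.cacheDirectory}/jest-transform-cache-${config.name}-${version}`;
}

// Location of one cached result: project directory + transformer cache key.
function cachedFilePath(config: ProjectConfig, cacheKey: string): string {
  return `${transformCacheDir(config, "v1")}/${cacheKey}`;
}

const projectA: ProjectConfig = { name: "package-a", cacheDirectory: "/tmp/jest" };
const projectB: ProjectConfig = { name: "package-b", cacheDirectory: "/tmp/jest" };

// Same file, same transformer, same cache key in both projects ...
const cacheKey = "abc123";
const pathA = cachedFilePath(projectA, cacheKey);
const pathB = cachedFilePath(projectB, cacheKey);

// ... but the lookup happens in different directories, so B misses A's result.
const sharedAcrossProjects = pathA === pathB;
console.log(pathA, pathB, sharedAcrossProjects);
```

In this model the project name is part of the directory, so no choice of cache key can produce a hit across projects.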

I've tried adding a common name to all projects' jest configs. That fixes the transform problem, but breaks other things (manual mocks in a __mocks__ folder don't work consistently). On our large monorepo, this gave a ~30% improvement in total runtime, but __mocks__ became unpredictable.

I appreciate that there are edge cases to handle here (different jest configs could potentially warrant different caches), but I think the transformer should be able to decide whether this matters (e.g. if relevant, a transformer could include config.name in the cache key manually). For example, Jest could optionally allow transformers to provide their own implementation of getCacheFilePath, overriding the use of HasteMap.getCacheFilePath(this._config.cacheDirectory, 'jest-transform-cache-' + this._config.name, VERSION).
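A rough sketch of what that proposal could look like. Everything here is hypothetical: the optional getCacheFilePath hook on the transformer is the idea suggested above and is not an existing Jest API, and the types and helper names are made up for illustration.

```typescript
// Hypothetical shapes for illustration only; none of this is Jest's real API.
interface ProjectConfig {
  name: string;
  cacheDirectory: string;
}

interface Transformer {
  // Proposed optional hook (assumption): lets a transformer pick the cache location.
  getCacheFilePath?: (config: ProjectConfig) => string;
}

// Default behaviour as described in this issue: one directory per config.name.
function defaultCacheFilePath(config: ProjectConfig): string {
  return config.cacheDirectory + "/jest-transform-cache-" + config.name;
}

// Use the transformer's override when present, otherwise fall back to the default.
function resolveCacheFilePath(transformer: Transformer, config: ProjectConfig): string {
  return transformer.getCacheFilePath
    ? transformer.getCacheFilePath(config)
    : defaultCacheFilePath(config);
}

// A transformer that opts in to one cache for every project.
const sharingTransformer: Transformer = {
  getCacheFilePath: (config) => config.cacheDirectory + "/jest-transform-cache-shared",
};
// A transformer that keeps today's per-project behaviour.
const plainTransformer: Transformer = {};

const pkgA: ProjectConfig = { name: "package-a", cacheDirectory: "/tmp/jest" };
const pkgB: ProjectConfig = { name: "package-b", cacheDirectory: "/tmp/jest" };

const sharedDirs = [resolveCacheFilePath(sharingTransformer, pkgA), resolveCacheFilePath(sharingTransformer, pkgB)];
const defaultDirs = [resolveCacheFilePath(plainTransformer, pkgA), resolveCacheFilePath(plainTransformer, pkgB)];
console.log(sharedDirs, defaultDirs);
```

A transformer that cares about per-project differences would simply not define the hook, or would fold config.name into the path it returns.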

If this sort of change would be accepted, I can probably provide a PR.

To Reproduce

Steps to reproduce the behavior:

  • Set up a multiproject config
  • Run tests with a cleared cache
  • Either observe the cache on disk, or tap into the transform to count the number of times a file is transformed
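For the counting step, the behaviour can be modelled without Jest at all. The following is a toy sketch, not Jest internals: each cache namespace stands in for one cache directory on disk, and countTransforms, namespaceFor and the file name are hypothetical names chosen for illustration.

```typescript
// Toy model (assumption): a cache is a plain object keyed by file name.
type Cache = Record<string, string>;

// The shared file that every project requires, as A and B do with C.
const sharedFile = "packages/c/index.js";

// Runs each project in turn and returns how often the transform was invoked.
function countTransforms(
  projects: string[],
  namespaceFor: (project: string) => string,
): number {
  let calls = 0;
  // Stand-in transformer that only records how often it is called.
  const transform = (source: string): string => {
    calls += 1;
    return source.toUpperCase();
  };

  const caches: Record<string, Cache> = {};
  for (const project of projects) {
    const ns = namespaceFor(project);
    if (!caches[ns]) caches[ns] = {};
    const cache = caches[ns];
    // Transform only on a cache miss within this namespace.
    if (!(sharedFile in cache)) {
      cache[sharedFile] = transform("export const c = 1;");
    }
  }
  return calls;
}

const projects = ["package-a", "package-b"];
// One namespace per project: the current behaviour described in this report.
const perProjectCount = countTransforms(projects, (project) => project);
// One namespace for everything: the expected behaviour.
const sharedCount = countTransforms(projects, () => "shared");
console.log(perProjectCount, sharedCount);
```

With one namespace per project the shared file is transformed once per project that uses it; with a single namespace it is transformed once.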

Expected behavior

If the transform config is the same, each file is only transformed once

Link to repl or repo (highly encouraged)

https://github.com/lexanth/jest-projects-repro

This is a monorepo with 3 packages (A, B and C). A and B consume C. The code in C currently gets transformed once per package, even with the transformer (in the jest-preset package) providing a very aggressive cache key implementation (yarn test:ci, which could be used e.g. in CI if we know the other relevant configs are constant).

Adding name: process.env.USE_SIMPLIFIED_CACHE ? '_' : undefined to each package's jest config makes them all use the same cache, but in my actual repo this breaks other things and is a bit of a hack.

Everything is running in band because the tests are so fast that multiple workers all start transforming before another can populate the cache anyway.

envinfo

  System:
    OS: macOS 10.15.7
    CPU: (16) x64 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  Binaries:
    Node: 12.18.1 - ~/.nvm/versions/node/v12.18.1/bin/node
    Yarn: 1.22.10 - /usr/local/bin/yarn
    npm: 6.14.5 - ~/.nvm/versions/node/v12.18.1/bin/npm
  npmPackages:
    jest: ^26.6.3 => 26.6.3

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (2 by maintainers)

Top GitHub Comments

1 reaction
SimenB commented, Nov 17, 2020

Btw, why are we not using @jest/create-cache-key-function as a default now? Seems like a good case for some reuse 😄

Dunno. PR welcome? 😀

1 reaction
SimenB commented, Nov 17, 2020

I don't understand the “let's be smarter about busting in transformers” comments - even if the cache key is the same it's a cache miss since Jest will look in different directories for different projects when checking if the cached file exists. If you by “instead of relying on one-size-fits-all solutions like stringified config or project name” mean “remove project name from the algorithm” that has nothing to do with the transformers themselves. That code lives in @jest/transform. I'm down with just removing that part of it which should solve it as we'll be trusting the cache key from transformers.

Making getCacheKey of babel-jest “smarter” is orthogonal to this issue (although I agree it should be done) as this issue is about unlocking the ability of transformers to be “smarter” at all - any update we make to getCacheKey would be void since @jest/transform wouldn't get cache hits regardless.
