[Feature] Automatic `dedupe` on `install`

  • I’d be willing to implement this feature (contributing guide)
  • This feature is important to have in this repository; a contrib plugin wouldn’t do

Describe the user story

As a project maintainer, I want all repo dependencies to be deduplicated at all times without having to call yarn dedupe.

Describe the solution you’d like

## .yarnrc.yml
dedupeStrategy: "latest"

(It can also be "fewer" instead of "latest" when https://github.com/yarnpkg/berry/issues/2297 is implemented)

If dedupeStrategy is defined, yarn dedupe is automatically called as part of yarn install, yarn add, etc. Running yarn dedupe separately becomes a no-op.
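
Until such a setting exists, the proposed behaviour can only be approximated by chaining the two commands. Below is a minimal sketch, assuming a POSIX shell. The function name install_and_dedupe and the YARN variable are hypothetical and exist only for illustration. Note that the strategy Yarn currently ships for yarn dedupe is named highest, not latest.

```shell
# Hypothetical wrapper, not part of Yarn: install, then dedupe the lockfile.
# YARN can be overridden (e.g. for a dry run); it defaults to the real binary.
YARN="${YARN:-yarn}"

install_and_dedupe() {
  # Forward any arguments to `yarn install`, then run `yarn dedupe`
  # only if the install succeeded.
  "$YARN" install "$@" && "$YARN" dedupe
}
```

Unlike the declarative option proposed here, a wrapper like this only helps people who remember to call it, and hosted bots would not run it.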

Describe the drawbacks of your solution

  • one more option in the docs and config schema
  • potentially harmful package deduplication that could lead to bugs (I haven’t encountered this in years of using Yarn 1 and yarn-deduplicate)

Describe alternatives you’ve considered

Alternative 1

Repo example: kachkaev/njt 🐸

I have these two scripts in package.json:

"fix:yarn-dedupe": "yarn dedupe",
"lint:yarn-dedupe": "yarn dedupe --check",

The latter is called in .github/workflows/ci.yml, which ensures that yarn.lock is deduplicated at all times.

If someone forgets to run yarn fix:yarn-dedupe locally, CI fails with this custom message:

ℹ️ ℹ️ ℹ️
Some dependencies can be deduplicated, which will make yarn.lock
lighter and potentially save us from unexplainable bugs.
Please run `yarn fix:yarn-dedupe` locally and commit yarn.lock.
ℹ️ ℹ️ ℹ️

A contributor needs to commit and push to resolve the issue, which is a pretty mechanical task. I need to maintain two additional package.json scripts and one additional CI task in all my repos.
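
The CI step described above could be sketched roughly as follows. This is an assumption about how such a check might be wired up, not the actual contents of the repo's ci.yml. The function name check_dedupe is hypothetical, and the check command can be passed in so the function can be tried without Yarn installed.

```shell
# Hypothetical CI helper: fail with a friendly hint when yarn.lock
# still contains dependencies that could be deduplicated.
check_dedupe() {
  # Default to the real check; callers may pass another command instead.
  if [ "$#" -eq 0 ]; then
    set -- yarn dedupe --check
  fi

  # Exit status 0 from the check means nothing is left to dedupe.
  if "$@"; then
    return 0
  fi

  cat <<'EOF'
ℹ️ ℹ️ ℹ️
Some dependencies can be deduplicated, which will make yarn.lock
lighter and potentially save us from unexplainable bugs.
Please run `yarn fix:yarn-dedupe` locally and commit yarn.lock.
ℹ️ ℹ️ ℹ️
EOF
  return 1
}
```

Calling check_dedupe with no arguments in a CI step runs yarn dedupe --check and fails the job with the hint when duplicates remain.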


Alternative 2

Back in the Yarn 1 days, I experimented with running yarn-deduplicate as a pre-commit hook via husky. This made git commit calls slow and confused some of my teammates.


Alternative 3

When using Dependabot for automatic dependency updates, I saw CI failures for some of its PRs. Because Dependabot cannot be configured with custom post-run commands, this problem can only be solved with manual pushes or a custom CI pipeline that tracks new PRs. That’s quite tedious to set up.

Renovate supports postUpdateOptions, and yarnDedupeHighest is available for Yarn 2+. Dependabot maintainers may also come up with something (https://github.com/dependabot/dependabot-core/issues/5830), but they don’t want to introduce a new configuration option, which makes sense. Until then, Renovate is a necessary alternative to Dependabot.

Allowing dedupeStrategy: "latest" in .yarnrc.yml can simplify Dependabot / Renovate logic and also align these bots with local yarn install calls. It seems to me that this new declarative approach has more pros than cons, but I might be missing something.

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 7
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

4 reactions
kachkaev commented, Oct 31, 2022

Thanks @ambar! A plugin can definitely help in some scenarios, but unfortunately it does not cover the whole problem space. Dependabot and other hosted tools need to be able to deduplicate dependencies without running repo code, because doing so is unsafe. Any solution other than dedupeStrategy: "latest" in .yarnrc.yml will require third-party hacks. In the end, the approach to deduplication may become fragmented within the community.

2 reactions
kachkaev commented, Oct 19, 2022

Yeah that could work locally, but I’m not sure if Dependabot will be able to use the plugin. They don’t run custom repo code for security reasons. A locally installed plugin would fall into this category.
