investigate possible checkout performance regression since 0.20.0

tdeboissiere on Discord reports that dvc pull (which didn't download anything, so dvc checkout is the culprit) takes 111s on 0.20.0 but 160s on 0.20.3. We need to investigate whether we have a regression in checkout performance.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions
efiop commented, Dec 12, 2018

Thank you for your patience, guys. For add/checkout, it turned out that we were not short-circuiting log messages early enough in non-verbose modes, which caused these delays (fixing it gave up to a 50% speedup on a test with a directory of 100K files). This was the cause of the regression after 0.20.0, since we added more debug messages after that release. We will release a new version ASAP. #1331 is still relevant, since we could improve performance much more. The gc issue described by @IamGianluca is a separate one, so I created https://github.com/iterative/dvc/issues/1429 to track it. Actively working on optimizations right now.
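To illustrate the kind of short-circuiting described above, here is a minimal sketch using Python's standard logging module. It is not DVC's actual code: the logger name, the expensive_description helper, and both checkout_* functions are hypothetical stand-ins. The idea is that building a debug message per file is wasted work when debug output is disabled, so the level check should happen before the message is constructed.

```python
import logging

# Hypothetical logger name; INFO stands in for a "non-verbose" mode.
logger = logging.getLogger("checkout_sketch")
logger.setLevel(logging.INFO)


def expensive_description(path):
    """Hypothetical stand-in for costly per-file message construction."""
    expensive_description.calls += 1
    return "checking out '{}'".format(path)


# Counter used only to show how often the message is actually built.
expensive_description.calls = 0


def checkout_eager(paths):
    # The message is built for every file, even though logger.debug()
    # will discard it when DEBUG is not enabled.
    for path in paths:
        logger.debug(expensive_description(path))


def checkout_short_circuit(paths):
    # Check the level once up front and skip message construction
    # entirely when debug output is disabled.
    debug_enabled = logger.isEnabledFor(logging.DEBUG)
    for path in paths:
        if debug_enabled:
            logger.debug(expensive_description(path))
```

With a directory of many files, the eager variant calls expensive_description once per file regardless of verbosity, while the short-circuit variant calls it only when debug logging is enabled.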

1 reaction
efiop commented, Dec 1, 2018

@IamGianluca Whoa, that is a lot of time 🙁 Note that if you remove .dvc/cache, you will lose the cache for your pipeline inputs as well, so you won't be able to reproduce the pipeline as is; you would have to manually place the input data back into your workspace. If you know where to find that data, you could indeed rm -rf .dvc/cache for now. Otherwise, you need to at least back up that input data somewhere, either by manually copying it or by creating a directory, making it a dvc remote, and pushing data there (either dvc push to back up all currently used cache, or dvc push data.dvc for all input data).

I'm investigating right now. Thank you for your patience, guys.
