Package locking is crazy slow for scikit-learn


I’m sorry I don’t have a reduced test case for this but this is so crazy slow it’s hard to actually debug.

Steps to reproduce

  1. Clone https://github.com/HearthSim/hearthsim-vagrant
  2. Inside that directory, clone https://github.com/HearthSim/HSReplay.net
  3. Run docker-compose build django (this builds an image based on the python-stretch docker image, which will also install the latest pipenv systemwide, cf. Dockerfile).
  4. Finally, run docker-compose run django, which runs pipenv install --dev
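For anyone who wants to script the steps above, here is a minimal sketch that expresses them as a list of commands. The URLs and the docker-compose invocations come straight from the steps; the function names and the default checkout directory name are assumptions made for illustration.

```python
import subprocess

VAGRANT_REPO = "https://github.com/HearthSim/hearthsim-vagrant"
APP_REPO = "https://github.com/HearthSim/HSReplay.net"


def repro_commands(workdir="hearthsim-vagrant"):
    """Return (argv, cwd) pairs for the four reproduction steps.

    `workdir` is an assumed name for the local checkout directory.
    """
    return [
        (["git", "clone", VAGRANT_REPO, workdir], None),  # step 1
        (["git", "clone", APP_REPO], workdir),            # step 2, inside the checkout
        (["docker-compose", "build", "django"], workdir), # step 3
        (["docker-compose", "run", "django"], workdir),   # step 4, triggers pipenv install --dev
    ]


def run_repro(workdir="hearthsim-vagrant"):
    """Run each step in order, stopping at the first failure."""
    for argv, cwd in repro_commands(workdir):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_repro()` needs git and docker-compose on the PATH; nothing is executed until that function is called.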

On Linux, this stays stuck at Locking [packages] for over 15 minutes, with no output even when run with --verbose. Then, after about 15 minutes, it prints the full output of everything it has been doing in that time. When run outside of Docker, that step is still slow, but it takes at most 1-2 minutes. I have a pretty beefy CPU and SSD, so I don’t know why it would take this long in the first place.
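One way to narrow down whether the output really is produced only at the end, or is just arriving late, is to timestamp each line as it comes in. The following is a rough sketch using only the standard library; the helper name `timed_output` is made up for this example, and it only records when a line reaches the pipe, so buffering inside the child process (common when stdout is not a terminal) can still delay lines.

```python
import subprocess
import sys
import time


def timed_output(argv):
    """Run argv; return (returncode, [(seconds_since_start, line), ...])."""
    start = time.monotonic()
    stamped = []
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
        bufsize=1,                 # line-buffered on our side of the pipe
    )
    for line in proc.stdout:
        stamped.append((time.monotonic() - start, line.rstrip("\n")))
    proc.wait()
    return proc.returncode, stamped
```

For the case described here, it could be called as `timed_output(["pipenv", "install", "--dev", "--verbose"])`, inside and outside the container, and the two sets of timestamps compared.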

I also see a lot of Warning: Error generating hash for ... in the verbose output; I don’t know if that’s related.
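To check whether those warnings line up with the slow packages, the verbose output can be captured and the warnings pulled out. This is a small sketch that assumes each warning sits on one line and begins with the exact text quoted above; the function name is hypothetical.

```python
HASH_WARNING = "Warning: Error generating hash for"


def hash_warnings(verbose_output):
    """Return whatever follows the warning prefix on each matching line."""
    targets = []
    for line in verbose_output.splitlines():
        idx = line.find(HASH_WARNING)
        if idx != -1:
            # Keep only the text after the prefix, i.e. what the hash failed for.
            targets.append(line[idx + len(HASH_WARNING):].strip())
    return targets
```

`len(hash_warnings(text))` gives the number of warnings, and the returned list shows which items they refer to.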

Any idea? How can I debug this further?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 10
  • Comments: 43 (35 by maintainers)

Top GitHub Comments

47 reactions
kennethreitz commented, Mar 19, 2018

you shouldn’t be running lock in docker in the first place…

10 reactions
jleclanche commented, Mar 20, 2018

Cheers @jleclanche I used to play hearthstone in its early days before there was much tooling around it so thanks for building some community tools!

😃

That’s the whole purpose here— you already have an isolated python environment in a virtualenv which is managed by pipenv, so it can handle dependency graphs across platforms, os and whatnot. S

I fully understand what pipenv brings to the table. Just to explain why I’m using it in docker:

  • I need docker because the stack I’m running locally is complex. It’s not just a single python app, it’s a python app, database, mock servers, redis server, etc. All these need to be available, cross-platform, consistently between all devs on the team. Docker solves that.
  • I need (want) pipenv because I need (want) to track my dependencies in a Pipfile, rather than requirements.txt. That is to say, I’m moving the app to pipenv anyway. So now my choice is to either duplicate the dependencies, or use pipenv consistently in docker as well.

With that said, I’m not interested in solving my problem. I solved my problem by adding --skip-lock. I’m interested in solving, or helping solve, the egregious difference in performance between inside and outside of the container. Or at least coming out of this with a “there’s a very good reason for this difference and here it is”.

But yarn is also running inside that same container and managing 1-2 orders of magnitude more dependencies than pipenv, so I think we can do better. And if that takes me PRing setup.py/setup.cfg fixes to 30 different projects so be it 😃

