
Parallel hangs when /dev/shm is too small

See original GitHub issue

Hey guys, I’ve encountered hangs when using Parallel through sklearn, and one thing I’ve noticed is that the hangs happen when my dataset is big enough. A debugging print showed me that the main process is stuck here, so it looks like the main process never gets a response from the child process. After some code analysis I found the usage of /dev/shm in joblib. The output of df showed me that this volume was full; once I gave it more space, the hangs stopped.
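
For context, joblib’s Parallel automatically dumps large input arrays to temp_folder and hands workers a read-only memmap instead of a pickled copy; on Linux that folder defaults to /dev/shm when it is available and writable. A minimal sketch of the kind of workload involved (the array shape and job count are illustrative, not taken from the original report):

```python
import numpy as np
from joblib import Parallel, delayed

def column_sum(arr, i):
    # Workers receive `arr` as a read-only memmap backed by temp_folder,
    # not as a pickled copy.
    return arr[:, i].sum()

# Arrays above joblib's max_nbytes threshold (1M by default) are dumped
# to temp_folder -- /dev/shm on Linux when available -- so an array like
# this easily overflows a 64M tmpfs.
big = np.random.rand(5000, 5000)  # ~190 MiB of float64

results = Parallel(n_jobs=2)(delayed(column_sum)(big, i) for i in range(10))
```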

So my suspicion is that joblib doesn’t handle /dev/shm entirely correctly, in the sense that maybe it should check whether enough space is available and raise an error instead of hanging.
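
The proposed check could look something like the following sketch (an illustration of the idea only, not joblib’s actual code; the helper name and the example size are made up):

```python
import os

def ensure_free_space(folder, required_bytes):
    """Raise up front instead of hanging after the folder fills up.

    Hypothetical helper illustrating the proposed check; a real fix
    would live inside joblib's memmapping machinery.
    """
    stats = os.statvfs(folder)
    free_bytes = stats.f_bavail * stats.f_frsize
    if free_bytes < required_bytes:
        raise OSError(
            f"{folder} has only {free_bytes} bytes free, but "
            f"{required_bytes} bytes are needed to memmap the data."
        )

# e.g. before dumping a ~200 MB array:
ensure_free_space("/dev/shm", 200 * 1024 ** 2)
```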

I ran my experiments in a Jupyter Notebook, on Ubuntu 14.04 in Docker. All libraries are the latest versions from pip.

If you need any additional info from me, let me know; I will be glad to help.

P.S. There are a bunch of issues with similar symptoms here, but I wasn’t sure whether they are really related, so I created a new one with a concrete description.

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 2
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions
Piatachock commented, Apr 3, 2017

I’m not sure if this is the case, because by “hang” I meant exactly 0% CPU consumption of all subprocesses for at least 3 hours (I turned it off after that). IMHO the swapping case should look different.

1 reaction
Piatachock commented, Apr 6, 2017

@ogrisel yes, I never tried changing the temp_folder parameter.
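
For reference, temp_folder is a keyword argument of Parallel that redirects the memmap dump away from /dev/shm; a sketch of that workaround (the target folder is just an example, and the JOBLIB_TEMP_FOLDER environment variable achieves the same thing):

```python
import numpy as np
from joblib import Parallel, delayed

big = np.random.rand(5000, 5000)

# Dump memmapped arrays to a disk-backed folder instead of the
# (possibly tiny) /dev/shm tmpfs. Slower than shared memory, but it
# cannot be exhausted by Docker's 64M default.
results = Parallel(n_jobs=2, temp_folder="/tmp")(
    delayed(np.sum)(big[i::4]) for i in range(4)
)
```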

I think the fact that /dev/shm was full is no surprise, because I ran Jupyter in a Docker container, and Docker’s default size for /dev/shm is 64M. After I increased it, the hangs stopped.
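
(Docker’s default can be raised with docker run --shm-size, e.g. --shm-size=1g.) A quick way to confirm the limit from inside the container, as a small sketch:

```python
import shutil

# shutil.disk_usage reports the size of the tmpfs mounted at /dev/shm;
# with Docker's default --shm-size this shows a 64 MiB total.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: {total / 2**20:.0f} MiB total, {free / 2**20:.0f} MiB free")
```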

Why this case (full shared memory) leads to hangs rather than an error is interesting, though.

Read more comments on GitHub >

Top Results From Across the Web

  • Create Large File in /dev/shm in parallel: performance
    The fastest way to do this is to just call truncate() or ftruncate() to expand the file to the size you want. You...
  • When should I use /dev/shm/ and when should I use /tmp
    In my ubuntu 14.04 /dev/shm is link to /run/shm, that has filesystem "none" according to command df. Size is about 2G, though. @jarno...
  • Training crashes due to - Insufficient shared memory (shm)
    Hi all, I am training an image recognition model with dataset size (4M training images 200x200 size). Here are the configurations of the ...
  • Embarrassingly parallel for loops - Joblib - Read the Docs
    However if the parallel function really needs to rely on the shared memory ... By default the data is dumped to the /dev/shm...
  • Local temporary file system in memory - NERSC Documentation
    Note that /dev/shm is a file system local to each node, so no shared file ... of accesses to small files on the...
