PartialTestResult.join_results(result, pickle.load(input)) EOFError libgomp: Thread creation failed: Resource temporarily unavailable


I’m experimenting with parallel execution of tests in the Fedora buildsystem (building Cython 0.29.3).

Up until now the tests were disabled because of #1982, but I have decided to enable them and only skip the failing tests on Big Endian. The tests are quite slow, so I’ve decided to use the equivalent of -j$(nproc) to speed them up.

The number of CPUs is, however, quite arbitrary and differs with each build. The i686 builder that was picked this time had 48 CPUs, so it used -j48 and failed.

We run the Python 2 tests before the Python 3 tests, and that is where I got a strange error. Let me know if I should reverse the order to see whether this happens on Python 3 as well.

$ /usr/bin/python2 runtests.py -vv -j48
...
======================================================================
ERROR: runTest (__main__.CythonRunTestCase)
compiling (c) and running parallel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "runtests.py", line 1266, in run
    self.run_tests(result, ext_so_path)
  File "runtests.py", line 1284, in run_tests
    self.run_doctests(self.module, result, ext_so_path)
  File "runtests.py", line 1296, in run_doctests
    run_forked_test(result, run_test, self.shortDescription(), self.fork)
  File "runtests.py", line 1362, in run_forked_test
    PartialTestResult.join_results(result, pickle.load(input))
EOFError
======================================================================
ERROR: runTest (__main__.CythonRunTestCase)
compiling (cpp) and running parallel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "runtests.py", line 1266, in run
    self.run_tests(result, ext_so_path)
  File "runtests.py", line 1284, in run_tests
    self.run_doctests(self.module, result, ext_so_path)
  File "runtests.py", line 1296, in run_doctests
    run_forked_test(result, run_test, self.shortDescription(), self.fork)
  File "runtests.py", line 1362, in run_forked_test
    PartialTestResult.join_results(result, pickle.load(input))
EOFError
----------------------------------------------------------------------
Ran 179 tests in 147.144s
FAILED (errors=2)

Full log: build.log

The error did not occur on another run, when the builder had just 6 CPUs and -j6 was used.

I’ve tried to limit the number to 16; however, I got the same error with -j16 on an 18-core builder.

Currently I’m experimenting with -j7 (inspired by your Travis CI config) and I will report back.

I’ve only experienced this on i686, yet this was the only builder that I got with 48 CPUs this time. An x86_64 build with -j16 made it through without the error; however, the error might not be deterministic.

My very wild guess is that with massive parallelism, the IO is not so fast and something reads a pickle jar too soon.
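For context on the traceback: pickle.load raises EOFError when the stream ends before any pickle data has been read, which is what a parent process would see if a forked child exited without writing its result. A minimal sketch of that behaviour follows. The load_child_result helper is hypothetical and uses in-memory streams; it is not the code in runtests.py.

```python
import io
import pickle


def load_child_result(stream):
    """Return the unpickled result, or None if the stream is empty.

    Hypothetical helper for illustration; not part of runtests.py.
    """
    try:
        return pickle.load(stream)
    except EOFError:
        # The stream ended before any pickle data arrived,
        # e.g. because the writing process died early.
        return None


# A complete pickle round-trips; an empty stream yields None.
complete = io.BytesIO(pickle.dumps({"failures": 0}))
empty = io.BytesIO(b"")
print(load_child_result(complete))
print(load_child_result(empty))
```

Seen this way, the EOFError is more likely a symptom of the child process failing than of the parent reading too early.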

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 17 (4 by maintainers)

Top GitHub Comments

1 reaction
scoder commented, Jan 19, 2019

According to the log (thanks for providing the full output), the test is failing with this error: BUILDSTDERR: libgomp: Thread creation failed: Resource temporarily unavailable

That suggests that OpenMP fails to start its threads for some reason. I would recommend reducing the number of processes relative to the number of cores, since the test runner will also fork out the test runs and some tests will start threads or further subprocesses.
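One way to keep the process count below the core count is to cap the -j value instead of passing the raw CPU count. The sketch below assumes a hypothetical capped_jobs helper; the default cap of 7 is borrowed from the -j7 experiment mentioned in the issue and is not a recommended value.

```python
import multiprocessing


def capped_jobs(ncpus=None, cap=7):
    """Pick a -j value that never exceeds `cap` and is at least 1.

    Hypothetical helper; `cap=7` is an assumption taken from the
    -j7 experiment in the issue, so tune it for the builder.
    """
    if ncpus is None:
        ncpus = multiprocessing.cpu_count()
    return max(1, min(ncpus, cap))


# Builder sizes mentioned in the issue.
for cpus in (6, 18, 48):
    print(cpus, "->", "-j%d" % capped_jobs(cpus))
```

With a fixed cap, a builder with 48 CPUs forks the same number of test processes as a small one, which leaves headroom for the threads and subprocesses that individual tests start.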

This page also suggests that passing OMP_NESTED=FALSE might limit the overall number of threads, but I can’t say if that breaks any of the tests as they might depend on starting new ones (don’t know).
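A sketch of how that environment variable could be passed to the test run from Python, assuming a hypothetical openmp_limited_env helper. OMP_NUM_THREADS is a standard OpenMP variable that was not part of the suggestion above; it is included only as an optional extra, and whether either setting breaks any tests is untested here.

```python
import os


def openmp_limited_env(base_env=None, max_threads=None):
    """Return a copy of the environment with OpenMP nesting disabled.

    Hypothetical helper. `max_threads` is an optional addition that
    sets OMP_NUM_THREADS; it is not part of the original suggestion.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NESTED"] = "FALSE"
    if max_threads is not None:
        env["OMP_NUM_THREADS"] = str(max_threads)
    return env


env = openmp_limited_env(max_threads=2)
print(env["OMP_NESTED"], env["OMP_NUM_THREADS"])
```

The returned dictionary can be passed as the env argument of subprocess.call when launching runtests.py.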

0 reactions
hroncok commented, Jan 21, 2019

That explains a lot. I thought that “parallel” here was about the -j thing, and that’s why it looked like a red flag to me. Using -x run.parallel gets the job done.
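Put together, the invocation that skips the OpenMP test could be assembled like this. It is a sketch with a hypothetical runtests_command helper; the interpreter path and the -vv, -j and -x options are the ones quoted in this thread, and the default job count of 7 is an assumption.

```python
def runtests_command(python="/usr/bin/python2", jobs=7, exclude="run.parallel"):
    """Build the argument list for a runtests.py invocation.

    Hypothetical helper; `jobs=7` is an assumed default, and
    `exclude` is the single test pattern to skip via -x.
    """
    cmd = [python, "runtests.py", "-vv", "-j%d" % jobs]
    if exclude:
        cmd.extend(["-x", exclude])
    return cmd


print(" ".join(runtests_command()))
```

Keeping the command as a list lets it be passed directly to subprocess.call without shell quoting.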
