thread.error: can't start new thread on UnivaGridEngine


Toil fails when run in a Univa Grid Engine (UGE) cluster environment.

Environments

Red Hat Enterprise Linux release 6.9 (Santiago)

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz

Error details

When I ran the quick start from the official documentation with python helloWorld.py bucket, it returned an error.

Traceback (most recent call last):
  File "sort.py", line 242, in <module>
    main()
  File "sort.py", line 236, in main
    memory=sortMemory))
  File "/home/6br/anaconda3/envs/toil/lib/python2.7/site-packages/toil/common.py", line 762, in start
    self._batchSystem = self.createBatchSystem(self.config)
  File "/home/6br/anaconda3/envs/toil/lib/python2.7/site-packages/toil/common.py", line 911, in createBatchSystem
    return batchSystemClass(**kwargs)
  File "/home/6br/anaconda3/envs/toil/lib/python2.7/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 283, in __init__
    super(AbstractGridEngineBatchSystem, self).__init__(config, maxCores, maxMemory, maxDisk)
  File "/home/6br/anaconda3/envs/toil/lib/python2.7/site-packages/toil/batchSystems/abstractBatchSystem.py", line 310, in __init__
    config, config.maxLocalJobs, maxMemory, maxDisk)
  File "/home/6br/anaconda3/envs/toil/lib/python2.7/site-packages/toil/batchSystems/singleMachine.py", line 134, in __init__
    worker.start()
  File "/home/6br/anaconda3/envs/toil/lib/python2.7/threading.py", line 736, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread

Shared cluster machines usually limit the maximum memory available to a job. Toil kept starting worker threads until their number pushed the process past that memory limit, at which point it could not start a new thread. I would like to know why so many worker threads are started and how to change the number of worker threads.
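To check whether a memory limit is what caps the thread count, the limits of the current process can be read from Python. The sketch below is only a rough diagnostic and is not part of Toil: the function names are hypothetical, and the estimate assumes that each new thread reserves roughly the soft stack limit in virtual memory, which is typical on Linux but not guaranteed.

```python
import resource


def soft_limit(which):
    """Return the soft value of a resource limit, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(which)
    return None if soft == resource.RLIM_INFINITY else soft


def read_thread_limits():
    """Collect the limits that commonly cap how many threads a process can start."""
    return {
        "address_space_bytes": soft_limit(resource.RLIMIT_AS),
        "stack_bytes": soft_limit(resource.RLIMIT_STACK),
        "max_user_processes": soft_limit(resource.RLIMIT_NPROC),
    }


def rough_thread_ceiling(address_space_bytes, stack_bytes):
    """Very rough upper bound on threads: address space divided by per-thread stack.

    Assumption: every thread reserves about `stack_bytes` of virtual memory.
    Returns None when either limit is unlimited or unknown.
    """
    if not address_space_bytes or not stack_bytes:
        return None
    return address_space_bytes // stack_bytes


if __name__ == "__main__":
    limits = read_thread_limits()
    print(limits)
    print(rough_thread_ceiling(limits["address_space_bytes"], limits["stack_bytes"]))
```

Running this inside a job submitted to the cluster, rather than on the login node, shows the limits the scheduler actually applies to the job.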

The output of the top command is shown below. The number of threads exceeds the number of cores on our machine (24).

 9585 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:02.07 └─ python sort.py bucket3
 9689 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9688 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9687 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9686 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9685 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9684 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9683 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9682 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9681 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9680 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9679 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9678 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9677 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9676 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9675 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9674 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9673 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9672 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9671 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9670 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9669 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9668 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9667 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9666 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9665 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9664 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9663 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9662 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9661 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9660 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9659 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9658 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9657 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3
 9656 6br       20   0 5426M 42328  4836 S  0.0  0.0  0:00.00    ├─ python sort.py bucket3

For now, I can run Toil with --maxCores 1 (it still starts 12 worker threads, so this is not a fundamental solution).

--batchSystem gridEngine ignores the --maxCores option

When I run python helloWorld.py bucket --batchSystem gridEngine --maxCores 1, the same error, thread.error: can't start new thread, occurs. It seems that worker threads are started until the process reaches its maximum number of threads, whether or not I pass the --maxCores option, and this is what raises thread.error: can't start new thread.
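The behaviour one would expect instead is a worker count capped by an explicit setting rather than tied to the machine's core count. The following is a minimal sketch of such a bounded pool, not Toil's actual implementation; `run_with_bounded_workers` and its parameters are hypothetical names. It also tolerates a failed thread start, which Python 3 reports as RuntimeError (Python 2.7 raised thread.error, as in the traceback above).

```python
import queue
import threading


def run_with_bounded_workers(tasks, max_workers):
    """Run zero-argument callables on at most `max_workers` threads.

    Results are returned in the same order as `tasks`.
    """
    work = queue.Queue()
    results = [None] * len(tasks)
    for index, task in enumerate(tasks):
        work.put((index, task))

    def worker():
        # Drain the queue until no work is left, then exit.
        while True:
            try:
                index, task = work.get_nowait()
            except queue.Empty:
                return
            results[index] = task()

    threads = []
    for _ in range(min(max_workers, len(tasks))):
        thread = threading.Thread(target=worker)
        try:
            thread.start()
        except RuntimeError:
            # "can't start new thread": stop growing the pool and let the
            # threads that did start handle the remaining work.
            break
        threads.append(thread)

    if not threads:
        # No thread could be started: fall back to the calling thread.
        worker()
    for thread in threads:
        thread.join()
    return results
```

With this shape, the number of threads depends only on `max_workers` and the number of tasks, so a low limit keeps the process within the cluster's constraints regardless of how many cores the node has.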

Currently, I run Toil with --debugWorker, and jobs then run successfully on UGE, but this is a workaround rather than a proper solution.

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-319

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

bencvdb commented, Aug 21, 2018

@6br PR #2347 should fix this issue if you specify --maxLocalJobs as needed instead of --maxCores. Let me know if this does not fix the problem.
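A minimal sketch of the invocation this suggests, written as a small Python helper that assembles the command line. The script name, job store, and flags are the ones quoted in this issue; the helper name `build_toil_command` and the value 1 for --maxLocalJobs are assumptions chosen for illustration, and the flag only takes effect in a Toil version that includes the fix.

```python
import subprocess


def build_toil_command(script, job_store, max_local_jobs):
    """Assemble the workflow command with --maxLocalJobs instead of --maxCores."""
    return [
        "python", script, job_store,
        "--batchSystem", "gridEngine",
        "--maxLocalJobs", str(max_local_jobs),
    ]


if __name__ == "__main__":
    command = build_toil_command("helloWorld.py", "bucket", max_local_jobs=1)
    print(" ".join(command))
    # Launch the workflow; a non-zero exit status raises CalledProcessError.
    subprocess.check_call(command)
```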

DailyDreaming commented, Aug 20, 2018

This looks similar to #2323 where --maxLocalJobs isn’t doing what it ought to. Should have a fix soon.
