Parallel class does not use temporary directory
I do not have /dev/shm on the instance I'm using (an AWS Lambda), so I thought joblib would let me use a different directory. However, even after specifying temp_folder='/tmp', I get the following error message:
'NoneType' object has no attribute 'current_process': AttributeError
Traceback (most recent call last):
File "/var/task/world.py", line 74, in handler
r = Parallel(n_jobs=1, verbose=9, temp_folder=temp_folder, backend = 'multiprocessing')(delayed(worker)(i, user_vars) for i in range(user_vars['worlds']))
File "/var/task/deps/joblib/parallel.py", line 728, in __call__
n_jobs = self._initialize_backend()
File "/var/task/deps/joblib/parallel.py", line 540, in _initialize_backend
**self._backend_args)
File "/var/task/deps/joblib/_parallel_backends.py", line 288, in configure
n_jobs = self.effective_n_jobs(n_jobs)
File "/var/task/deps/joblib/_parallel_backends.py", line 268, in effective_n_jobs
if mp.current_process().daemon:
AttributeError: 'NoneType' object has no attribute 'current_process'
mp is imported from _multiprocessing_helpers, and it must be failing at line 27 there. I don't think I can create an mp.Semaphore():
[Errno 38] Function not implemented: OSError
Traceback (most recent call last):
File "/var/task/world.py", line 74, in handler
mp.Semaphore()
File "/usr/lib64/python2.7/multiprocessing/__init__.py", line 197, in Semaphore
return Semaphore(value)
File "/usr/lib64/python2.7/multiprocessing/synchronize.py", line 111, in __init__
SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX)
File "/usr/lib64/python2.7/multiprocessing/synchronize.py", line 75, in __init__
sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 38] Function not implemented
Using backend='threading' works just fine, so this is a multiprocessing issue. How can I get around this? Should I be using an environment variable to define the directory?
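One possible workaround, sketched under assumptions: probe whether the platform can create a multiprocessing.Semaphore at all, and pick the backend name from the result. The helper names semaphores_available and pick_backend are hypothetical and not part of joblib.

```python
import multiprocessing


def semaphores_available():
    """Return True if this platform can create a multiprocessing.Semaphore."""
    try:
        multiprocessing.Semaphore()
    except (OSError, ImportError):
        # OSError covers "[Errno 38] Function not implemented";
        # ImportError covers platforms without a working sem_open.
        return False
    return True


def pick_backend():
    """Hypothetical helper: choose a joblib backend name for this platform."""
    return "multiprocessing" if semaphores_available() else "threading"
```

The result could then be passed as Parallel(..., backend=pick_backend()), so the same handler would run with 'threading' where semaphores are not implemented and with 'multiprocessing' elsewhere.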
Issue Analytics
- State:
- Created 7 years ago
- Comments:6 (4 by maintainers)

If anyone is still facing this issue, updating sklearn will resolve it. Installing scikit-learn using conda (version 4.3.14) puts a check at line 268: if mp is None, it returns 1, which resolves this issue.
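The guard described in this comment might look roughly like the following. This is an illustrative stand-in rather than joblib's actual code: the function signature, the mp parameter, and the final return value are assumptions. Only the mp is None check and the mp.current_process().daemon test come from the thread.

```python
import multiprocessing


def effective_n_jobs(n_jobs, mp=multiprocessing):
    """Illustrative stand-in, not joblib's code: use 1 job when mp is unusable."""
    if mp is None:
        # Guard from the comment: no usable multiprocessing, so run serially.
        return 1
    if mp.current_process().daemon:
        # Assumption: a daemonic process cannot start children, so use 1 job.
        return 1
    return n_jobs
```

With mp set to None, as happens in the traceback above, the function returns 1 instead of raising AttributeError.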
Closing this one, there is nothing joblib can do about this.