fetch_openml can raise "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process"
Describe the bug
On Windows, if fetch_openml is run concurrently in two processes, for instance when running the tests with pytest-xdist, one sometimes gets errors such as:
[...]
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x1439D0F0>
gzip_response = True
@pytest.mark.parametrize("gzip_response", [True, False])
version = 'active'
C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\sklearn\datasets\_openml.py:449: in _get_data_description_by_id
url, error_message, data_home=data_home
data_home = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml'
data_id = 2
error_message = 'Dataset with data_id 2 not found.'
url = 'api/v1/json/data/2'
C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\sklearn\datasets\_openml.py:172: in _get_json_content_from_openml_api
return _load_json()
_load_json = <function _get_json_content_from_openml_api.<locals>._load_json at 0x14167C00>
data_home = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml'
error_message = 'Dataset with data_id 2 not found.'
url = 'api/v1/json/data/2'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (), kw = {}
local_path = 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml\\openml.org\\api/v1/json/data/2.gz'
    @wraps(f)
    def wrapper(*args, **kw):
        if data_home is None:
            return f(*args, **kw)
        try:
            return f(*args, **kw)
        except HTTPError:
            raise
        except Exception:
            warn("Invalid cache, redownloading file", RuntimeWarning)
            local_path = _get_local_path(openml_path, data_home)
            if os.path.exists(local_path):
>               os.unlink(local_path)
E               PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\VssAdministrator\\scikit_learn_data\\openml\\openml.org\\api/v1/json/data/2.gz'
Steps/Code to Reproduce
Run pytest -x -n 4 --pyargs sklearn many times.
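Since the failure is intermittent, repeating the run by hand is tedious. A minimal sketch of a loop that reruns a command until it first fails is shown here; `run_until_failure` and `PYTEST_CMD` are hypothetical names introduced for illustration and are not part of pytest or scikit-learn.

```python
import subprocess
import sys

# The pytest invocation from the report, launched through the current interpreter.
PYTEST_CMD = [sys.executable, "-m", "pytest", "-x", "-n", "4", "--pyargs", "sklearn"]


def run_until_failure(cmd, max_runs=20):
    """Run cmd up to max_runs times.

    Returns the 1-based number of the first run that exits with a non-zero
    status, or None if every run succeeded.
    """
    for run_number in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return run_number
    return None
```

Calling `run_until_failure(PYTEST_CMD)` would stop at the first failing run. Any failure ends the loop, so the output still has to be checked for the WinError 32 message.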
Expected Results
No crash: fetch_openml should be safe to call concurrently.
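One common way to make a file cache safer under concurrent writers is to write to a temporary file in the same directory and then rename it over the final path. The sketch below illustrates that pattern only. `write_cache_atomically` is a hypothetical helper, not how fetch_openml is implemented, and on Windows `os.replace` may still raise PermissionError while another process holds the destination open.

```python
import os
import tempfile


def write_cache_atomically(local_path, payload):
    """Write payload (bytes) to local_path via a temporary file and os.replace."""
    directory = os.path.dirname(local_path) or "."
    os.makedirs(directory, exist_ok=True)
    # Create the temporary file next to the destination so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(payload)
        # Readers see either the old file or the complete new one,
        # never a partially written file.
        os.replace(tmp_path, local_path)
    except BaseException:
        # Remove the partial temporary file before re-raising.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return local_path
```

With this pattern, a second process that loses the race overwrites the file with identical content instead of reading a truncated cache entry.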
Actual Results
See error report above.
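The call that fails in the traceback is os.unlink(local_path), made while another process still has the cache file open. A hedged sketch of a more tolerant deletion follows; `unlink_with_retry` is a hypothetical helper, not part of scikit-learn, and the attempt count and delay are arbitrary assumptions.

```python
import os
import time


def unlink_with_retry(path, attempts=5, delay=0.1):
    """Delete path, retrying when the file is temporarily locked.

    Returns True once the file is gone. Re-raises PermissionError if the
    last attempt still fails.
    """
    for attempt in range(attempts):
        try:
            os.unlink(path)
            return True
        except FileNotFoundError:
            # Another process already removed the file, which is fine here.
            return True
        except PermissionError:
            if attempt == attempts - 1:
                raise
            # Give the other process time to close its handle.
            time.sleep(delay)
    return False
```

Retrying only narrows the race window. It does not remove the underlying problem that two processes share one cache path without coordination.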
Versions
Python dependencies:
pip: 21.3.1
setuptools: 47.1.0
sklearn: 1.1.dev0
numpy: 1.21.4
scipy: 1.7.3
Cython: 0.29.24
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.0.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\numpy\.libs\libopenblas.VTYUM5MXKVFE4PZZER3L7PNO6YB4XFF3.gfortran-win32.dll
version: 0.3.17
threading_layer: pthreads
architecture: Nehalem
num_threads: 2
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: C:\hostedtoolcache\windows\Python\3.7.9\x86\lib\site-packages\scipy\.libs\libopenblas.VTYUM5MXKVFE4PZZER3L7PNO6YB4XFF3.gfortran-win32.dll
version: 0.3.17
threading_layer: pthreads
architecture: Nehalem
num_threads: 2
user_api: openmp
internal_api: openmp
prefix: vcomp
filepath: C:\Windows\SYSTEM32\VCOMP140.DLL
version: None
num_threads: 2
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (11 by maintainers)
Top GitHub Comments
@siavrez Python has tempfile builtin 😃

I'd be in favor of making fetch_* multiprocess safe.

We have introduced some complexity in our test code for the fetch_* functions that are not fetch_openml: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/conftest.py#L81-L82

This code downloads all the necessary files before pytest-xdist distributes the work. To me it still feels like a workaround to get tests to work with pytest-xdist.

We have to choose what we want to be threadsafe, and I would prefer to have fetch_* be threadsafe.

Given all that, I think it is important to fix the tests so the CI is stable. I opened https://github.com/scikit-learn/scikit-learn/pull/21806 as a quick workaround to fix the tests.
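The pre-download workaround described above can be sketched roughly as follows. This illustrates the idea and is not a copy of the linked conftest.py; `should_prefetch` and `prefetch` are hypothetical names. It relies on pytest-xdist setting the PYTEST_XDIST_WORKER environment variable ("gw0", "gw1", ...) in each worker process.

```python
import os


def should_prefetch(environ=None):
    """Return True in the single process that should download datasets."""
    environ = os.environ if environ is None else environ
    # Without pytest-xdist the variable is unset, so default to "gw0" and
    # let the only process do the downloads.
    worker_id = environ.get("PYTEST_XDIST_WORKER", "gw0")
    return worker_id == "gw0"


def prefetch(fetchers, environ=None):
    """Call each zero-argument fetcher once, but only in the chosen process."""
    if not should_prefetch(environ):
        return []
    return [fetch() for fetch in fetchers]
```

Called from a collection-time hook, this is intended to fill the cache from one process before tests start running, so the other workers only read files that already exist.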