Segmentation fault on pairwise_distances with n_jobs=-1
Describe the bug
Calculating pairwise distances fails when the job is executed in a multi-threaded setting. Tested only with n_jobs=-1 (unsuccessful) and n_jobs=1 (successful).
Hardware doesn't seem to be the issue, since the single-threaded execution worked. For context, we are talking about a very capable machine (40-core Xeon, 380 GB DDR4 RAM, 4 TB SSD storage).
Steps/Code to Reproduce
distances = pairwise_distances(
    target_embeddings,
    reference_embeddings,
    metric=result_kwargs['metric'],
    n_jobs=result_kwargs['n_jobs']
)
Having result_kwargs['n_jobs'] set to -1 will cause the segmentation fault. Setting result_kwargs['n_jobs'] to 1 resulted in a successful execution.
In this case target_embeddings is an np.array of float32 of shape 192656x1024, while reference_embeddings is an np.array of float32 of shape 34333x1024. I cannot provide the original data points, but random np.arrays should suffice for testing purposes.
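As the report suggests, random arrays should suffice to reproduce. A minimal sketch of such a reproducer, assuming a euclidean metric (the actual metric is not stated in the issue) and using much smaller arrays than the originals so it runs quickly:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Random stand-ins for the original data. The report used shapes
# (192656, 1024) and (34333, 1024); smaller shapes are used here.
rng = np.random.default_rng(0)
target_embeddings = rng.random((2000, 64), dtype=np.float32)
reference_embeddings = rng.random((500, 64), dtype=np.float32)

distances = pairwise_distances(
    target_embeddings,
    reference_embeddings,
    metric="euclidean",  # assumption: the issue does not name the metric
    n_jobs=-1,           # -1 reportedly segfaults; n_jobs=1 succeeded
)
print(distances.shape)  # one row per target, one column per reference
```

With n_jobs=-1, sklearn splits the computation across all cores via joblib, which is where the reported crash occurs; the same call with n_jobs=1 stays single-threaded.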
Expected Results
No segmentation fault, independent of multi-threading.
Actual Results
Segmentation fault with multithreading.
Versions
>>> import sklearn; sklearn.show_versions()
System:
python: 3.8.5 (default, Aug 5 2020, 08:36:46) [GCC 7.3.0]
executable: /..../bin/python
machine: Linux-4.15.0-112-generic-x86_64-with-glibc2.10
Python dependencies:
pip: 20.2.2
setuptools: 49.6.0.post20200814
sklearn: 0.22.2.post1
numpy: 1.19.1
scipy: 1.5.2
Cython: None
pandas: 1.1.1
matplotlib: 3.3.1
joblib: 0.16.0
Built with OpenMP: True
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Please specify the metric.
Without further details like the traceback of the segfault, I guess we can close.
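One way to capture such a traceback when the interpreter dies with a segfault is Python's built-in faulthandler module, sketched here; it dumps the Python-level stack of every thread to stderr when the process receives SIGSEGV.

```python
import faulthandler

# Enable before running the failing call; on a segmentation fault,
# faulthandler prints the traceback of all threads to stderr.
faulthandler.enable()

# ... then run the failing pairwise_distances(..., n_jobs=-1) call here.
```

Alternatively, `python -X faulthandler script.py` or `PYTHONFAULTHANDLER=1` enables the same handler without touching the code.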