Segmentation fault on pairwise_distances with n_jobs=-1


Describe the bug

Calculating pairwise distances fails when the job is executed in a multi-threaded setting. Tested only with n_jobs=-1 (unsuccessful) and n_jobs=1 (successful).

Hardware doesn’t seem to be the issue, since the single-threaded execution worked. For context, this is a very capable machine (40-core Xeon, 380 GB DDR4 RAM, 4 TB SSD storage).

Steps/Code to Reproduce

    distances = pairwise_distances(
        target_embeddings,
        reference_embeddings,
        metric=result_kwargs['metric'],
        n_jobs=result_kwargs['n_jobs']
    )

Setting result_kwargs['n_jobs'] to -1 causes the segmentation fault. Setting result_kwargs['n_jobs'] to 1 results in a successful execution.

In this case target_embeddings is a float32 np.array of shape 192656x1024, while reference_embeddings is a float32 np.array of shape 34333x1024. I cannot provide the original data points, but random np.arrays should suffice for testing purposes.
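As suggested above, random arrays can stand in for the original data. The following is a minimal sketch of such a reproduction, not the reporter's actual script. The helper name `reproduce`, the seed, and the small default shapes are hypothetical, and `metric="euclidean"` is an assumption because the report does not state which metric was used.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def reproduce(n_target=200, n_reference=50, n_features=1024,
              metric="euclidean", n_jobs=-1, seed=0):
    """Run pairwise_distances on random float32 data of the given shape.

    The metric is an assumption: the original report never names it.
    """
    rng = np.random.default_rng(seed)
    # float32 inputs, matching the dtype described in the report.
    target_embeddings = rng.random((n_target, n_features), dtype=np.float32)
    reference_embeddings = rng.random((n_reference, n_features), dtype=np.float32)
    return pairwise_distances(
        target_embeddings,
        reference_embeddings,
        metric=metric,
        n_jobs=n_jobs,
    )


if __name__ == "__main__":
    # Start with small shapes; the report used 192656 and 34333 rows.
    single = reproduce(n_jobs=1)
    parallel = reproduce(n_jobs=-1)
    print(single.shape, np.allclose(single, parallel, rtol=1e-4))
```

To approach the reported scenario, pass `n_target=192656` and `n_reference=34333`. At that size the result alone has about 6.6 billion entries, so it may take a long time and a large amount of memory.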

Expected Results

No segmentation fault, independent of multi-threading.

Actual Results

Segmentation fault with multithreading.

Versions

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.8.5 (default, Aug  5 2020, 08:36:46)  [GCC 7.3.0]
executable: /..../bin/python
   machine: Linux-4.15.0-112-generic-x86_64-with-glibc2.10

Python dependencies:
       pip: 20.2.2
setuptools: 49.6.0.post20200814
   sklearn: 0.22.2.post1
     numpy: 1.19.1
     scipy: 1.5.2
    Cython: None
    pandas: 1.1.1
matplotlib: 3.3.1
    joblib: 0.16.0

Built with OpenMP: True

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

jnothman commented, Sep 7, 2020 (1 reaction)

Please specify the metric

lorentzenchr commented, Feb 17, 2022 (0 reactions)

Without further details like the traceback of the segfault, I guess we can close.
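One way to capture the kind of traceback asked for here is Python's standard-library `faulthandler` module, which dumps the Python-level stack of each thread when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is only an illustration of that approach; the log file name `segfault_traceback.log` is hypothetical, and the snippet would need to run in the same process as the failing call.

```python
import faulthandler

# Hypothetical log file; it must stay open for as long as the handler is
# enabled, because faulthandler writes to its file descriptor on a crash.
crash_log = open("segfault_traceback.log", "w")

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, dump the Python traceback
# of every thread to the log file before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)

# ... run the failing pairwise_distances call after this point ...
```

The same handler can be enabled without code changes by starting the interpreter with `python -X faulthandler`, in which case the traceback goes to standard error.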
