Segmentation fault in libiomp with pytorch / regression from 0.24.2

Describe the bug

When running a simple program that fits some values, Python crashes inside libiomp if and only if torch is imported before scikit-learn and version 1.0 of scikit-learn is used.

Steps/Code to Reproduce

from random import randint
# Comment the next line to avoid the segfault
import torch
from sklearn.cluster import KMeans

X = [[randint(0, j) for j in range(1000)] for i in range(1000)]

kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
print(kmeans.labels_)
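
Because a segmentation fault kills the interpreter, it can be convenient to run each import order in its own child process and compare exit statuses. The following is a minimal sketch using only the standard library; the helper names (`run_snippet`, `describe_exit`, `compare_import_orders`) and the three snippets are hypothetical and only mirror the script above. It assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> int:
    """Run `code` in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def describe_exit(returncode: int) -> str:
    """Turn an exit status into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


# Shared body of the reproduction script (same data and estimator as above).
BODY = (
    "from random import randint\n"
    "X = [[randint(0, j) for j in range(1000)] for i in range(1000)]\n"
    "KMeans(n_clusters=4).fit(X)\n"
)

# Hypothetical variants: only the import order differs.
VARIANTS = {
    "torch first": "import torch\nfrom sklearn.cluster import KMeans\n" + BODY,
    "sklearn first": "from sklearn.cluster import KMeans\nimport torch\n" + BODY,
    "no torch": "from sklearn.cluster import KMeans\n" + BODY,
}


def compare_import_orders() -> dict:
    """Run every variant in its own process and collect a label per variant."""
    return {name: describe_exit(run_snippet(code)) for name, code in VARIANTS.items()}
```

Calling `compare_import_orders()` in the affected environment and printing the result should show which variants end with a signal; in an environment where torch or scikit-learn is missing, the child simply exits with a non-zero status.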

Expected Results

The sample code should not crash when torch is imported before scikit-learn; it should print the labels as shown below (this output was obtained with import torch removed):

(.venv) (base) ➜  test_seg_fault $ python main.py                                                                                    
[0 2 0 1 3 1 3 0 2 0 0 0 1 1 2 0 1 1 1 0 2 1 2 0 3 0 3 3 1 3 0 1 0 1 3 1 1
 2 0 3 1 0 3 2 0 0 2 0 3 3 0 2 1 0 2 1 0 3 0 3 1 0 1 1 0 0 0 1 3 1 1 0 2 0
 3 3 1 2 0 2 3 3 2 0 2 1 3 2 0 2 1 3 1 1 3 0 3 3 0 0 0 2 0 1 3 3 0 0 2 0 1
 0 3 2 3 1 3 0 3 1 2 3 1 3 0 3 3 0 1 3 1 2 1 3 3 1 1 2 3 2 1 0 3 3 3 0 1 1
 1 3 0 3 2 0 2 3 2 2 2 3 3 0 0 0 0 2 3 2 3 0 0 3 2 2 1 0 0 3 0 2 1 3 3 0 3
 0 1 0 1 3 3 1 1 0 1 2 3 1 1 3 0 0 0 3 3 0 2 0 1 3 1 0 2 3 0 1 3 3 2 1 1 0
 0 1 3 0 0 1 2 1 2 3 2 1 1 3 2 1 0 0 0 0 3 0 0 0 0 2 3 2 3 2 2 0 3 0 2 0 2
 1 0 1 3 0 2 3 0 1 1 0 0 2 2 1 3 0 0 1 3 3 0 0 2 3 1 0 0 1 1 2 1 0 3 1 0 1
 1 0 1 2 0 2 1 3 0 3 0 0 1 1 0 0 0 3 3 1 3 0 3 1 3 3 0 3 1 1 1 3 1 0 2 0 1
 2 0 0 2 0 1 1 1 3 1 3 2 1 1 3 3 2 3 2 3 0 1 3 3 1 0 3 1 1 1 3 3 2 3 2 1 2
 0 3 3 3 3 0 1 1 3 3 3 0 2 2 3 2 2 0 3 0 3 2 3 3 0 0 3 0 0 2 1 1 0 2 0 2 0
 2 2 0 1 3 0 0 2 0 0 1 3 3 0 0 0 1 3 2 3 0 2 2 0 1 0 3 3 2 0 2 3 0 1 1 0 3
 0 2 0 3 1 3 2 3 1 0 2 2 1 0 3 1 2 3 0 2 2 2 1 3 3 3 3 0 0 3 0 3 2 2 3 0 3
 1 3 2 3 1 2 3 1 3 3 0 2 0 0 3 2 2 2 1 1 2 0 1 3 3 2 1 1 3 0 0 1 2 0 1 2 0
 0 3 1 0 0 1 1 1 3 2 1 0 1 2 3 0 0 3 1 0 0 2 3 3 1 2 3 1 2 3 2 3 3 3 3 2 2
 1 0 2 1 2 1 1 1 1 3 3 2 3 1 3 0 0 1 3 0 1 2 0 1 0 2 2 3 3 0 0 0 2 3 1 2 2
 1 2 1 3 0 0 0 3 1 3 0 2 0 2 3 3 1 2 1 3 0 0 2 3 1 1 0 1 0 0 3 1 1 1 0 2 1
 1 3 1 0 0 0 0 1 0 3 0 0 0 3 2 1 3 0 3 3 1 3 2 0 0 3 3 2 0 0 2 0 1 3 0 0 0
 1 3 3 2 2 1 3 3 2 3 2 0 1 0 0 3 2 0 1 0 1 3 2 1 0 3 3 3 1 2 1 2 1 3 0 0 0
 3 3 0 1 2 3 0 3 2 2 1 3 0 3 3 1 1 1 0 0 1 0 0 0 0 3 3 1 0 3 3 3 2 0 0 2 2
 0 1 0 3 2 1 3 0 1 1 2 1 0 1 3 1 0 0 1 3 1 1 3 1 0 3 2 1 2 1 0 2 2 2 2 2 0
 1 0 3 0 0 0 3 0 3 1 1 0 0 0 2 3 1 1 0 2 3 3 3 3 2 1 0 2 0 3 3 3 1 0 0 1 0
 0 1 2 2 3 2 2 3 1 1 1 1 3 1 3 1 0 1 3 2 0 0 0 1 2 3 1 2 0 3 1 3 2 0 3 3 0
 0 2 1 2 0 1 1 0 3 3 3 2 3 1 3 0 0 1 1 2 3 1 1 1 3 2 3 3 0 1 0 0 2 1 1 1 1
 3 3 3 3 1 3 1 3 1 3 1 0 3 1 0 1 1 1 1 1 2 2 0 3 3 0 0 3 1 0 1 3 0 3 2 3 3
 1 2 1 3 0 0 2 3 1 1 2 1 0 3 0 3 1 3 3 2 2 2 1 2 0 1 3 1 3 0 3 1 3 3 1 1 0
 1 3 0 2 1 2 3 2 1 0 1 3 2 3 1 2 0 3 1 3 1 3 2 3 0 2 0 3 3 2 3 1 1 0 3 3 0
 1]

Actual Results

The program fails with a segmentation fault. I managed to extract the stack trace with lldb (on macOS):

(.venv) (base) ➜  test_seg_fault $ lldb ./.venv/bin/python
(lldb) target create "./.venv/bin/python"
Current executable set to './.venv/bin/python' (x86_64).
(lldb) run main.py
Process 70127 launched: './.venv/bin/python' (x86_64)
Process 70127 stopped
* thread #2, stop reason = exec
    frame #0: 0x000000010000e000 dyld`_dyld_start
dyld`_dyld_start:
->  0x10000e000 <+0>: popq   %rdi
    0x10000e001 <+1>: pushq  $0x0
    0x10000e003 <+3>: movq   %rsp, %rbp
    0x10000e006 <+6>: andq   $-0x10, %rsp
Target 0: (Python) stopped.
(lldb) continue
Process 70127 resuming
Process 70127 stopped
* thread #17, stop reason = EXC_BAD_ACCESS (code=1, address=0x48)
    frame #0: 0x000000013012aa6c libiomp5.dylib`void __kmp_suspend_64<false, true>(int, kmp_flag_64<false, true>*) + 28
libiomp5.dylib`__kmp_suspend_64<false, true>:
->  0x13012aa6c <+28>: movq   (%rdx,%rdi,8), %r13
    0x13012aa70 <+32>: movq   %r13, %rdi
    0x13012aa73 <+35>: callq  0x13011d570               ; __kmp_suspend_initialize_thread
    0x13012aa78 <+40>: leaq   0x5c0(%r13), %r14
  thread #18, stop reason = EXC_BAD_ACCESS (code=1, address=0x50)
    frame #0: 0x000000013012aa6c libiomp5.dylib`void __kmp_suspend_64<false, true>(int, kmp_flag_64<false, true>*) + 28
libiomp5.dylib`__kmp_suspend_64<false, true>:
->  0x13012aa6c <+28>: movq   (%rdx,%rdi,8), %r13
    0x13012aa70 <+32>: movq   %r13, %rdi
    0x13012aa73 <+35>: callq  0x13011d570               ; __kmp_suspend_initialize_thread
    0x13012aa78 <+40>: leaq   0x5c0(%r13), %r14
  thread #19, stop reason = EXC_BAD_ACCESS (code=1, address=0x58)
    frame #0: 0x000000013012aa6c libiomp5.dylib`void __kmp_suspend_64<false, true>(int, kmp_flag_64<false, true>*) + 28
libiomp5.dylib`__kmp_suspend_64<false, true>:
->  0x13012aa6c <+28>: movq   (%rdx,%rdi,8), %r13
    0x13012aa70 <+32>: movq   %r13, %rdi
    0x13012aa73 <+35>: callq  0x13011d570               ; __kmp_suspend_initialize_thread
    0x13012aa78 <+40>: leaq   0x5c0(%r13), %r14
  thread #20, stop reason = EXC_BAD_ACCESS (code=1, address=0x60)
    frame #0: 0x000000013012aa6c libiomp5.dylib`void __kmp_suspend_64<false, true>(int, kmp_flag_64<false, true>*) + 28
libiomp5.dylib`__kmp_suspend_64<false, true>:
->  0x13012aa6c <+28>: movq   (%rdx,%rdi,8), %r13
    0x13012aa70 <+32>: movq   %r13, %rdi
    0x13012aa73 <+35>: callq  0x13011d570               ; __kmp_suspend_initialize_thread
    0x13012aa78 <+40>: leaq   0x5c0(%r13), %r14
Target 0: (Python) stopped.
(lldb) 
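
As a lighter-weight complement to lldb, Python's standard `faulthandler` module can print the Python-level traceback when the process receives a fatal signal. This is a sketch of how it could be added at the top of the reproduction script, not something the original report used. The crashing threads in the trace above are OpenMP worker threads, so the dump would most likely show only where the Python threads were (for example, inside `kmeans.fit`), not the native frames.

```python
import faulthandler
import sys

# Dump the Python traceback of all Python threads to stderr when the process
# receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=sys.stderr, all_threads=True)

# The imports and the KMeans fit from the reproduction script would follow here.
```

The same handler can be enabled without editing the script by running `python -X faulthandler main.py` or by setting the `PYTHONFAULTHANDLER` environment variable.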

Versions

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.9.7 (default, Sep  3 2021, 12:37:55)  [Clang 12.0.5 (clang-1205.0.22.9)]
executable: /Users/abderraouf.elgasser/projects/iktos/experiments/test_seg_fault_ndevaux/.venv/bin/python
   machine: macOS-11.4-x86_64-i386-64bit

Python dependencies:
          pip: 21.2.4
   setuptools: 57.4.0
      sklearn: 1.0
        numpy: 1.21.1
        scipy: 1.6.1
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True

Tested with Python 3.7.9, 3.7.12 and 3.9.7, and with torch 1.7.0 and 1.9.1.

It doesn’t crash in any of these environments with version 0.24.2 of scikit-learn.

Tested on macOS 11.4.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
thomasjpfan commented, Oct 11, 2021

"When do you expect to release a version with the fix to PyPI? (We use poetry to install the dependencies.)"

The fix will be included in 1.0.1. I do not have an exact timeline for that release, but I suspect it will be soon.
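
Until the fixed release is available, a project could guard against the affected version at startup. The sketch below is an illustration only: the helper names are hypothetical, and the rule "only 1.0 is affected" is an assumption drawn from this thread (crash reported on 1.0, not on 0.24.2, fix announced for 1.0.1). The parser ignores pre-release suffixes, so it is not a full version comparison.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '1.0.1' -> (1, 0, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component (e.g. 'dev0')
        parts.append(int(digits))
    # Pad to three components so that '1.0' and '1.0.0' compare equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """Assumption from this thread: only the 1.0 release shows the crash."""
    return parse_release(version) == (1, 0, 0)


def installed_sklearn_version() -> Optional[str]:
    """Return the installed scikit-learn version, or None if it is absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None
```

A caller could then warn or pin a different version when `is_affected(installed_sklearn_version())` is true, after checking that the version is not `None`.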

1 reaction
thomasjpfan commented, Oct 11, 2021

This may be related to https://github.com/scikit-learn/scikit-learn/issues/21182, where we used libomp 12 to build the osx wheel.

Can you check whether you still get this error after installing the nightly build:

pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple scikit-learn
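
To check which OpenMP runtimes end up loaded in the same process, `threadpoolctl` (already listed in the versions above) can be queried after the imports. This is a hedged sketch: the function names are hypothetical, and seeing two different OpenMP entries (for example, one from torch and one from the scikit-learn wheel) would only be consistent with the suspicion above, not proof of the cause.

```python
import importlib


def openmp_runtimes():
    """Return threadpoolctl's records for OpenMP runtimes loaded in this process."""
    try:
        from threadpoolctl import threadpool_info
    except ImportError:
        return []  # threadpoolctl is not installed in this environment
    return [lib for lib in threadpool_info() if lib.get("user_api") == "openmp"]


def load_and_report(modules=("torch", "sklearn.cluster")):
    """Import the modules in the reported order, then list OpenMP runtimes."""
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            print(f"{name}: not installed, skipping")
    runtimes = openmp_runtimes()
    for lib in runtimes:
        # Each record is a dict; .get() is used in case a key is absent.
        print(lib.get("prefix"), lib.get("version"), lib.get("filepath"))
    return runtimes
```

Running `load_and_report()` with scikit-learn 1.0 and again with the nightly build would show whether the set of loaded runtimes differs between the two installs.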