SoX effect "rate" crashing or hanging in multiprocessing

šŸ› Bug

This time Iā€™m pretty sure itā€™s a bug šŸ˜›

When running a torchaudio speed + rate SoX effect chain inside a ProcessPoolExecutor on the CLSP grid, the subprocess hits a segmentation fault inside the apply_effects_tensor function. I managed to make the subprocess wait, attached gdb to it, and got this native stack trace:

Program received signal SIGSEGV, Segmentation fault.
0x00007fb3862087d6 in __kmp_acquire_ticket_lock () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/mkl/../../../libiomp5.so
(gdb) bt
#0  0x00007fb3862087d6 in __kmp_acquire_ticket_lock () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/mkl/../../../libiomp5.so
#1  0x00007fb3861dad4a in __kmpc_set_lock () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/mkl/../../../libiomp5.so
#2  0x00007fb30e244746 in update_fft_cache (len=len@entry=2048) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/effects_i_dsp.c:190
#3  0x00007fb30e245187 in lsx_safe_rdft (len=2048, type=1, d=0x55f3b1e30700) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/effects_i_dsp.c:218
#4  0x00007fb30e254be5 in dft_stage_init (instance=instance@entry=0, Fp=0.91362772738460019, Fs=Fs@entry=1, Fn=2, att=att@entry=132.45319809215172, phase=phase@entry=50, stage=0x55f3b1ed5730, L=L@entry=2, M=1)
    at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/rate.c:239
#5  0x00007fb30e2568b6 in rate_init (noSmallIntOpt=<optimized out>, max_coefs_size=400, interpolator=-1, use_hi_prec_clock=sox_false, maintain_3dB_pt=<optimized out>, rolloff=rolloff_small, anti_aliasing_pc=<optimized out>,
    bw_pc=<optimized out>, phase=50, bits=<optimized out>, factor=0.98090137934956023, shared=<optimized out>, p=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/rate.c:367
#6  start (effp=effp@entry=0x55f3b1e2edc0) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/rate.c:632
#7  0x00007fb30e241f95 in sox_add_effect (chain=0x55f3b1fdec40, effp=effp@entry=0x55f3b1e2edc0, in=in@entry=0x7ffdf6d5acf0, out=out@entry=0x7ffdf6d5acd0)
    at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/effects.c:157
#8  0x00007fb30e232bcd in torchaudio::sox_effects_chain::SoxEffectsChain::addEffect (this=this@entry=0x7ffdf6d5ac90, effect=...) from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torchaudio/_torchaudio.so
#9  0x00007fb30e1f3e1f in torchaudio::sox_effects::apply_effects_tensor (input_signal=..., effects=...) from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torchaudio/_torchaudio.so
#10 0x00007fb30e20b593 in c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >::operator() (this=<optimized out>, args#1=<error reading variable: access outside bounds of object referenced via synthetic pointer>, args#0=...)
   from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torchaudio/_torchaudio.so
#11 c10::impl::call_functor_with_args_from_stack_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >, false, 0ul, 1ul> (stack=0x7ffdf6d5b490, functor=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:339
#12 c10::impl::call_functor_with_args_from_stack<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >, false> (stack=0x7ffdf6d5b490, functor=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:346
#13 _ZZN3c104impl31make_boxed_from_unboxed_functorINS0_6detail31WrapFunctionIntoRuntimeFunctor_IPFNS_13intrusive_ptrIN10torchaudio9sox_utils12TensorSignalENS_6detail34intrusive_target_default_null_typeIS7_EEEERKSB_St6vectorISE_ISsSaISsEESaISG_EEESB_NS_4guts8typelist8typelistIJSD_SI_EEEEELb0EE4callEPNS_14OperatorKernelERKNS_14OperatorHandleEPSE_INS_6IValueESaISW_EEENKUlT_E_clINSL_6detail9_identityEEEDaS10_ (__closure=<optimized out>, delay_check=...)
    at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:392
#14 _ZN3c104guts6detail13_if_constexprILb1EE4callIZNS_4impl31make_boxed_from_unboxed_functorINS5_6detail31WrapFunctionIntoRuntimeFunctor_IPFNS_13intrusive_ptrIN10torchaudio9sox_utils12TensorSignalENS_6detail34intrusive_target_default_null_typeISC_EEEERKSG_St6vectorISJ_ISsSaISsEESaISL_EEESG_NS0_8typelist8typelistIJSI_SN_EEEEELb0EE4callEPNS_14OperatorKernelERKNS_14OperatorHandleEPSJ_INS_6IValueESaIS10_EEEUlT_E_ZNSU_4callESW_SZ_S13_EUlvE0_LPv0EEEDcOS14_OT0_ (thenCallback=<optimized out>)
    at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:193
#15 _ZN3c104guts12if_constexprILb1EZNS_4impl31make_boxed_from_unboxed_functorINS2_6detail31WrapFunctionIntoRuntimeFunctor_IPFNS_13intrusive_ptrIN10torchaudio9sox_utils12TensorSignalENS_6detail34intrusive_target_default_null_typeIS9_EEEERKSD_St6vectorISG_ISsSaISsEESaISI_EEESD_NS0_8typelist8typelistIJSF_SK_EEEEELb0EE4callEPNS_14OperatorKernelERKNS_14OperatorHandleEPSG_INS_6IValueESaISX_EEEUlT_E_ZNSR_4callEST_SW_S10_EUlvE0_EEDcOT0_OT1_ (elseCallback=<optimized out>,
    thenCallback=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:282
#16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >, false>::call (functor=<optimized out>, stack=0x7ffdf6d5b490) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:388
#17 0x00007fb35fa661f2 in c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#18 0x00007fb35fa61829 in torch::jit::(anonymous namespace)::createOperatorFromC10_withTracingHandledHere(c10::OperatorHandle const&)::{lambda(std::vector<c10::IValue, std::allocator<c10::IValue> >*)#1}::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >*) const () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#19 0x00007fb36458f42d in torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, pybind11::args, pybind11::kwargs) ()
   from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#20 0x00007fb364567624 in torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#104}::operator()(std::string const&) const::{lambda(pybind11::args, {lambda(std::string const&)#104}::kwargs)#1}::operator()(pybind11, pybind11::args) const () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#21 0x00007fb364567a0c in void pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#104}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}, pybind11::object, {lambda(std::string const&)#104}, pybind11::args, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#104}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}&&, pybind11::object (*)({lambda(std::string const&)#104}, pybind11::args), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail) ()
   from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#22 0x00007fb3641da5ea in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#23 0x000055f3adb53c94 in _PyMethodDef_RawFastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:694
#24 0x000055f3adb53db1 in _PyCFunction_FastCallKeywords (func=0x7fb30cdac960, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:734
#25 0x000055f3adbbf5be in call_function (kwnames=0x0, oparg=2, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4568
#26 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3093
#27 0x000055f3adb032b9 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
#28 0x000055f3adb53435 in _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:433
#29 0x000055f3adbbf229 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
#30 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3093
#31 0x000055f3adb0431b in function_code_fastcall (globals=<optimized out>, nargs=3, args=<optimized out>, co=0x7fb30e107ae0) at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:283
#32 _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:322
#33 0x000055f3adb22b93 in _PyObject_Call_Prepend () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:908
#34 0x000055f3adb5a16a in slot_tp_call () at /tmp/build/80754af9/python_1588882889832/work/Objects/typeobject.c:6402
#35 0x000055f3adb5b00b in _PyObject_FastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:199
#36 0x000055f3adbbf186 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4619
#37 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
#38 0x000055f3adb032b9 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
#39 0x000055f3adb53497 in _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:433
#40 0x000055f3adbbbcba in call_function (kwnames=0x7fb30e106fb0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
#41 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3139
#42 0x000055f3adb032b9 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
#43 0x000055f3adb04610 in _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
#44 0x000055f3adb22b93 in _PyObject_Call_Prepend () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:908
#45 0x000055f3adb1595e in PyObject_Call () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:245
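For readers who want to capture a similar trace, here is a minimal sketch of one way to make a worker process wait for gdb. It is not the method used in the original report, which does not say how the wait was done. The helper names debugger_hint, wait_for_debugger and init_worker are hypothetical; faulthandler, os.getpid and time.sleep are standard library calls.

```python
import faulthandler
import os
import sys
import time


def debugger_hint(pid: int) -> str:
    """Return the gdb command that attaches to the given process id."""
    return f"gdb -p {pid}"


def wait_for_debugger(seconds: float) -> int:
    """Print this process's pid, then sleep so gdb can be attached in time."""
    pid = os.getpid()
    print(
        f"[worker {pid}] waiting {seconds}s; attach with: {debugger_hint(pid)}",
        file=sys.stderr,
        flush=True,
    )
    time.sleep(seconds)
    return pid


def init_worker(wait_seconds: float = 30.0) -> None:
    """Hypothetical pool initializer: runs once in each worker process."""
    # Dump the Python-level traceback to stderr if the worker receives SIGSEGV.
    faulthandler.enable()
    wait_for_debugger(wait_seconds)
```

Assuming Python 3.7 or newer, this could be wired in with ProcessPoolExecutor(initializer=init_worker, initargs=(30.0,)). After attaching, typing continue in gdb lets the worker run until the fault, and bt then prints the native backtrace.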

The same test seems to be hanging in Lhotse's GitHub Actions CI: https://github.com/lhotse-speech/lhotse/pull/124/checks?check_run_id=1391378614

To Reproduce

Steps to reproduce the behavior:

  1. Install the latest PyTorch with torchaudio (via conda), then install Lhotse from the torchaudio data augmentation branch: git clone https://github.com/lhotse-speech/lhotse && cd lhotse && git checkout feature/augmentation-refactoring && pip install -e '.[dev]'
  2. Run this test using pytest test/known_issues/test_augment_with_executor.py
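The Lhotse test itself is not shown in this report. As a rough illustration of the pattern being described (a speed + rate chain applied in pool workers), a sketch could look like the following. The helpers build_effects and augment, the speed factor 1.1, the 16 kHz sample rate and the random-noise input are all assumptions made for illustration. The sketch is not guaranteed to trigger the crash, and it assumes a torchaudio build that still ships the sox_effects module.

```python
from concurrent.futures import ProcessPoolExecutor


def build_effects(speed_factor: float, sample_rate: int) -> list:
    """Build a SoX effect chain as a list of string lists, the form torchaudio takes."""
    return [["speed", str(speed_factor)], ["rate", str(sample_rate)]]


def augment(seed: int) -> tuple:
    """Apply the speed + rate chain to one second of noise; runs in a worker."""
    # Imported lazily so this module can be loaded without torch installed.
    import torch
    import torchaudio

    sample_rate = 16000  # assumed value, not taken from the report
    torch.manual_seed(seed)
    waveform = torch.rand(1, sample_rate) * 2 - 1  # shape: (channels, frames)
    out, out_rate = torchaudio.sox_effects.apply_effects_tensor(
        waveform, sample_rate, build_effects(1.1, sample_rate)
    )
    return tuple(out.shape), out_rate


if __name__ == "__main__":
    # Each task runs in a separate process, as in the reported setup.
    with ProcessPoolExecutor(max_workers=2) as pool:
        for shape, rate in pool.map(augment, range(4)):
            print(shape, rate)
```

The "rate" effect is the one whose start-up path appears in the backtrace (rate_init, dft_stage_init, update_fft_cache), so any minimal reproduction would probably need to keep it in the chain.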

Expected behavior

No crash

Environment

  • What commands did you use to install torchaudio (conda/pip/build from source)? conda
  • If you are building from source, which commit is it?
  • What does torchaudio.__version__ print? (If applicable)

Collecting environment information…
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 9.13 (stretch) (x86_64)
GCC version: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Clang version: 3.8.1-24 (tags/RELEASE_381/final)
CMake version: version 3.7.2

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti

Nvidia driver version: 440.33.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.7.0
[pip3] torchaudio==0.7.0a0+ac17b64
[pip3] torchvision==0.8.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.1.0 py37h23d657b_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.18.5 py37ha1c710e_0
[conda] numpy-base 1.18.5 py37hde5b4d6_0
[conda] pytorch 1.7.0 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.5.1 pypi_0 pypi
[conda] torchvision 0.8.1 py37_cu102 pytorch

Additional context

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

mthrok commented, Nov 16, 2020

I talked with @malfet, and it is most likely that Intel's OpenMP and GNU OpenMP are conflicting.

mthrok commented, Nov 13, 2020

On Ubuntu, disabling OpenMP support for libsox seems to resolve the issue. https://github.com/pytorch/audio/pull/1026
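To check whether a process has more than one OpenMP runtime loaded, as the comments above suspect, a small diagnostic along these lines may help. It is a sketch that assumes Linux (it reads /proc/self/maps), and the function names and the list of library name prefixes are illustrative choices, not part of torchaudio or libsox. Finding both libiomp5 and libgomp does not prove they conflict; it only shows that both are mapped into the process.

```python
import os

# File name prefixes of common OpenMP runtimes: Intel, GNU and LLVM.
OPENMP_RUNTIMES = ("libiomp5", "libgomp", "libomp")


def openmp_runtimes_in_maps(maps_text: str) -> set:
    """Return the OpenMP runtimes named in the text of a /proc/<pid>/maps file."""
    found = set()
    for line in maps_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The mapped file path, when present, is the last field on the line.
        name = os.path.basename(parts[-1])
        for runtime in OPENMP_RUNTIMES:
            if name.startswith(runtime):
                found.add(runtime)
    return found


def loaded_openmp_runtimes() -> set:
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as maps_file:
        return openmp_runtimes_in_maps(maps_file.read())
```

Calling loaded_openmp_runtimes() after import torch and import torchaudio, both in the parent and inside a worker, would show which runtimes each process has mapped.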
