SoX effect "rate" crashing or hanging in multiprocessing
Bug
This time I'm pretty sure it's a bug.
When running a torchaudio speed + rate SoX effect chain inside a ProcessPoolExecutor
on the CLSP grid, the subprocess hits a segmentation fault inside the apply_effects_tensor
function. I managed to make the subprocess wait, attached gdb to it, and got this native stack trace:
Program received signal SIGSEGV, Segmentation fault.
0x00007fb3862087d6 in __kmp_acquire_ticket_lock () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/mkl/../../../libiomp5.so
(gdb) bt
#0 0x00007fb3862087d6 in __kmp_acquire_ticket_lock () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/mkl/../../../libiomp5.so
#1 0x00007fb3861dad4a in __kmpc_set_lock () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/mkl/../../../libiomp5.so
#2 0x00007fb30e244746 in update_fft_cache (len=len@entry=2048) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/effects_i_dsp.c:190
#3 0x00007fb30e245187 in lsx_safe_rdft (len=2048, type=1, d=0x55f3b1e30700) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/effects_i_dsp.c:218
#4 0x00007fb30e254be5 in dft_stage_init (instance=instance@entry=0, Fp=0.91362772738460019, Fs=Fs@entry=1, Fn=2, att=att@entry=132.45319809215172, phase=phase@entry=50, stage=0x55f3b1ed5730, L=L@entry=2, M=1)
at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/rate.c:239
#5 0x00007fb30e2568b6 in rate_init (noSmallIntOpt=<optimized out>, max_coefs_size=400, interpolator=-1, use_hi_prec_clock=sox_false, maintain_3dB_pt=<optimized out>, rolloff=rolloff_small, anti_aliasing_pc=<optimized out>,
bw_pc=<optimized out>, phase=50, bits=<optimized out>, factor=0.98090137934956023, shared=<optimized out>, p=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/rate.c:367
#6 start (effp=effp@entry=0x55f3b1e2edc0) at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/rate.c:632
#7 0x00007fb30e241f95 in sox_add_effect (chain=0x55f3b1fdec40, effp=effp@entry=0x55f3b1e2edc0, in=in@entry=0x7ffdf6d5acf0, out=out@entry=0x7ffdf6d5acd0)
at /opt/conda/conda-bld/torchaudio_1603752092839/work/third_party/src/libsox/src/effects.c:157
#8 0x00007fb30e232bcd in torchaudio::sox_effects_chain::SoxEffectsChain::addEffect (this=this@entry=0x7ffdf6d5ac90, effect=...) from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torchaudio/_torchaudio.so
#9 0x00007fb30e1f3e1f in torchaudio::sox_effects::apply_effects_tensor (input_signal=..., effects=...) from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torchaudio/_torchaudio.so
#10 0x00007fb30e20b593 in c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >::operator() (this=<optimized out>, args#1=<error reading variable: access outside bounds of object referenced via synthetic pointer>, args#0=...)
from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torchaudio/_torchaudio.so
#11 c10::impl::call_functor_with_args_from_stack_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >, false, 0ul, 1ul> (stack=0x7ffdf6d5b490, functor=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:339
#12 c10::impl::call_functor_with_args_from_stack<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >, false> (stack=0x7ffdf6d5b490, functor=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:346
#13 _ZZN3c104impl31make_boxed_from_unboxed_functorINS0_6detail31WrapFunctionIntoRuntimeFunctor_IPFNS_13intrusive_ptrIN10torchaudio9sox_utils12TensorSignalENS_6detail34intrusive_target_default_null_typeIS7_EEEERKSB_St6vectorISE_ISsSaISsEESaISG_EEESB_NS_4guts8typelist8typelistIJSD_SI_EEEEELb0EE4callEPNS_14OperatorKernelERKNS_14OperatorHandleEPSE_INS_6IValueESaISW_EEENKUlT_E_clINSL_6detail9_identityEEEDaS10_ (__closure=<optimized out>, delay_check=...)
at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:392
#14 _ZN3c104guts6detail13_if_constexprILb1EE4callIZNS_4impl31make_boxed_from_unboxed_functorINS5_6detail31WrapFunctionIntoRuntimeFunctor_IPFNS_13intrusive_ptrIN10torchaudio9sox_utils12TensorSignalENS_6detail34intrusive_target_default_null_typeISC_EEEERKSG_St6vectorISJ_ISsSaISsEESaISL_EEESG_NS0_8typelist8typelistIJSI_SN_EEEEELb0EE4callEPNS_14OperatorKernelERKNS_14OperatorHandleEPSJ_INS_6IValueESaIS10_EEEUlT_E_ZNSU_4callESW_SZ_S13_EUlvE0_LPv0EEEDcOS14_OT0_ (thenCallback=<optimized out>)
at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:193
#15 _ZN3c104guts12if_constexprILb1EZNS_4impl31make_boxed_from_unboxed_functorINS2_6detail31WrapFunctionIntoRuntimeFunctor_IPFNS_13intrusive_ptrIN10torchaudio9sox_utils12TensorSignalENS_6detail34intrusive_target_default_null_typeIS9_EEEERKSD_St6vectorISG_ISsSaISsEESaISI_EEESD_NS0_8typelist8typelistIJSF_SK_EEEEELb0EE4callEPNS_14OperatorKernelERKNS_14OperatorHandleEPSG_INS_6IValueESaISX_EEEUlT_E_ZNSR_4callEST_SW_S10_EUlvE0_EEDcOT0_OT1_ (elseCallback=<optimized out>,
thenCallback=<optimized out>) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:282
#16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > (*)(c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > >), c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> >, c10::guts::typelist::typelist<c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal, c10::detail::intrusive_target_default_null_type<torchaudio::sox_utils::TensorSignal> > const&, std::vector<std::vector<std::string, std::allocator<std::string> >, std::allocator<std::vector<std::string, std::allocator<std::string> > > > > >, false>::call (functor=<optimized out>, stack=0x7ffdf6d5b490) at /opt/conda/conda-bld/torchaudio_1603752092839/work/torchaudio/csrc/register.cpp:388
#17 0x00007fb35fa661f2 in c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#18 0x00007fb35fa61829 in torch::jit::(anonymous namespace)::createOperatorFromC10_withTracingHandledHere(c10::OperatorHandle const&)::{lambda(std::vector<c10::IValue, std::allocator<c10::IValue> >*)#1}::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >*) const () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#19 0x00007fb36458f42d in torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, pybind11::args, pybind11::kwargs) ()
from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#20 0x00007fb364567624 in torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#104}::operator()(std::string const&) const::{lambda(pybind11::args, {lambda(std::string const&)#104}::kwargs)#1}::operator()(pybind11, pybind11::args) const () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#21 0x00007fb364567a0c in void pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#104}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}, pybind11::object, {lambda(std::string const&)#104}, pybind11::args, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#104}::operator()(std::string const&) const::{lambda(pybind11::args, pybind11::kwargs)#1}&&, pybind11::object (*)({lambda(std::string const&)#104}, pybind11::args), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail) ()
from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#22 0x00007fb3641da5ea in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/pzelasko/miniconda3/envs/lhotse/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#23 0x000055f3adb53c94 in _PyMethodDef_RawFastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:694
#24 0x000055f3adb53db1 in _PyCFunction_FastCallKeywords (func=0x7fb30cdac960, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:734
#25 0x000055f3adbbf5be in call_function (kwnames=0x0, oparg=2, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4568
#26 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3093
#27 0x000055f3adb032b9 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
#28 0x000055f3adb53435 in _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:433
#29 0x000055f3adbbf229 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
#30 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3093
#31 0x000055f3adb0431b in function_code_fastcall (globals=<optimized out>, nargs=3, args=<optimized out>, co=0x7fb30e107ae0) at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:283
#32 _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:322
#33 0x000055f3adb22b93 in _PyObject_Call_Prepend () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:908
#34 0x000055f3adb5a16a in slot_tp_call () at /tmp/build/80754af9/python_1588882889832/work/Objects/typeobject.c:6402
#35 0x000055f3adb5b00b in _PyObject_FastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:199
#36 0x000055f3adbbf186 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4619
#37 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
#38 0x000055f3adb032b9 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
#39 0x000055f3adb53497 in _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:433
#40 0x000055f3adbbbcba in call_function (kwnames=0x7fb30e106fb0, oparg=<optimized out>, pp_stack=<synthetic pointer>) at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
#41 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3139
#42 0x000055f3adb032b9 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
#43 0x000055f3adb04610 in _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
#44 0x000055f3adb22b93 in _PyObject_Call_Prepend () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:908
#45 0x000055f3adb1595e in PyObject_Call () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:245
The same test seems to be hanging in Lhotse's GitHub Actions CI: https://github.com/lhotse-speech/lhotse/pull/124/checks?check_run_id=1391378614
To Reproduce
Steps to reproduce the behavior:
- Install the latest PyTorch with torchaudio (via conda), then install Lhotse from the torchaudio data augmentation branch:
git clone https://github.com/lhotse-speech/lhotse && cd lhotse && git checkout feature/augmentation-refactoring && pip install -e '.[dev]'
- Run this test using
pytest test/known_issues/test_augment_with_executor.py
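For readers without the Lhotse checkout, the scenario described above (a speed + rate chain passed to apply_effects_tensor from worker processes) can be sketched roughly as follows. This is not the Lhotse test itself: the helper names, the speed factors and the 16 kHz sampling rate are assumptions made for illustration, and whether it triggers the crash depends on how torchaudio and libsox were built.

```python
from concurrent.futures import ProcessPoolExecutor


def build_effects(speed_factor: float, sample_rate: int) -> list:
    """Build a SoX effect chain: change speed, then resample back."""
    # "speed" changes the effective sampling rate, so "rate" is chained
    # after it to return to the original sampling rate.
    return [["speed", str(speed_factor)], ["rate", str(sample_rate)]]


def augment(speed_factor: float, sample_rate: int = 16000):
    """Apply the chain to one second of noise; meant to run in a worker."""
    # Imported lazily so this module can be loaded without torch installed.
    import torch
    import torchaudio

    waveform = torch.rand(1, sample_rate)  # shape: (channels, samples)
    augmented, new_rate = torchaudio.sox_effects.apply_effects_tensor(
        waveform, sample_rate, build_effects(speed_factor, sample_rate)
    )
    return augmented.shape[1], new_rate


def run_in_pool(num_workers: int = 2):
    """Submit a few jobs to worker processes and collect the results."""
    # Hypothetical speed factors; the values used by Lhotse may differ.
    with ProcessPoolExecutor(max_workers=num_workers) as executor:
        futures = [executor.submit(augment, factor) for factor in (0.9, 1.1)]
        return [future.result() for future in futures]
```

Calling run_in_pool() from a script is the part that exercises the worker processes; on an affected build one would expect it to crash or hang rather than return.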
Expected behavior
No crash
Environment
- What commands did you use to install torchaudio (conda/pip/build from source)? conda
- If you are building from source, which commit is it?
- What does torchaudio.__version__ print? (If applicable)
Collecting environment information...
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 9.13 (stretch) (x86_64)
GCC version: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Clang version: 3.8.1-24 (tags/RELEASE_381/final)
CMake version: version 3.7.2
Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
Nvidia driver version: 440.33.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.7.0
[pip3] torchaudio==0.7.0a0+ac17b64
[pip3] torchvision==0.8.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.1 217
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.1.0 py37h23d657b_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.18.5 py37ha1c710e_0
[conda] numpy-base 1.18.5 py37hde5b4d6_0
[conda] pytorch 1.7.0 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.5.1 pypi_0 pypi
[conda] torchvision 0.8.1 py37_cu102 pytorch
Additional context
Top GitHub Comments
I talked with @malfet, and it is most likely that Intel's OpenMP and GNU OpenMP are conflicting.
On Ubuntu, disabling OpenMP support for libsox seems to resolve the issue. https://github.com/pytorch/audio/pull/1026
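The OpenMP conflict suggested above could be checked by listing which OpenMP runtimes are mapped into a worker process. Below is a minimal sketch, assuming Linux (it reads /proc/<pid>/maps). The helper names are hypothetical, and the set of library names it looks for (libiomp5 for Intel, libgomp for GNU, libomp for LLVM) is an assumption about common builds, not something confirmed in this thread.

```python
import os
import re

# Basenames of common OpenMP runtimes, e.g. libiomp5.so or libgomp.so.1.
_OPENMP_NAME = re.compile(r"lib(iomp5|gomp|omp)[-.].*so")


def find_openmp_runtimes(maps_text: str) -> set:
    """Return basenames of OpenMP libraries listed in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        name = os.path.basename(fields[5].strip())
        if _OPENMP_NAME.match(name):
            found.add(name)
    return found


def loaded_openmp_runtimes(pid="self") -> set:
    """Inspect a live process; pid="self" means the current process."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return find_openmp_runtimes(maps_file.read())
```

Calling loaded_openmp_runtimes() after importing torch and torchaudio, or passing the PID of a stuck worker, would show whether more than one runtime is loaded. Seeing both libiomp5 and libgomp would be consistent with the conflict described above, though it would not prove it on its own.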