Segfault when building kernel

As requested by @stas00, I’m opening an issue here for a problem I’m experiencing with HuggingFace’s Transformers library. When running the following script: https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/README.md#pretraining-wav2vec2

I’m getting a segfault error when building the kernels:

[1/3] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/patrick/anaconda3/envs/hu
gging_face/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/
patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/TH -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/THC -isystem /home/patrick/anaconda3/envs/hugging_face/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D_
_CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERS
IONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o                                             
FAILED: custom_cuda_kernel.cuda.o                                                                                                                                                                                                                                                                                 
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/patrick/anaconda3/envs/hugging_
face/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/patric
k/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/TH -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/THC -isystem /home/patrick/anaconda3/envs/hugging_face/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_
NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__
 -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o                                                   
/usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Pe
riod>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/usr/include/c++/10/chrono:473:154:   required from here                                                                                                                                                                                                                                                          
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault                                                                                                                                                                                                                                    
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept  
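
With output this long, the line that matters (the `internal compiler error` diagnostic) is easy to miss. Below is a minimal sketch for pulling such lines out of a saved build log. It assumes the usual GCC `path:line:column: message` layout seen above; the function name `find_internal_compiler_errors` is made up for illustration and is not part of DeepSpeed or PyTorch.

```python
import re

# Matches GCC-style diagnostics such as
#   /usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
# The layout is taken from the log above; other compilers may format this differently.
ICE_PATTERN = re.compile(
    r"^(?P<path>\S+):(?P<line>\d+):(?P<col>\d+): "
    r"internal compiler error: (?P<msg>.*?)\s*$",
    re.MULTILINE,
)


def find_internal_compiler_errors(log_text):
    """Return a (path, line, column, message) tuple for each ICE line in a build log."""
    return [
        (m.group("path"), int(m.group("line")), int(m.group("col")), m.group("msg"))
        for m in ICE_PATTERN.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "/usr/include/c++/10/chrono:473:154:   required from here\n"
        "/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault   \n"
    )
    for hit in find_internal_compiler_errors(sample):
        print(hit)
```

Here the reported path points into the libstdc++ 10 headers (`/usr/include/c++/10/chrono`), not into the DeepSpeed sources.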

My environment is:

- `transformers` version: 4.10.0.dev0
- Platform: Linux-5.11.0-25-generic-x86_64-with-glibc2.33
- Python version: 3.9.1
- PyTorch version (GPU?): 1.9.0.dev20210217 (True)
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: yes
- Deepspeed: 0.4.4
- CUDA Version: 11.2 
- GPU: 4 x TITAN RTX

To reproduce:

  1. Clone the transformers repo:
git clone https://github.com/huggingface/transformers
  2. Install all packages:
cd transformers && pip install -e ".[dev]"
  3. Go inside the wav2vec2 research folder:
cd examples/research_projects/wav2vec2

and run the following command:

PYTHONPATH=../../../src deepspeed --num_gpus 4 run_pretrain.py \
--output_dir="./wav2vec2-base-libri-100h" \
--num_train_epochs="3" \
--per_device_train_batch_size="32" \
--per_device_eval_batch_size="32" \
--gradient_accumulation_steps="2" \
--save_total_limit="3" \
--save_steps="500" \
--logging_steps="10" \
--learning_rate="5e-4" \
--weight_decay="0.01" \
--warmup_steps="3000" \
--model_name_or_path="patrickvonplaten/wav2vec2-base-libri-100h" \
--dataset_name="librispeech_asr" \
--dataset_config_name="clean" \
--train_split_name="train.100" \
--preprocessing_num_workers="4" \
--max_duration_in_seconds="10.0" \
--group_by_length \
--verbose_logging \
--fp16 \
--deepspeed ds_config_wav2vec2_zero2.json

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

2 reactions
stas00 commented, Aug 10, 2021

cc: @RezaYazdaniAminabadi

FWIW, I am not able to reproduce this on my machine. It works just fine for me, but on py38.

@patrickvonplaten, I didn’t notice this at first, but I see:

PyTorch version (GPU?): 1.9.0.dev20210217 (True)

Any chance you could update to an official pt-1.9.0 and re-test? Yours is a nightly build and about 2 weeks before 1.9.0 was released. Is it possible there was some issue in it? Just to ensure we are testing the same things.
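
PyTorch nightly builds carry a `.devYYYYMMDD` suffix in their version string, as in the one quoted above. Here is a small sketch for telling a nightly from a release and recovering its build date. The helper name `nightly_build_date` is invented for this example.

```python
import re
from datetime import date


def nightly_build_date(version):
    """Return the date encoded in a '.devYYYYMMDD' suffix, or None if there is none."""
    m = re.search(r"\.dev(\d{4})(\d{2})(\d{2})", version)
    if m is None:
        return None  # e.g. an official release such as "1.9.0"
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)


if __name__ == "__main__":
    # Version string copied from the environment report in this issue.
    print(nightly_build_date("1.9.0.dev20210217"))
```

With PyTorch installed, the same check can be run on `torch.__version__`.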

1 reaction
stas00 commented, Aug 13, 2021

Thank you for identifying the source of the problem, Reza!

Patrick, it appears that you got hit by being-on-the-cutting-edge software. I’m on gcc 9.3 still and it doesn’t have this problem.
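
To compare setups along these lines, a rough sketch like the one below reports the major version of the `gcc` found on `PATH`. It relies on `gcc -dumpversion`. The helper names are invented for illustration, and the cutoff of 10 is an assumption drawn only from what this thread shows (the failing headers live under `/usr/include/c++/10`, while gcc 9.3 builds fine). It is not a documented compatibility rule.

```python
import re
import shutil
import subprocess


def parse_major(dumpversion_output):
    """Parse the leading integer from `gcc -dumpversion` output ("9.3.0", "10", ...)."""
    m = re.match(r"\s*(\d+)", dumpversion_output)
    return int(m.group(1)) if m else None


def host_gcc_major(gcc="gcc"):
    """Return the major version of the given gcc executable, or None if unavailable."""
    path = shutil.which(gcc)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpversion"], capture_output=True, text=True, check=False
    )
    return parse_major(result.stdout)


if __name__ == "__main__":
    major = host_gcc_major()
    if major is None:
        print("gcc not found on PATH")
    elif major >= 10:  # assumption: cutoff inferred from this thread only
        print(f"gcc {major}: same major series as the failing setup or newer")
    else:
        print(f"gcc {major}: older than the failing setup")
```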
