Segfault when building kernel

As requested by @stas00, I’m opening an issue here for a problem I’m experiencing with HuggingFace’s Transformers library. When running the following script: https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/README.md#pretraining-wav2vec2

I’m getting a segfault error when building the kernels:

[1/3] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/TH -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/THC -isystem /home/patrick/anaconda3/envs/hugging_face/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/TH -isystem /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/torch/include/THC -isystem /home/patrick/anaconda3/envs/hugging_face/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/patrick/anaconda3/envs/hugging_face/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
/usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/usr/include/c++/10/chrono:473:154:   required from here
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
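
The build output above is long, and the part that matters is the single "internal compiler error" line. As a rough sketch (not part of the original report), the helper below pulls that line out of a captured log and guesses the libstdc++ major version from the header path. The function name `find_internal_compiler_error` is hypothetical, and the assumption that headers live under `/usr/include/c++/<major>/` holds for the paths in this log but may not hold on every distribution.

```python
import re
from typing import Optional

# Matches GCC-style diagnostics of the form
#   <path>:<line>:<col>: internal compiler error: <message>
ICE_PATTERN = re.compile(
    r"^(?P<path>\S+):(?P<line>\d+):(?P<col>\d+): "
    r"internal compiler error: (?P<msg>.*?)\s*$",
    re.MULTILINE,
)

# Assumption: the header path embeds the libstdc++ major version,
# as in /usr/include/c++/10/chrono from the log above.
LIBSTDCXX_PATTERN = re.compile(r"/c\+\+/(\d+)/")


def find_internal_compiler_error(log: str) -> Optional[dict]:
    """Return details of the first internal compiler error in `log`, or None."""
    match = ICE_PATTERN.search(log)
    if match is None:
        return None
    header = match.group("path")
    version = LIBSTDCXX_PATTERN.search(header)
    return {
        "header": header,
        "line": int(match.group("line")),
        "message": match.group("msg"),
        "libstdcxx_major": int(version.group(1)) if version else None,
    }


# The two relevant lines from the log above, including trailing padding.
sample = (
    "/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault      \n"
    "  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept\n"
)
info = find_internal_compiler_error(sample)
print(info)
```

Here the crash is reported inside the GCC 10 `chrono` header rather than in DeepSpeed's own `.cu` file, which points at the host compiler and its standard library rather than at the kernel source.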

My environment is:

- `transformers` version: 4.10.0.dev0
- Platform: Linux-5.11.0-25-generic-x86_64-with-glibc2.33
- Python version: 3.9.1
- PyTorch version (GPU?): 1.9.0.dev20210217 (True)
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: yes
- Deepspeed: 0.4.4
- CUDA Version: 11.2 
- GPU: 4 x TITAN RTX

To reproduce:

Clone the transformers repo:

git clone https://github.com/huggingface/transformers

Install all packages:

cd transformers && pip install -e ".[dev]"

Go into the wav2vec2 research folder:

cd examples/research_projects/wav2vec2

and run the following command:

PYTHONPATH=../../../src deepspeed --num_gpus 4 run_pretrain.py \
--output_dir="./wav2vec2-base-libri-100h" \
--num_train_epochs="3" \
--per_device_train_batch_size="32" \
--per_device_eval_batch_size="32" \
--gradient_accumulation_steps="2" \
--save_total_limit="3" \
--save_steps="500" \
--logging_steps="10" \
--learning_rate="5e-4" \
--weight_decay="0.01" \
--warmup_steps="3000" \
--model_name_or_path="patrickvonplaten/wav2vec2-base-libri-100h" \
--dataset_name="librispeech_asr" \
--dataset_config_name="clean" \
--train_split_name="train.100" \
--preprocessing_num_workers="4" \
--max_duration_in_seconds="10.0" \
--group_by_length \
--verbose_logging \
--fp16 \
--deepspeed ds_config_wav2vec2_zero2.json

Top GitHub Comments

stas00 commented, Aug 10, 2021

cc: @RezaYazdaniAminabadi

FWIW, I am not able to reproduce this on my machine. It works just fine for me, though I’m on py38.

@patrickvonplaten, I didn’t notice this at first, but I see:

PyTorch version (GPU?): 1.9.0.dev20210217 (True)

Any chance you could update to an official pt-1.9.0 and re-test? Yours is a nightly build from about 2 weeks before 1.9.0 was released. Is it possible there was some issue in it? Just to ensure we are testing the same things.
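
A quick way to spot a nightly build is to look for the `.devYYYYMMDD` suffix in the version string. The sketch below is illustrative only: `parse_nightly` is a hypothetical helper, and it assumes the suffix format seen in the environment report above.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed format: "<major>.<minor>.<patch>.dev<YYYYMMDD>", optionally followed
# by a local suffix (for example "+cu111"), which is ignored here.
DEV_VERSION = re.compile(r"^(?P<base>\d+\.\d+\.\d+)\.dev(?P<stamp>\d{8})")


def parse_nightly(version: str) -> Optional[Tuple[str, date]]:
    """Return (base_version, build_date) for a dev build, or None for a release."""
    match = DEV_VERSION.match(version)
    if match is None:
        return None
    stamp = match.group("stamp")
    build_date = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    return match.group("base"), build_date


nightly = parse_nightly("1.9.0.dev20210217")  # version string from the report
release = parse_nightly("1.9.0")
print(nightly, release)
```

In a real environment the same function could be called with `torch.__version__` to confirm whether an official release or a nightly is installed.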

stas00 commented, Aug 13, 2021

Thank you for identifying the source of the problem, Reza!

Patrick, it appears that you got bitten by running cutting-edge software. I’m still on gcc 9.3, which doesn’t have this problem.
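
To compare a local setup against the gcc 9.3 mentioned above, one can query the compiler on `PATH`. This is a minimal sketch under several assumptions: the helper names are hypothetical, it only inspects `gcc` as found on `PATH` (nvcc may be configured to use a different host compiler), and "newer major version than 9" is treated as a hint drawn from this thread, not as a compatibility rule.

```python
import shutil
import subprocess
from typing import Optional, Tuple

KNOWN_GOOD = (9, 3)  # gcc version reported as working in this thread


def parse_gcc_version(text: str) -> Tuple[int, ...]:
    """Turn output such as '10.3.0' or '9' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())


def local_gcc_version() -> Optional[Tuple[int, ...]]:
    """Ask the gcc found on PATH for its version; None if unavailable."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "-dumpfullversion", "-dumpversion"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gcc_version(result.stdout) or None


def is_newer_major(version: Tuple[int, ...], known_good: Tuple[int, ...] = KNOWN_GOOD) -> bool:
    """True if `version` has a higher major number than the known-good gcc."""
    return version[0] > known_good[0]


version = local_gcc_version()
if version is None:
    print("gcc not found on PATH (or its version could not be parsed)")
elif is_newer_major(version):
    print(f"gcc {version} is a newer major version than the known-good {KNOWN_GOOD}")
else:
    print(f"gcc {version} is not newer than the known-good major version")
```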
