Segmentation fault / illegal instruction
See original GitHub issue
setup
Ubuntu 16.04, TVM 0.7.dev1, PyTorch 1.4.0, transformers 2.11.0; everything else is the same as requirements.txt.
issue
I uncommented this line in diagonaled_mm_tvm.py:
DiagonaledMM._get_function('float32', 'cuda')
After that, when I run the code it prints
Loading tvm binary from: ./longformer/lib/lib_diagonaled_mm_float32_cuda.so ...
and then crashes with either segmentation fault (core dumped) or illegal instruction (core dumped).
other
I tested TVM, TensorFlow, and PyTorch on their own and they all work fine. I also followed scripts/cheatsheet.txt to regenerate lib_diagonaled_mm_float32_cuda.so, and the build succeeds.
Any ideas or suggestions?
The code is below:
import torch
from longformer.longformer import Longformer, LongformerConfig
from longformer.sliding_chunks import pad_to_window_size
from transformers import RobertaTokenizer
config = LongformerConfig.from_pretrained('longformer-base-4096/')
# choose the attention mode 'n2', 'tvm' or 'sliding_chunks'
# 'n2': for regular n2 attention
# 'tvm': a custom CUDA kernel implementation of our sliding window attention
# 'sliding_chunks': a PyTorch implementation of our sliding window attention
config.attention_mode = 'tvm'
model = Longformer.from_pretrained('longformer-base-4096/', config=config)
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
tokenizer.model_max_length = model.config.max_position_embeddings
SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000) # long input document
input_ids = torch.tensor(tokenizer.encode(SAMPLE_TEXT)).unsqueeze(0) # batch of size 1
# TVM code doesn't work on CPU. This line must stay enabled because `config.attention_mode = 'tvm'`
model = model.cuda(); input_ids = input_ids.cuda()
# Attention mask values -- 0: no attention, 1: local attention, 2: global attention
attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device) # initialize to local attention
attention_mask[:, [1, 4, 21,]] = 2 # Set global attention based on the task. For example,
# classification: the <s> token
# QA: question tokens
# padding seqlen to the nearest multiple of 512. Needed for the 'sliding_chunks' attention
input_ids, attention_mask = pad_to_window_size(
    input_ids, attention_mask, config.attention_window[0], tokenizer.pad_token_id)
output = model(input_ids, attention_mask=attention_mask)[0]
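
One way to narrow this down is to try loading the compiled TVM binary on its own, outside the model code, since the crash message appears at load time. The snippet below is only a rough sketch for TVM 0.7.dev1: tvm.runtime.load_module is assumed to be the right loader for this build (older builds expose it as tvm.module.load), and tvm.gpu(0).exist is assumed to be available for the device check.

import tvm

LIB_PATH = './longformer/lib/lib_diagonaled_mm_float32_cuda.so'  # path from the error message

print('TVM version:', tvm.__version__)
print('CUDA device present:', tvm.gpu(0).exist)  # sanity-check the CUDA runtime first

# If the process dies on the next line with the same segmentation fault or
# illegal instruction, the problem is in the prebuilt binary itself (for
# example, built for a different GPU architecture or against a different
# TVM/CUDA version) rather than in the Longformer model code.
mod = tvm.runtime.load_module(LIB_PATH)
print('Loaded module:', mod)

If this standalone load also crashes, rebuilding or running the kernel inside the project's Docker image (as suggested in the comments below) would be the next thing to try.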
Top GitHub Comments
Very interesting. Maybe a fused self-attention function or something. I will be curious to see how this goes.
Depending on how familiar you are with TVM, you might find the following discussions useful:
https://discuss.tvm.ai/t/optimizing-matrix-multiplication-for-gpu/4212/24
https://discuss.tvm.ai/t/competitive-gemm-matmul-example/5478
https://discuss.tvm.ai/t/developing-a-faster-schedule-for-longformers-kernel/6367
Another suggestion: can you try running it from inside the Docker container that we use to compile the CUDA kernel? Follow the instructions here: https://github.com/allenai/longformer/blob/master/scripts/cheatsheet.txt#L6 to build and run the Docker image, then try to run it. You don't need to recompile the binaries; it is enough to load the existing one.
I am curious: what are you using it for, and would the sliding_chunks implementation be enough for your use case?
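
For reference, switching to the 'sliding_chunks' mode mentioned above only needs a small change to the reproduction script. This is a sketch based on the attention-mode comments in that script, not a separately verified configuration:

from longformer.longformer import Longformer, LongformerConfig

config = LongformerConfig.from_pretrained('longformer-base-4096/')
config.attention_mode = 'sliding_chunks'  # PyTorch implementation of sliding window attention
model = Longformer.from_pretrained('longformer-base-4096/', config=config)
# The rest of the script stays the same: pad_to_window_size already pads the
# sequence length to a multiple of the attention window, which this mode needs,
# and unlike the TVM kernel it does not require the compiled CUDA binary.

If the goal is just to get past the segfault while the TVM binary is being debugged, this mode avoids loading lib_diagonaled_mm_float32_cuda.so entirely.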