Possibility to process long documents?
I tried to set max_seq_length to over 512 and got an error (see below). I think this is because the original BERT model was trained with a maximum of 512 tokens?
Is there a way to process longer documents with the current implementation, or with a simple adaptation (e.g., sharing model weights across the chunks of a long document)?
I am thinking about adapting the code for long documents; if you have any suggestions on the implementation, please let me know. Many thanks!
The error produced when max_seq_length is set to 513:
Traceback (most recent call last):
  File "xxx.py", line 98, in <module>
    model.train_model(train_df)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/simpletransformers/classification/multi_label_classification_model.py", line 127, in train_model
    args=args,
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py", line 262, in train_model
    **kwargs,
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py", line 352, in train
    outputs = model(**inputs)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/simpletransformers/custom_models/models.py", line 47, in forward
    head_mask=head_mask,
  File "xxx/envs/pt100/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/transformers/modeling_bert.py", line 799, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/transformers/modeling_bert.py", line 195, in forward
    embeddings = self.dropout(embeddings)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/torch/nn/modules/dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "xxx/anaconda/envs/pt100/lib/python3.6/site-packages/torch/nn/functional.py", line 749, in dropout
    else _VF.dropout(input, p, training))
RuntimeError: CUDA error: device-side assert triggered
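For context on the error itself: BERT-base learns an absolute position-embedding table with max_position_embeddings = 512 rows, so token position 512 is an out-of-range index into that table. On GPU this surfaces as the opaque "device-side assert triggered" above; on CPU the same lookup fails with a readable IndexError. A minimal reproduction sketch (assuming PyTorch; this is not the actual simpletransformers code path, just the same lookup):

```python
import torch
import torch.nn as nn

# BERT-base has embeddings only for positions 0..511
# (num_embeddings=512, hidden size 768, matching bert-base config).
position_embeddings = nn.Embedding(num_embeddings=512, embedding_dim=768)

ok = position_embeddings(torch.arange(512))  # positions 0..511: fine
print(ok.shape)                              # torch.Size([512, 768])

try:
    position_embeddings(torch.tensor([512]))  # position 512: out of range
except IndexError as exc:
    # On CUDA, this same out-of-range lookup triggers the
    # "device-side assert" instead of a Python exception.
    print("IndexError:", exc)
```

So raising max_seq_length alone cannot work without also resizing (and retraining or interpolating) the position-embedding table.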
Best wishes, A
Issue Analytics
- State:
- Created 4 years ago
- Comments:10 (7 by maintainers)
Top GitHub Comments
Yes, BERT won’t let you go over 512 tokens. The sliding_window feature is intended to help with this. It works by splitting longer documents into “windows” to keep everything under the length limit. It’s not going to work well in all cases, but you can try it out and see.

Have a look at the sliding_window feature in the README.
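The idea behind that feature can be sketched in plain Python: split the token sequence into overlapping windows, each at most max_seq_length tokens, then score each window and aggregate the per-window predictions (e.g., by averaging). This is an illustrative sketch of the technique, not the library's actual code; the stride fraction and window handling here are assumptions:

```python
def sliding_window(tokens, max_len=512, stride=0.8):
    """Split `tokens` into overlapping windows of at most `max_len` items.

    `stride` is the fraction of `max_len` to advance between window
    starts, so stride=0.8 gives roughly 20% overlap between windows.
    """
    step = max(1, int(max_len * stride))
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
    return windows

# A 1000-"token" document with max_len=512 yields 3 overlapping windows,
# starting at positions 0, 409, and 818.
chunks = sliding_window(list(range(1000)), max_len=512, stride=0.8)
print([len(w) for w in chunks])
```

In practice, each window also needs the special tokens ([CLS]/[SEP]) re-added before being fed to the model, and a document-level prediction is obtained by combining the window-level outputs. The overlap exists so that text falling near a window boundary still appears with some surrounding context in the neighboring window.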