Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Can't set attribute tokenizer.max_len

See original GitHub issue

Describe the bug
Model I am using (UniLM, MiniLM, LayoutLM …): MiniLM

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

! export CUDA_VISIBLE_DEVICES=0
! python decode_seq2seq.py \
  --model_type minilm --tokenizer_name minilm-l12-h384-uncased --input_file sample.json --split validation --do_lower_case \
  --model_path output_dir/ckpt-32000 --max_seq_length 464 --max_tgt_length 48 --batch_size 1 --beam_size 5 \
  --length_penalty 0 --forbid_duplicate_ngrams --mode s2s --forbid_ignore_word "."

Expected behavior
Here is the trace:

2020-08-05 06:30:16.386075: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
08/05/2020 06:30:18 - INFO - transformers.tokenization_utils -   loading file https://unilm.blob.core.windows.net/ckpt/minilm-l12-h384-uncased-vocab.txt from cache at /root/.cache/torch/transformers/c6a0d170b6fcc6d023a402d9c81e5526a82901ffed3eb6021fb0ec17cfd24711.0af242a3765cd96e2c6ad669a38c22d99d583824740a9a2b36fe3ed5a07d0503
Traceback (most recent call last):
  File "decode_seq2seq.py", line 296, in <module>
    main()
  File "decode_seq2seq.py", line 161, in main
    tokenizer.max_len = args.max_seq_length
AttributeError: can't set attribute
  • Platform:
  • Python version:
  • PyTorch version (GPU?):
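The traceback boils down to a plain Python behavior: in newer transformers releases, `max_len` became a read-only property that proxies `model_max_length`, so direct assignment fails. A minimal stand-in class (not the real transformers code) reproduces the same error:

```python
# Minimal reproduction of the error mechanism. FakeTokenizer is a hypothetical
# stand-in: max_len is a property with a getter only, so assigning to it raises
# AttributeError, just like tokenizer.max_len = args.max_seq_length does above.

class FakeTokenizer:
    def __init__(self, model_max_length):
        self.model_max_length = model_max_length

    @property
    def max_len(self):  # getter only -- no setter defined
        return self.model_max_length

tok = FakeTokenizer(512)
print(tok.max_len)  # reads fine through the property

try:
    tok.max_len = 464  # same kind of assignment as decode_seq2seq.py line 161
except AttributeError as e:
    print(e)  # "can't set attribute" (exact wording varies by Python version)
```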

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
guijuzhejiang commented, Oct 30, 2020

@guijuzhejiang @vishal-burman @othman-zennaki

https://github.com/microsoft/unilm/blob/master/s2s-ft/decode_seq2seq.py#L152 Change this line into:

if hasattr(tokenizer, 'model_max_length'):
    tokenizer.model_max_length = args.max_seq_length
elif hasattr(tokenizer, 'max_len'):
    tokenizer.max_len = args.max_seq_length

Or:

pip install transformers==2.2.1

This issue is caused by an interface change in a later version of the transformers package.

Thank you for your reply. I resolved it by adding a decorator.

0 reactions
guijuzhejiang commented, Oct 30, 2020

Add the decorator in tokenization_utils.py:

@max_len.setter
def max_len(self, value):
    self.model_max_length = value
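A self-contained sketch of that workaround, applied to a stand-in class rather than patching transformers' tokenization_utils.py directly: adding a setter makes assignments to `max_len` transparently update `model_max_length`.

```python
# Hypothetical stand-in for the patched tokenizer class. With the
# @max_len.setter added, tokenizer.max_len = value no longer raises
# AttributeError; it forwards the write to model_max_length.

class Tokenizer:
    def __init__(self, model_max_length=512):
        self.model_max_length = model_max_length

    @property
    def max_len(self):
        return self.model_max_length

    @max_len.setter
    def max_len(self, value):  # the setter the comment above adds
        self.model_max_length = value

tok = Tokenizer()
tok.max_len = 464              # succeeds now
print(tok.model_max_length)    # updated through the setter
```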

Read more comments on GitHub >

Top Results From Across the Web

'BertTokenizerFast' object has no attribute 'max_len ... - GitHub
AttributeError: 'RobertaTokenizerFast' object has no attribute 'max_len'. I can't switch to a new script as you mentioned. Kindly help me with ...
Read more >
Tokenizer — transformers 2.11.0 documentation - Hugging Face
When the tokenizer is loaded with from_pretrained , this will be set to the value stored for the associated model in max_model_input_sizes (see...
Read more >
AttributeError: 'GPT2TokenizerFast' object has no attribute ...
The "AttributeError: 'BertTokenizerFast' object has no attribute 'max_len'" ... If not, the fix is to change max_len to model_max_length.
Read more >
tokenization_utils.py - CodaLab Worksheets
This contextmanager assumes the provider tokenizer has no padding / truncation strategy before the managed section. If your tokenizer set a padding ...
Read more >
[Solved] AttributeError: can't set attribute in python - Finxter
The easiest way to fix the AttributeError:can't set attribute is to create a new namedtuple object with the namedtuple._replace() method.
Read more >
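The Finxter result above concerns the same "can't set attribute" error as raised by namedtuples, which are immutable; `_replace()` sidesteps it by returning a modified copy. A short illustration:

```python
# namedtuple fields are read-only, so direct assignment raises AttributeError;
# _replace() builds a new tuple with the changed field instead.

from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

try:
    p.x = 10            # read-only field: raises AttributeError
except AttributeError:
    pass

p2 = p._replace(x=10)   # new Point with x changed; p itself is untouched
print(p, p2)
```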
