Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Can't set attribute tokenizer.max_len

See original GitHub issue

Describe the bug
Model I am using (UniLM, MiniLM, LayoutLM …): MiniLM

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

! export CUDA_VISIBLE_DEVICES=0
! python decode_seq2seq.py \
  --model_type minilm --tokenizer_name minilm-l12-h384-uncased --input_file sample.json --split validation --do_lower_case \
  --model_path output_dir/ckpt-32000 --max_seq_length 464 --max_tgt_length 48 --batch_size 1 --beam_size 5 \
  --length_penalty 0 --forbid_duplicate_ngrams --mode s2s --forbid_ignore_word "."

Expected behavior
Here is the trace:

2020-08-05 06:30:16.386075: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
08/05/2020 06:30:18 - INFO - transformers.tokenization_utils -   loading file https://unilm.blob.core.windows.net/ckpt/minilm-l12-h384-uncased-vocab.txt from cache at /root/.cache/torch/transformers/c6a0d170b6fcc6d023a402d9c81e5526a82901ffed3eb6021fb0ec17cfd24711.0af242a3765cd96e2c6ad669a38c22d99d583824740a9a2b36fe3ed5a07d0503
Traceback (most recent call last):
  File "decode_seq2seq.py", line 296, in <module>
    main()
  File "decode_seq2seq.py", line 161, in main
    tokenizer.max_len = args.max_seq_length
AttributeError: can't set attribute
  • Platform:
  • Python version:
  • PyTorch version (GPU?):
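The traceback boils down to a plain Python behavior: in newer transformers releases, `max_len` became a read-only property that proxies `model_max_length`, so direct assignment fails. A minimal stand-in class (not the real transformers code) reproduces the same error:

```python
# Minimal reproduction of the error mechanism. FakeTokenizer is a hypothetical
# stand-in: max_len is a property with a getter only, so assigning to it raises
# AttributeError, just like tokenizer.max_len = args.max_seq_length does above.

class FakeTokenizer:
    def __init__(self, model_max_length):
        self.model_max_length = model_max_length

    @property
    def max_len(self):  # getter only -- no setter defined
        return self.model_max_length

tok = FakeTokenizer(512)
print(tok.max_len)  # reads fine through the property

try:
    tok.max_len = 464  # same kind of assignment as decode_seq2seq.py line 161
except AttributeError as e:
    print(e)  # "can't set attribute" (exact wording varies by Python version)
```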

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
guijuzhejiang commented, Oct 30, 2020

@guijuzhejiang @vishal-burman @othman-zennaki

https://github.com/microsoft/unilm/blob/master/s2s-ft/decode_seq2seq.py#L152 Change this line into:

if hasattr(tokenizer, 'model_max_length'):
    tokenizer.model_max_length = args.max_seq_length
elif hasattr(tokenizer, 'max_len'):
    tokenizer.max_len = args.max_seq_length

Or:

pip install transformers==2.2.1

This issue is caused by an interface change in a later version of the transformers package.

Thank you for your reply. I resolved it by adding a decorator.

0 reactions
guijuzhejiang commented, Oct 30, 2020

Add the decorator in tokenization_utils.py:

@max_len.setter
def max_len(self, value):
    self.model_max_length = value
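A self-contained sketch of that workaround, applied to a stand-in class rather than patching transformers' tokenization_utils.py directly: adding a setter makes assignments to `max_len` transparently update `model_max_length`.

```python
# Hypothetical stand-in for the patched tokenizer class. With the
# @max_len.setter added, tokenizer.max_len = value no longer raises
# AttributeError; it forwards the write to model_max_length.

class Tokenizer:
    def __init__(self, model_max_length=512):
        self.model_max_length = model_max_length

    @property
    def max_len(self):
        return self.model_max_length

    @max_len.setter
    def max_len(self, value):  # the setter the comment above adds
        self.model_max_length = value

tok = Tokenizer()
tok.max_len = 464              # succeeds now
print(tok.model_max_length)    # updated through the setter
```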

Read more comments on GitHub >

Top Results From Across the Web

'BertTokenizerFast' object has no attribute 'max_len ... - GitHub
AttributeError: 'RobertaTokenizerFast' object has no attribute 'max_len'. I can't switch to a new script as you mentioned. Kindly help me with ...
Read more >
Tokenizer — transformers 2.11.0 documentation - Hugging Face
When the tokenizer is loaded with from_pretrained , this will be set to the value stored for the associated model in max_model_input_sizes (see...
Read more >
AttributeError: 'GPT2TokenizerFast' object has no attribute ...
The "AttributeError: 'BertTokenizerFast' object has no attribute 'max_len'" ... If not, the fix is to change max_len to model_max_length.
Read more >
tokenization_utils.py - CodaLab Worksheets
This contextmanager assumes the provider tokenizer has no padding / truncation strategy before the managed section. If your tokenizer set a padding ...
Read more >
[Solved] AttributeError: can't set attribute in python - Finxter
The easiest way to fix the AttributeError:can't set attribute is to create a new namedtuple object with the namedtuple._replace() method.
Read more >
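The Finxter result above concerns the same "can't set attribute" error as raised by namedtuples, which are immutable; `_replace()` sidesteps it by returning a modified copy. A short illustration:

```python
# namedtuple fields are read-only, so direct assignment raises AttributeError;
# _replace() builds a new tuple with the changed field instead.

from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

try:
    p.x = 10            # read-only field: raises AttributeError
except AttributeError:
    pass

p2 = p._replace(x=10)   # new Point with x changed; p itself is untouched
print(p, p2)
```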
