
Too many bugs in Version 2.5.0

See original GitHub issue
  1. It cannot be installed on macOS. Running pip install -U transformers produced the following errors (a possible fix is sketched after the log):

    Building wheels for collected packages: tokenizers
      Building wheel for tokenizers (PEP 517) … error
      ERROR: Command errored out with exit status 1:
       command: /anaconda/bin/python /anaconda/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /var/folders/5h/fr2vhgsx4jd8wz4bphzt22_8p1v0bf/T/tmpfh6km7na
           cwd: /private/var/folders/5h/fr2vhgsx4jd8wz4bphzt22_8p1v0bf/T/pip-install-fog09t3h/tokenizers
      Complete output (36 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib
      creating build/lib/tokenizers
      copying tokenizers/__init__.py -> build/lib/tokenizers
      creating build/lib/tokenizers/models
      copying tokenizers/models/__init__.py -> build/lib/tokenizers/models
      creating build/lib/tokenizers/decoders
      copying tokenizers/decoders/__init__.py -> build/lib/tokenizers/decoders
      creating build/lib/tokenizers/normalizers
      copying tokenizers/normalizers/__init__.py -> build/lib/tokenizers/normalizers
      creating build/lib/tokenizers/pre_tokenizers
      copying tokenizers/pre_tokenizers/__init__.py -> build/lib/tokenizers/pre_tokenizers
      creating build/lib/tokenizers/processors
      copying tokenizers/processors/__init__.py -> build/lib/tokenizers/processors
      creating build/lib/tokenizers/trainers
      copying tokenizers/trainers/__init__.py -> build/lib/tokenizers/trainers
      creating build/lib/tokenizers/implementations
      copying tokenizers/implementations/byte_level_bpe.py -> build/lib/tokenizers/implementations
      copying tokenizers/implementations/sentencepiece_bpe.py -> build/lib/tokenizers/implementations
      copying tokenizers/implementations/base_tokenizer.py -> build/lib/tokenizers/implementations
      copying tokenizers/implementations/__init__.py -> build/lib/tokenizers/implementations
      copying tokenizers/implementations/char_level_bpe.py -> build/lib/tokenizers/implementations
      copying tokenizers/implementations/bert_wordpiece.py -> build/lib/tokenizers/implementations
      copying tokenizers/__init__.pyi -> build/lib/tokenizers
      copying tokenizers/models/__init__.pyi -> build/lib/tokenizers/models
      copying tokenizers/decoders/__init__.pyi -> build/lib/tokenizers/decoders
      copying tokenizers/normalizers/__init__.pyi -> build/lib/tokenizers/normalizers
      copying tokenizers/pre_tokenizers/__init__.pyi -> build/lib/tokenizers/pre_tokenizers
      copying tokenizers/processors/__init__.pyi -> build/lib/tokenizers/processors
      copying tokenizers/trainers/__init__.pyi -> build/lib/tokenizers/trainers
      running build_ext
      running build_rust
      error: Can not find Rust compiler

    ERROR: Failed building wheel for tokenizers
    Running setup.py clean for tokenizers
    Failed to build tokenizers
    ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly
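
The last line of the build log names the actual problem: starting with this release, transformers depends on the tokenizers package, which is compiled from Rust source when no prebuilt wheel matches the platform. A hedged workaround, assuming nothing else is wrong with the build environment, is to install a Rust toolchain and retry; rustup is one common route:

    # Install a Rust toolchain via rustup, then retry the failing install.
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source "$HOME/.cargo/env"
    pip install -U transformers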

  2. On Linux it can be installed, but it fails with the following code:

    import transformers
    transformers.AutoTokenizer.from_pretrained("bert-base-cased").save_pretrained("./")
    transformers.AutoModel.from_pretrained("bert-base-cased").save_pretrained("./")
    transformers.AutoTokenizer.from_pretrained("./")
    transformers.AutoModel.from_pretrained("./")

Actually, it is the second line (the tokenizer's save_pretrained call) that raises the following error; a workaround is sketched after the traceback:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/anaconda/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 587, in save_pretrained
        return vocab_files + (special_tokens_map_file, added_tokens_file)
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'tuple'
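
The traceback shows save_pretrained adding a tuple to a vocab_files value that came back as None, which points at the fast (Rust-backed) tokenizers this release switched to by default. One workaround reported at the time, sketched here on the assumption that the use_fast flag in 2.5.0 behaves as documented, is to fall back to the pure-Python tokenizer:

    import transformers

    # Assumption: use_fast=False selects the pure-Python BertTokenizer,
    # whose save_vocabulary returns the expected file paths, so
    # save_pretrained no longer hits the NoneType + tuple error.
    tokenizer = transformers.AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False)
    tokenizer.save_pretrained("./")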

  3. The vocabulary size of xlm-roberta is wrong, so the following code fails (this bug also exists in Version 2.4.1):

    import transformers
    tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-base")
    tokenizer.convert_ids_to_tokens(range(tokenizer.vocab_size))

The error is actually caused by the wrong vocab size:

    [libprotobuf FATAL /sentencepiece/src/…/third_party/protobuf-lite/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_):
    terminate called after throwing an instance of 'google::protobuf::FatalException'
      what():  CHECK failed: (index) < (current_size_):
    zsh: abort      python
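
Note that this is a hard abort inside the SentencePiece/protobuf C++ layer rather than a Python exception, which is why the process dies outright instead of raising something catchable. A small diagnostic, leaning on the tokenizer's internal sp_model attribute (an implementation detail, not a stable API), makes the size mismatch visible without crashing:

    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-base")

    # Compare the size the tokenizer reports with the number of pieces the
    # underlying SentencePiece model actually holds; ids beyond the real
    # range are what trip the fatal CHECK above.
    print("reported vocab_size: ", tokenizer.vocab_size)
    print("sentencepiece pieces:", tokenizer.sp_model.get_piece_size())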

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 15 (4 by maintainers)

Top GitHub Comments

2 reactions
LysandreJik commented, Feb 24, 2020

Hi! Indeed, there have been a few issues as this was the first release incorporating tokenizers by default. A new version of tokenizers and transformers will be available either today or tomorrow and should fix most of these.

1 reaction
BramVanroy commented, Feb 24, 2020

I cannot answer that; I don’t know what the roadmap looks like.

Read more comments on GitHub
