Too many bugs in Version 2.5.0
- It cannot be installed on macOS. Running `pip install -U transformers`, I got the following errors:
```
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: /anaconda/bin/python /anaconda/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /var/folders/5h/fr2vhgsx4jd8wz4bphzt22_8p1v0bf/T/tmpfh6km7na
       cwd: /private/var/folders/5h/fr2vhgsx4jd8wz4bphzt22_8p1v0bf/T/pip-install-fog09t3h/tokenizers
  Complete output (36 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib
  creating build/lib/tokenizers
  copying tokenizers/__init__.py -> build/lib/tokenizers
  creating build/lib/tokenizers/models
  copying tokenizers/models/__init__.py -> build/lib/tokenizers/models
  creating build/lib/tokenizers/decoders
  copying tokenizers/decoders/__init__.py -> build/lib/tokenizers/decoders
  creating build/lib/tokenizers/normalizers
  copying tokenizers/normalizers/__init__.py -> build/lib/tokenizers/normalizers
  creating build/lib/tokenizers/pre_tokenizers
  copying tokenizers/pre_tokenizers/__init__.py -> build/lib/tokenizers/pre_tokenizers
  creating build/lib/tokenizers/processors
  copying tokenizers/processors/__init__.py -> build/lib/tokenizers/processors
  creating build/lib/tokenizers/trainers
  copying tokenizers/trainers/__init__.py -> build/lib/tokenizers/trainers
  creating build/lib/tokenizers/implementations
  copying tokenizers/implementations/byte_level_bpe.py -> build/lib/tokenizers/implementations
  copying tokenizers/implementations/sentencepiece_bpe.py -> build/lib/tokenizers/implementations
  copying tokenizers/implementations/base_tokenizer.py -> build/lib/tokenizers/implementations
  copying tokenizers/implementations/__init__.py -> build/lib/tokenizers/implementations
  copying tokenizers/implementations/char_level_bpe.py -> build/lib/tokenizers/implementations
  copying tokenizers/implementations/bert_wordpiece.py -> build/lib/tokenizers/implementations
  copying tokenizers/__init__.pyi -> build/lib/tokenizers
  copying tokenizers/models/__init__.pyi -> build/lib/tokenizers/models
  copying tokenizers/decoders/__init__.pyi -> build/lib/tokenizers/decoders
  copying tokenizers/normalizers/__init__.pyi -> build/lib/tokenizers/normalizers
  copying tokenizers/pre_tokenizers/__init__.pyi -> build/lib/tokenizers/pre_tokenizers
  copying tokenizers/processors/__init__.pyi -> build/lib/tokenizers/processors
  copying tokenizers/trainers/__init__.pyi -> build/lib/tokenizers/trainers
  running build_ext
  running build_rust
  error: Can not find Rust compiler
  ERROR: Failed building wheel for tokenizers
  Running setup.py clean for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly
```
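The key line is `error: Can not find Rust compiler`: the `tokenizers` package is a Rust extension, and when pip finds no prebuilt wheel for the platform it falls back to compiling from source. A possible workaround, not confirmed in this thread and offered only as a sketch, is to install a Rust toolchain first and retry:

```sh
# Install a Rust toolchain with the official rustup installer, make
# cargo/rustc visible to the current shell, then retry the install.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
pip install -U transformers
```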
- On Linux, it can be installed, but it fails with the following code:
```python
import transformers
transformers.AutoTokenizer.from_pretrained("bert-base-cased").save_pretrained("./")
transformers.AutoModel.from_pretrained("bert-base-cased").save_pretrained("./")
transformers.AutoTokenizer.from_pretrained("./")
transformers.AutoModel.from_pretrained("./")
```
Actually, it is the second line that generates the following errors:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 587, in save_pretrained
    return vocab_files + (special_tokens_map_file, added_tokens_file)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'tuple'
```
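The traceback shows `save_pretrained` adding the result of the vocabulary save to a tuple of file paths, and that result is `None` in this release. A possible workaround, assuming the breakage is specific to the fast (Rust-backed) tokenizer that `AutoTokenizer` returns by default for BERT in 2.5.0 (an assumption, not confirmed here), is to use the plain Python tokenizer class directly:

```python
import transformers

# Hypothetical workaround: load the slow, pure-Python BERT tokenizer instead
# of the fast one returned by AutoTokenizer, on the assumption that only the
# fast tokenizer's save path returns None for its vocabulary files.
tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-cased")
tokenizer.save_pretrained("./")  # writes vocab.txt plus the special-tokens metadata files

model = transformers.AutoModel.from_pretrained("bert-base-cased")
model.save_pretrained("./")      # writes config.json and the model weights

# Reloading from the local directory should then work as in the snippet above.
transformers.BertTokenizer.from_pretrained("./")
transformers.AutoModel.from_pretrained("./")
```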
- The vocabulary size of xlm-roberta is wrong, so it fails with the following code (this bug also exists in Version 2.4.1):
```python
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-base")
tokenizer.convert_ids_to_tokens(range(tokenizer.vocab_size))
```
The error is actually caused by the wrong vocab size:
```
[libprotobuf FATAL /sentencepiece/src/…/third_party/protobuf-lite/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_):
zsh: abort      python
```
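A quick diagnostic, assuming the XLM-R tokenizer exposes its SentencePiece processor as `sp_model` (true for the slow implementation, but an assumption for whatever object `AutoTokenizer` returns here), is to compare the size the tokenizer reports with the number of pieces the underlying model actually contains; the abort above suggests the reported `vocab_size` points past the real table:

```python
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-base")

# Diagnostic sketch (assumption: the tokenizer keeps its SentencePiece
# processor in `sp_model`). If the reported vocab_size exceeds the piece
# count by more than the handful of added special tokens, ids near the top
# of the range can index past the SentencePiece table and trigger the
# protobuf CHECK failure shown above.
print("reported vocab_size: ", tokenizer.vocab_size)
print("sentencepiece pieces:", tokenizer.sp_model.GetPieceSize())
```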
Issue Analytics
- Created 4 years ago
- Comments: 15 (4 by maintainers)
Hi! Indeed, there have been a few issues as this was the first release incorporating `tokenizers` by default. A new version of `tokenizers` and `transformers` will be available either today or tomorrow and should fix most of these.

I cannot answer that, I don't know what the roadmap looks like.