save_pretrained doesn't work with GPT2TokenizerFast
🐛 Bug
Information
Model I am using (Bert, XLNet …): GPT2TokenizerFast
Language I am using the model on (English, Chinese …): English
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)

The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (give details below)
To reproduce

Steps to reproduce the behavior:

```python
>>> from transformers import *
>>> tok = GPT2TokenizerFast.from_pretrained('distilgpt2')
>>> tok.save_pretrained('./')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bilal/Documents/transformers/src/transformers/tokenization_utils.py", line 519, in save_pretrained
    vocab_files = self.save_vocabulary(save_directory)
  File "/Users/bilal/Documents/transformers/src/transformers/tokenization_utils.py", line 529, in save_vocabulary
    raise NotImplementedError
NotImplementedError
```
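The traceback points at the base tokenizer class: `save_pretrained` delegates the actual vocabulary serialization to `save_vocabulary`, and the base-class `save_vocabulary` only raises `NotImplementedError`, so any subclass that never overrides it fails exactly as shown. A minimal sketch of that failure mode, using simplified stand-in classes rather than the actual transformers code:

```python
import tempfile


class BaseTokenizer:
    """Simplified stand-in for a base tokenizer class."""

    def save_pretrained(self, save_directory):
        # save_pretrained delegates the file writing to save_vocabulary,
        # so a subclass that never overrides it raises here.
        return self.save_vocabulary(save_directory)

    def save_vocabulary(self, save_directory):
        raise NotImplementedError


class FastTokenizer(BaseTokenizer):
    # No save_vocabulary override -> save_pretrained fails.
    pass


class SlowTokenizer(BaseTokenizer):
    def save_vocabulary(self, save_directory):
        # A real implementation would write vocab.json / merges.txt here.
        return (f"{save_directory}/vocab.json", f"{save_directory}/merges.txt")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(SlowTokenizer().save_pretrained(d))  # returns the saved file paths
        try:
            FastTokenizer().save_pretrained(d)
        except NotImplementedError:
            print("NotImplementedError")  # mirrors the reported traceback
```

This is only an illustration of the delegation pattern behind the error, not the library's real code; the fix on the transformers side is to implement `save_vocabulary` for the fast tokenizer subclass.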
Expected behavior
The tokenizer should save its vocabulary files successfully.
Environment info
- `transformers` version: 2.4.1
- Platform: Darwin-19.3.0-x86_64-i386-64bit
- Python version: 3.7.5
- PyTorch version (GPU?): 1.3.1 (False)
- Tensorflow version (GPU?): 2.0.0 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Issue Analytics
- Created: 4 years ago
- Comments: 7 (7 by maintainers)
Fixed
Thanks, I’m able to reproduce now, I’ll have a look hopefully tomorrow morning.
I’ll keep you posted here 👀