I was trying to create a custom tokenizer for a language and got the following error/warning.
System Info
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 11 files to the new cache system
  0%|          | 0/11 [00:02<?, ?it/s]
There was a problem when trying to move your cache:
File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1127, in <module>
move_cache()
File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1090, in move_cache
move_to_new_cache(
File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1047, in move_to_new_cache
huggingface_hub.file_download._create_relative_symlink(blob_path, pointer_path)
File "C:\Users\shiva\anaconda3\lib\site-packages\huggingface_hub\file_download.py", line 841, in _create_relative_symlink
raise OSError(
(Please file an issue at https://github.com/huggingface/transformers/issues/new/choose and copy paste this whole message and we will do our best to help.)
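The traceback shows `huggingface_hub` failing while creating a symlink for the new cache layout; on Windows, creating symlinks typically requires Developer Mode or elevated privileges. As an illustration of the failure mode only (not the library's actual code), here is a minimal sketch of a symlink-with-copy-fallback helper using just the standard library; the function name `link_or_copy` is my own:

```python
import os
import shutil

def link_or_copy(src, dst):
    """Try to create a symlink dst -> src, the way the cache migration
    links blobs to pointers; fall back to a plain copy when symlink
    creation is denied (e.g. Windows without Developer Mode/admin)."""
    try:
        os.symlink(src, dst)
    except OSError:
        # Symlink creation refused by the OS: copy the file instead.
        shutil.copy2(src, dst)
```

Either way, `dst` ends up readable with the same contents as `src`, which is the behavior the migration needs.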
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
# save pretrained model
from transformers import PreTrainedTokenizerFast

# load the tokenizer in a transformers tokenizer instance
# (`tokenizer` below is a tokenizers.Tokenizer built earlier, not shown)
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="[UNK]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)

# save the tokenizer
tokenizer.save_pretrained("bert-base-dv-hi")
Expected behavior
It should print the following tuple:
('bert-base-dv-hi\\tokenizer_config.json',
'bert-base-dv-hi\\special_tokens_map.json',
'bert-base-dv-hi\\tokenizer.json')
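When the call succeeds, those three files should exist on disk. A small sanity check, sketched under the assumption that the output directory is the one named in the snippet above; the helper `check_saved` is hypothetical, not part of `transformers`:

```python
from pathlib import Path

# File names taken from the expected save_pretrained output above.
EXPECTED = ["tokenizer_config.json", "special_tokens_map.json", "tokenizer.json"]

def check_saved(out_dir):
    """Return the expected tokenizer files missing from out_dir."""
    out = Path(out_dir)
    return [name for name in EXPECTED if not (out / name).exists()]
```

An empty return value means the tokenizer was saved completely; any names in the list point at what failed to write.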
Checklist
- I have read the migration guide in the readme. (pytorch-transformers; pytorch-pretrained-bert)
- I checked if a related official extension example runs on my machine.
Issue Analytics
- Created a year ago
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Note: you do not need Developer Mode for WSL. I'm having the same problem, and having to turn on Developer Mode will kill some of our user base. The warning will intimidate people away from using it.
I am using the latest version of huggingface_hub (0.11.0), but am still facing the same issue.
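Since the comments above revolve around whether Developer Mode is needed, a quick way to tell whether the current user can create symlinks at all (the capability Developer Mode grants on Windows) is a stdlib probe; `symlinks_supported` is a hypothetical helper of mine, not a huggingface_hub API:

```python
import os

def symlinks_supported(dir_path):
    """Probe whether the current user can create symlinks in dir_path.
    On Windows this typically fails without Developer Mode or admin
    rights, which is the condition behind the cache-migration OSError."""
    target = os.path.join(dir_path, "probe_target")
    link = os.path.join(dir_path, "probe_link")
    open(target, "w").close()
    try:
        os.symlink(target, link)
        os.remove(link)
        return True
    except OSError:
        return False
    finally:
        os.remove(target)
```

Running this against the cache directory before migrating would show in advance whether the symlink step can succeed.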