
I was trying to create a custom tokenizer for a language and got this error/warning.

See original GitHub issue

System Info

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 11 files to the new cache system
  0% | 0/11 [00:02<?, ?it/s]
There was a problem when trying to move your cache:

  File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1127, in <module>
    move_cache()

  File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1090, in move_cache
    move_to_new_cache(

  File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1047, in move_to_new_cache
    huggingface_hub.file_download._create_relative_symlink(blob_path, pointer_path)

  File "C:\Users\shiva\anaconda3\lib\site-packages\huggingface_hub\file_download.py", line 841, in _create_relative_symlink
    raise OSError(


(Please file an issue at https://github.com/huggingface/transformers/issues/new/choose and copy paste this whole message and we will do our best to help.)
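The traceback is truncated, but it ends in `huggingface_hub.file_download._create_relative_symlink` raising an `OSError`. On Windows, that typically means the process is not allowed to create symbolic links, which requires Developer Mode or administrator rights. The following is a minimal stdlib probe (not part of huggingface_hub; the helper name is made up here) that reproduces the same kind of relative-symlink creation the cache migration attempts:

```python
import os
import tempfile

def supports_symlinks(path):
    """Try to create a relative symlink inside `path`, the way the
    blob/pointer cache layout does. Returns False if the OS refuses
    (e.g. Windows without Developer Mode, WinError 1314)."""
    target = os.path.join(path, "blob")
    link = os.path.join(path, "pointer")
    open(target, "w").close()
    try:
        # Relative symlink: the pointer refers to "blob" in the same directory.
        os.symlink(os.path.relpath(target, path), link)
        return True
    except OSError:
        return False

with tempfile.TemporaryDirectory() as d:
    print("symlinks supported:", supports_symlinks(d))
```

If this returns False on your machine, enabling Developer Mode (or running Python elevated) is the usual workaround for the cache migration error above.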

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

# save the pretrained model
from transformers import PreTrainedTokenizerFast

# load the tokenizer in a transformers tokenizer instance
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token='[UNK]',
    pad_token='[PAD]',
    cls_token='[CLS]',
    sep_token='[SEP]',
    mask_token='[MASK]',
)

# save the tokenizer
tokenizer.save_pretrained('bert-base-dv-hi')

Expected behavior

Saving the tokenizer should print out:
('bert-base-dv-hi\\tokenizer_config.json',
 'bert-base-dv-hi\\special_tokens_map.json',
 'bert-base-dv-hi\\tokenizer.json')
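For context, `save_pretrained` on a fast tokenizer is only expected to write those three JSON files into the target directory and return their paths. A rough stdlib mimic of that behavior (the helper and the empty file contents are placeholders, not the real tokenizer serialization):

```python
import json
import os

def mock_save_pretrained(save_dir):
    """Write the three files a fast tokenizer saves and return their
    paths, mirroring the tuple shown under "Expected behavior"."""
    os.makedirs(save_dir, exist_ok=True)
    names = ("tokenizer_config.json", "special_tokens_map.json", "tokenizer.json")
    paths = []
    for name in names:
        path = os.path.join(save_dir, name)
        with open(path, "w") as f:
            json.dump({}, f)  # placeholder contents
        paths.append(path)
    return tuple(paths)
```

Note that the expected output above uses backslashes because the reporter is on Windows; on other platforms the same call returns forward-slash paths.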


Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
ebolam commented, Sep 19, 2022

Note that you do not need developer mode for WSL. I'm having the same problem, and having to turn on developer mode would drive away some of our user base. The warning will intimidate people away from using it.

0 reactions
qwead520 commented, Nov 23, 2022

> I think the issue has been solved on the huggingface_hub side, as long as you use the latest version. Please let us know otherwise!

I am using the latest version of huggingface_hub (0.11.0), but I am still facing the same issue.

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
0it [00:00, ?it/s]
There was a problem when trying to write in your cache folder (./tmp/). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
Even after setting TRANSFORMERS_CACHE = ./tmp/, the same error repeats:
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
0it [00:00, ?it/s]
There was a problem when trying to write in your cache folder (./tmp/). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
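Two things are worth checking here: `TRANSFORMERS_CACHE` must be set before `transformers` is imported, and a relative path like `./tmp/` resolves against the current working directory, so it may point somewhere unwritable. A hedged stdlib sketch that picks the first writable candidate directory (`pick_writable_cache` is a hypothetical helper, not part of transformers):

```python
import os
import tempfile

def pick_writable_cache(candidates):
    """Return the first candidate directory we can create and write a
    probe file in, or None if all of them fail."""
    for cand in candidates:
        try:
            os.makedirs(cand, exist_ok=True)
            probe = os.path.join(cand, ".write_test")
            with open(probe, "w") as f:
                f.write("ok")
            os.remove(probe)
            return cand
        except OSError:
            continue
    return None

# Prefer an absolute path; set the variable before importing transformers.
cache = pick_writable_cache([os.path.join(tempfile.gettempdir(), "hf_cache")])
if cache:
    os.environ["TRANSFORMERS_CACHE"] = cache
```

Setting the variable from the shell before launching Python (rather than inside an already-running process) avoids the import-order pitfall entirely.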