I was trying to create a custom tokenizer for a language and got the following error/warning.
System Info
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 11 files to the new cache system
  0%|          | 0/11 [00:02<?, ?it/s]
There was a problem when trying to move your cache:
File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1127, in <module>
move_cache()
File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1090, in move_cache
move_to_new_cache(
File "C:\Users\shiva\anaconda3\lib\site-packages\transformers\utils\hub.py", line 1047, in move_to_new_cache
huggingface_hub.file_download._create_relative_symlink(blob_path, pointer_path)
File "C:\Users\shiva\anaconda3\lib\site-packages\huggingface_hub\file_download.py", line 841, in _create_relative_symlink
raise OSError(
(Please file an issue at https://github.com/huggingface/transformers/issues/new/choose and copy paste this whole message and we will do our best to help.)
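The traceback shows `huggingface_hub` failing while creating a symlink for the new cache layout; on Windows, creating symlinks typically requires Developer Mode or elevated privileges. As an illustration of the failure mode only (not the library's actual code), here is a minimal sketch of a symlink-with-copy-fallback helper using just the standard library; the function name `link_or_copy` is my own:

```python
import os
import shutil

def link_or_copy(src, dst):
    """Try to create a symlink dst -> src, the way the cache migration
    links blobs to pointers; fall back to a plain copy when symlink
    creation is denied (e.g. Windows without Developer Mode/admin)."""
    try:
        os.symlink(src, dst)
    except OSError:
        # Symlink creation refused by the OS: copy the file instead.
        shutil.copy2(src, dst)
```

Either way, `dst` ends up readable with the same contents as `src`, which is the behavior the migration needs.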
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
# save pretrained model
from transformers import PreTrainedTokenizerFast

# load the tokenizer in a transformers tokenizer instance
# (`tokenizer` below is a tokenizers.Tokenizer built earlier, not shown)
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=tokenizer,
    unk_token="[UNK]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)

# save the tokenizer
tokenizer.save_pretrained("bert-base-dv-hi")
Expected behavior
It should print the following tuple:
('bert-base-dv-hi\\tokenizer_config.json',
'bert-base-dv-hi\\special_tokens_map.json',
'bert-base-dv-hi\\tokenizer.json')
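When the call succeeds, those three files should exist on disk. A small sanity check, sketched under the assumption that the output directory is the one named in the snippet above; the helper `check_saved` is hypothetical, not part of `transformers`:

```python
from pathlib import Path

# File names taken from the expected save_pretrained output above.
EXPECTED = ["tokenizer_config.json", "special_tokens_map.json", "tokenizer.json"]

def check_saved(out_dir):
    """Return the expected tokenizer files missing from out_dir."""
    out = Path(out_dir)
    return [name for name in EXPECTED if not (out / name).exists()]
```

An empty return value means the tokenizer was saved completely; any names in the list point at what failed to write.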
Checklist
- I have read the migration guide in the readme. (pytorch-transformers; pytorch-pretrained-bert)
- I checked if a related official extension example runs on my machine.
Issue Analytics
- Created a year ago
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Note: you do not need Developer Mode for WSL. I'm having the same problem, and having to turn on Developer Mode will kill some of our user base. The warning will intimidate people away from using it.
I am using the latest version of huggingface_hub (0.11.0), but am still facing the same issue.
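Since the comments above revolve around whether Developer Mode is needed, a quick way to tell whether the current user can create symlinks at all (the capability Developer Mode grants on Windows) is a stdlib probe; `symlinks_supported` is a hypothetical helper of mine, not a huggingface_hub API:

```python
import os

def symlinks_supported(dir_path):
    """Probe whether the current user can create symlinks in dir_path.
    On Windows this typically fails without Developer Mode or admin
    rights, which is the condition behind the cache-migration OSError."""
    target = os.path.join(dir_path, "probe_target")
    link = os.path.join(dir_path, "probe_link")
    open(target, "w").close()
    try:
        os.symlink(target, link)
        os.remove(link)
        return True
    except OSError:
        return False
    finally:
        os.remove(target)
```

Running this against the cache directory before migrating would show in advance whether the symlink step can succeed.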