RagRetriever.from_pretrained does not honor a custom cache_dir.
Environment info
- transformers version: 3.3.1
- Platform: Linux-4.19
- Python version: 3.7.7
- PyTorch version (GPU?): 1.6.0
- Tensorflow version (GPU?): No
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help
Information
Model I am using RAG:
The problem arises when using:
- the official example scripts: (give details below)
To reproduce
Steps to reproduce the behavior:
- Open notebook
- Run the example code with ‘TRANSFORMERS_CACHE’ changed so that the dataset is stored somewhere other than the default location:
import os
os.environ['TRANSFORMERS_CACHE'] = '/workspace/notebooks/POCs/cache'
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration
# Here the data is placed in the expected path /workspace...
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
# The dataset is placed in the default location /root/.cache/huggingface/datasets/wiki_dpr/psgs_w100.nq.no_index/0.0.0/
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=False)
Expected behavior
RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=False)
should place the data in the expected path ‘/workspace/notebooks/POCs/cache’.
I also tried: retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", cache_dir='/workspace/notebooks/POCs/cache', use_dummy_dataset=False)
but that does not work either.
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (3 by maintainers)
Hi @lhoestq ,
HF_DATASETS_CACHE works fine:
Sure, I’ll post the other issue in the datasets repo.
Thanks!
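For anyone hitting the same problem, here is a minimal sketch of the workaround confirmed above: set HF_DATASETS_CACHE alongside TRANSFORMERS_CACHE before importing the libraries. The helper names (configure_hf_caches, load_rag_retriever) and the "datasets" subfolder are hypothetical, chosen for illustration. The sketch also assumes both libraries read these variables at import time, which is why the import happens inside the loader.

```python
import os
from pathlib import Path


def configure_hf_caches(cache_root):
    """Point the model cache and the dataset cache at cache_root.

    Hypothetical helper: call it before importing transformers or datasets,
    on the assumption that both read these variables when first imported.
    """
    root = Path(cache_root)
    datasets_dir = root / "datasets"  # hypothetical subfolder name
    datasets_dir.mkdir(parents=True, exist_ok=True)

    # Models and tokenizers (the variable used in the report above).
    os.environ["TRANSFORMERS_CACHE"] = str(root)
    # Datasets such as wiki_dpr (the variable confirmed to work in this thread).
    os.environ["HF_DATASETS_CACHE"] = str(datasets_dir)

    return {
        "TRANSFORMERS_CACHE": str(root),
        "HF_DATASETS_CACHE": str(datasets_dir),
    }


def load_rag_retriever(cache_root):
    """Configure the caches, then build the retriever from the report."""
    configure_hf_caches(cache_root)
    # Imported here, after the environment variables are set.
    from transformers import RagRetriever

    return RagRetriever.from_pretrained(
        "facebook/rag-token-nq",
        index_name="exact",
        use_dummy_dataset=False,
    )
```

Calling load_rag_retriever('/workspace/notebooks/POCs/cache') would then be expected to keep the wiki_dpr download out of /root/.cache, provided neither library was imported earlier in the session. In a notebook that usually means restarting the kernel first.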
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.