
XLM-R tokenizer is None

See original GitHub issue

Environment info

  • transformers version: 4.3.2
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.7.0+cu101 (True)
  • Tensorflow version (GPU?): 2.4.1 (True)

Who can help

@LysandreJik @n1t0

Information

I am using XLM-R:

The problem arises when using:

  • the official example scripts: (give details below)

The task I am working on is:

  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behaviour:

from transformers import XLMRobertaTokenizer, XLMRobertaModel

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
model = XLMRobertaModel.from_pretrained('xlm-roberta-base')
print(tokenizer, model)

Result

The XLM-R tokenizer prints as None, but the model loads correctly.

I am a beginner with this model. Many thanks for your help.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
LysandreJik commented, May 3, 2021

This is probably because you hadn’t restarted your kernel after installing the sentencepiece dependency!
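
For anyone hitting the same symptom, here is a minimal sketch of the workaround suggested above. The checkpoint name comes from this issue; the rest is an assumed notebook workflow, not an official recipe:

# Step 1: install the missing dependency, e.g. in a notebook cell:
#   !pip install sentencepiece
# Step 2: restart the kernel/runtime, then re-run the cell below.
# Installing alone is not enough if transformers was already imported
# in the current session.

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
assert tokenizer is not None          # should hold after a fresh kernel
print(tokenizer.tokenize("Hello world"))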

0 reactions
chandreshiit commented, May 3, 2021

Not sure how, but it's working today.

Read more comments on GitHub >

Top Results From Across the Web

  • Tokenizer - Hugging Face
    A tokenizer is in charge of preparing the inputs for a model. ... If no value is provided, will default to VERY_LARGE_INTEGER (...)
  • SST-2 Binary text classification with XLM-RoBERTa model
    A standard way to process text is: tokenize the text, convert tokens into (integer) IDs, add any special token IDs. XLM-R uses ... (a sketch of this flow follows after the list)
  • xlm-r-large tokenize dataset - Kaggle
    This kernel tokenizes the whole (train+test) dataset ahead of time and saves it in npy file format for later loading in order to ...
  • Source code for comet.models.encoders.xlmr
    def __init__( self, xlmr: XLMRModel, tokenizer: XLMRTextEncoder, ... by removing the LM and classification heads # xlmr.model.decoder.lm_head.dense = None ...
  • Notes on Transformers Book Ch. 4 - Christian Mills
    Non-English pretrained models typically exist only for languages like ... The tokenizer model analyzes the training corpus to find the most ...
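
The SST-2 entry above describes the usual tokenize / convert-to-IDs / add-special-tokens pipeline. Here is a minimal sketch of that flow with the checkpoint from this issue; the example sentence is arbitrary, and sentencepiece must already be installed:

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')

text = "Hello world"                                   # arbitrary example sentence
tokens = tokenizer.tokenize(text)                      # 1. tokenize into subword pieces
ids = tokenizer.convert_tokens_to_ids(tokens)          # 2. map tokens to integer IDs
with_special = tokenizer.build_inputs_with_special_tokens(ids)  # 3. add <s> ... </s>

# Calling the tokenizer directly performs all three steps at once:
print(with_special)
print(tokenizer(text)["input_ids"])

In practice, calling tokenizer(text) directly is the usual entry point; the three explicit steps are shown only to mirror the pipeline described in that result.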
