
XLM-R tokenizer is None

See original GitHub issue

Environment info

  • transformers version: 4.3.2
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.7.0+cu101 (True)
  • Tensorflow version (GPU?): 2.4.1 (True)

Who can help

@LysandreJik @n1t0

Information

I am using XLM-R:

The problem arises when using:

  • the official example scripts: (give details below)

The task I am working on is:

  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behaviour:

from transformers import XLMRobertaTokenizer, XLMRobertaModel

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
model = XLMRobertaModel.from_pretrained('xlm-roberta-base')
print(tokenizer, model)

Result

The XLM-R tokenizer prints as None, but the model loads correctly.

I am a beginner with this model. Many thanks for your help.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
LysandreJik commented, May 3, 2021

This is probably because you hadn’t restarted your kernel after installing the sentencepiece dependency!
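
For anyone hitting the same symptom, here is a minimal sketch of the workaround suggested above. The checkpoint name comes from this issue; the rest is an assumed notebook workflow, not an official recipe:

# Step 1: install the missing dependency, e.g. in a notebook cell:
#   !pip install sentencepiece
# Step 2: restart the kernel/runtime, then re-run the cell below.
# Installing alone is not enough if transformers was already imported
# in the current session.

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
assert tokenizer is not None          # should hold after a fresh kernel
print(tokenizer.tokenize("Hello world"))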

0 reactions
chandreshiit commented, May 3, 2021

Not sure how, but it's working today.

Read more comments on GitHub >

Top Results From Across the Web

  • Tokenizer - Hugging Face
    A tokenizer is in charge of preparing the inputs for a model. ... If no value is provided, will default to VERY_LARGE_INTEGER (...)
  • SST-2 Binary text classification with XLM-RoBERTa model
    A standard way to process text is: tokenize the text, convert tokens into (integer) IDs, add any special token IDs. XLM-R uses ... (a sketch of this flow follows after the list)
  • xlm-r-large tokenize dataset - Kaggle
    This kernel tokenizes the whole (train+test) dataset ahead of time and saves it in npy file format for later loading in order to ...
  • Source code for comet.models.encoders.xlmr
    def __init__( self, xlmr: XLMRModel, tokenizer: XLMRTextEncoder, ... by removing the LM and classification heads # xlmr.model.decoder.lm_head.dense = None ...
  • Notes on Transformers Book Ch. 4 - Christian Mills
    Non-English pretrained models typically exist only for languages like ... The tokenizer model analyzes the training corpus to find the most ...
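
The SST-2 entry above describes the usual tokenize / convert-to-IDs / add-special-tokens pipeline. Here is a minimal sketch of that flow with the checkpoint from this issue; the example sentence is arbitrary, and sentencepiece must already be installed:

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')

text = "Hello world"                                   # arbitrary example sentence
tokens = tokenizer.tokenize(text)                      # 1. tokenize into subword pieces
ids = tokenizer.convert_tokens_to_ids(tokens)          # 2. map tokens to integer IDs
with_special = tokenizer.build_inputs_with_special_tokens(ids)  # 3. add <s> ... </s>

# Calling the tokenizer directly performs all three steps at once:
print(with_special)
print(tokenizer(text)["input_ids"])

In practice, calling tokenizer(text) directly is the usual entry point; the three explicit steps are shown only to mirror the pipeline described in that result.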
