RobertaTokenizer object has no attribute 'add_special_tokens_single_sentence'
In trying to test out the roberta model I received this error. My setup is the same as in the Fine Tune Model section of the readme.
transformers==2.0.0 fast-bert==1.4.2
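For reference, a minimal sketch of the call that produces the traceback below, mirroring the Fine Tune Model section of the fast-bert readme (the data directory, corpus list, and settings here are placeholders, and depending on the fast-bert version the tokenizer can also be passed as a pretrained model name string):

```python
from pathlib import Path
import logging

from transformers import RobertaTokenizer
from fast_bert.data_lm import BertLMDataBunch

logger = logging.getLogger()
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
texts = ["..."]  # placeholder: list of raw text strings for the LM corpus

databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir=Path("./data/"),   # placeholder path
    text_list=texts,
    tokenizer=tokenizer,
    batch_size_per_gpu=16,
    max_seq_length=512,
    multi_gpu=False,            # args.multi_gpu in the original setup
    model_type="roberta",       # args.model_type in the original setup
    logger=logger,
)
```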
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-17-c876b1d42fd6> in <module>
7 multi_gpu=args.multi_gpu,
8 model_type=args.model_type,
----> 9 logger=logger)
~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in from_raw_corpus(data_dir, text_list, tokenizer, batch_size_per_gpu, max_seq_length, multi_gpu, test_size, model_type, logger, clear_cache, no_cache)
152 model_type=model_type,
153 logger=logger,
--> 154 clear_cache=clear_cache, no_cache=no_cache)
155
156 def __init__(self, data_dir, tokenizer, train_file='lm_train.txt', val_file='lm_val.txt',
~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in __init__(self, data_dir, tokenizer, train_file, val_file, batch_size_per_gpu, max_seq_length, multi_gpu, model_type, logger, clear_cache, no_cache)
209 train_filepath = str(self.data_dir/train_file)
210 train_dataset = TextDataset(self.tokenizer, train_filepath, cached_features_file,
--> 211 self.logger, block_size=self.tokenizer.max_len_single_sentence)
212
213 self.train_batch_size = self.batch_size_per_gpu * \
~/.conda/envs/transclass/lib/python3.7/site-packages/fast_bert/data_lm.py in __init__(self, tokenizer, file_path, cache_path, logger, block_size)
104
105 while len(tokenized_text) >= block_size: # Truncate in block of block_size
--> 106 self.examples.append(tokenizer.add_special_tokens_single_sentence(
107 tokenized_text[:block_size]))
108 tokenized_text = tokenized_text[block_size:]
AttributeError: 'RobertaTokenizer' object has no attribute 'add_special_tokens_single_sentence'
It appears that the RobertaTokenizer has attributes:
add_special_tokens
add_special_tokens_sequence_pair
add_special_tokens_single_sequence
add_tokens
But not add_special_tokens_single_sentence. It seems this method is quite similar to add_special_tokens_single_sequence, and perhaps that is the intended method.
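A possible interim workaround (a sketch, assuming transformers 2.0.0, where the method is named add_special_tokens_single_sequence) is to alias the name fast-bert expects onto the tokenizer before building the databunch:

```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Alias the method name fast-bert calls to the one transformers 2.0.0 actually provides.
if not hasattr(tokenizer, "add_special_tokens_single_sentence"):
    tokenizer.add_special_tokens_single_sentence = (
        tokenizer.add_special_tokens_single_sequence
    )
```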
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This is broken again despite #102, since Hugging Face has made a breaking change in the Transformers repo (see https://github.com/huggingface/transformers/commit/6c1d0bc0665ef01710db301fb1a0a3c23778714a).
The fix is, again, replacing the add_special_tokens_single_sequence method with build_inputs_with_special_tokens.
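Applied to the loop from the traceback above, the patched code in fast_bert/data_lm.py (TextDataset.__init__) would look roughly like this, assuming a transformers version that exposes build_inputs_with_special_tokens:

```python
while len(tokenized_text) >= block_size:  # truncate in blocks of block_size
    self.examples.append(
        tokenizer.build_inputs_with_special_tokens(tokenized_text[:block_size])
    )
    tokenized_text = tokenized_text[block_size:]
```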
I'm getting this same error while using a number of other tokenizers, including tokenizers I trained myself with the huggingface tokenizers library (BertWordPieceTokenizer, SentencePieceBPETokenizer & ByteLevelBPETokenizer).
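One way to sidestep that for a custom-trained tokenizer (a sketch; the corpus file, paths and training settings are placeholders) is to train with the tokenizers library, save the vocab/merges files, and reload them through transformers' RobertaTokenizer, which at least provides the transformers-style special-token methods fast-bert calls (combined with the method-name fix above):

```python
from tokenizers import ByteLevelBPETokenizer
from transformers import RobertaTokenizer

# Train a byte-level BPE tokenizer on a placeholder corpus file.
bpe = ByteLevelBPETokenizer()
bpe.train(files=["corpus.txt"], vocab_size=30000, min_frequency=2)

# Depending on the tokenizers version this is save_model() or save();
# either way it writes vocab.json and merges.txt to the given directory.
bpe.save_model("./tokenizer")

# Reload through transformers so the expected special-token methods exist.
tokenizer = RobertaTokenizer("./tokenizer/vocab.json", "./tokenizer/merges.txt")
```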