Error in HuggingFace Course "Fine-tuning a pretrained model"
See original GitHub issue.
New to Hugging Face and just going through your newly posted course.
To reproduce
Open a Google Colab notebook.
Run:
!pip install transformers[sentencepiece]
!pip install datasets
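(Given the resolution at the bottom of this issue, where a factory reset of the runtime made the error disappear, it may also help to force fresh versions instead of relying on whatever the runtime has cached; this is a suggestion, not a step from the course:)
!pip install -U transformers[sentencepiece] datasets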
Then follow the steps in this chapter of the Hugging Face course: https://huggingface.co/course/chapter3/3?fw=pt
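For reference, the steps from that chapter boil down to roughly the following. This is a condensed sketch reconstructed from the course page, not a verbatim copy; in particular, truncation=True is set in the tokenize function and padding is deferred to DataCollatorWithPadding:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"
raw_datasets = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    # Truncate here; padding happens per-batch in the collator below.
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
training_args = TrainingArguments("test-trainer")

trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
```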
At the step where you are told to call trainer.train(), you see this error:
***** Running training *****
Num examples = 3668
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 1377
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in convert_to_tensors(self, tensor_type, prepend_batch_axis)
698 if not is_tensor(value):
--> 699 tensor = as_tensor(value)
700
ValueError: too many dimensions 'str'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
8 frames
<ipython-input-50-3435b262f1ae> in <module>()
----> 1 trainer.train()
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
1241 self.control = self.callback_handler.on_epoch_begin(args, self.state, self.control)
1242
-> 1243 for step, inputs in enumerate(epoch_iterator):
1244
1245 # Skip past any already trained steps if resuming training
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
519 if self._sampler_iter is None:
520 self._reset()
--> 521 data = self._next_data()
522 self._num_yielded += 1
523 if self._dataset_kind == _DatasetKind.Iterable and \
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
559 def _next_data(self):
560 index = self._next_index() # may raise StopIteration
--> 561 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
562 if self._pin_memory:
563 data = _utils.pin_memory.pin_memory(data)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
45 else:
46 data = self.dataset[possibly_batched_index]
---> 47 return self.collate_fn(data)
/usr/local/lib/python3.7/dist-packages/transformers/data/data_collator.py in __call__(self, features)
121 max_length=self.max_length,
122 pad_to_multiple_of=self.pad_to_multiple_of,
--> 123 return_tensors="pt",
124 )
125 if "label" in batch:
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in pad(self, encoded_inputs, padding, max_length, pad_to_multiple_of, return_attention_mask, return_tensors, verbose)
2700 batch_outputs[key].append(value)
2701
-> 2702 return BatchEncoding(batch_outputs, tensor_type=return_tensors)
2703
2704 def create_token_type_ids_from_sequences(
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in __init__(self, data, encoding, tensor_type, prepend_batch_axis, n_sequences)
202 self._n_sequences = n_sequences
203
--> 204 self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
205
206 @property
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in convert_to_tensors(self, tensor_type, prepend_batch_axis)
714 )
715 raise ValueError(
--> 716 "Unable to create tensor, you should probably activate truncation and/or padding "
717 "with 'padding=True' 'truncation=True' to have batched tensors with the same length."
718 )
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
Expected behavior
I guess I expected it to start training? The error message seems incorrect, since padding and truncation are already set to True.
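For context on why the message can fire anyway: the ValueError is raised from tokenizer.pad(), which tries to tensorize every feature in the batch, so it also triggers when a non-numeric value (such as a leftover raw string column) reaches the data collator. A minimal sketch of both cases, with hand-picked token ids purely for illustration:

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Ragged token-id lists are fine: the collator pads them to a common length.
ok = [{"input_ids": [101, 7592, 102]},
      {"input_ids": [101, 7592, 2088, 999, 102]}]
print(collator(ok)["input_ids"].shape)  # torch.Size([2, 5])

# A string feature reproduces the error from the traceback above, because
# convert_to_tensors() cannot build a tensor out of "hello".
bad = [{"input_ids": [101, 102], "sentence1": "hello"}]
# collator(bad)  # ValueError: too many dimensions 'str'
```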
Issue Analytics: created 2 years ago; 5 comments (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
My apologies for the spurious issue. I did a factory reset of my runtime today and was unable to replicate the error. Thanks for the awesome library!
@sgugger
This is the output.
I am trying to use 'bert-base-cased' for text classification, and I got the same error as OP.
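For anyone who still hits this: one hedged workaround for this class of error is to make sure only numeric features reach the Trainer by dropping the raw text columns explicitly (the Trainer normally removes unused columns on its own, but this rules strings out entirely). Column names below assume the MRPC example from the course; adjust them for your own dataset:

```python
tokenized_datasets = tokenized_datasets.remove_columns(["sentence1", "sentence2", "idx"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")
```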