Error with STS dataloader
I'm getting an error when I try to run the training_stsbenchmark_bilstm.py example.
When I try to inspect what's in the dataloader with:
next(iter(train_dataloader))
I get:
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'sentence_transformers.readers.InputExample.InputExample'>
Full traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-97295ce92ad5> in <module>
----> 1 for b in train_dataloader:
2 print(b)
3 break
~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
515 if self._sampler_iter is None:
516 self._reset()
--> 517 data = self._next_data()
518 self._num_yielded += 1
519 if self._dataset_kind == _DatasetKind.Iterable and \
~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
555 def _next_data(self):
556 index = self._next_index() # may raise StopIteration
--> 557 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
558 if self._pin_memory:
559 data = _utils.pin_memory.pin_memory(data)
~/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
45 else:
46 data = self.dataset[possibly_batched_index]
---> 47 return self.collate_fn(data)
~/.local/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
83 return [default_collate(samples) for samples in transposed]
84
---> 85 raise TypeError(default_collate_err_msg_format.format(elem_type))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'sentence_transformers.readers.InputExample.InputExample'>
Running the whole script, I get the following:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-19-1f07fcdd242f> in <module>
1 logging.info("Warmup-steps: {}".format(warmup_steps))
2 # Train the model
----> 3 model.fit(train_objectives=[(train_dataloader, train_loss)],
4 evaluator=evaluator,
5 epochs=num_epochs,
~/.local/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py in fit(self, train_objectives, evaluator, epochs, steps_per_epoch, scheduler, warmup_steps, optimizer_class, optimizer_params, weight_decay, evaluation_steps, output_path, save_best_model, max_grad_norm, use_amp, callback, show_progress_bar, checkpoint_path, checkpoint_save_steps, checkpoint_save_total_limit)
703 skip_scheduler = scaler.get_scale() != scale_before_step
704 else:
--> 705 loss_value = loss_model(features, labels)
706 loss_value.backward()
707 torch.nn.utils.clip_grad_norm_(loss_model.parameters(), max_grad_norm)
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/.local/lib/python3.8/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in forward(self, sentence_features, labels)
37
38 def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39 embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
40 output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
41 return self.loss_fct(output, labels.view(-1))
~/.local/lib/python3.8/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in <listcomp>(.0)
37
38 def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39 embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
40 output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
41 return self.loss_fct(output, labels.view(-1))
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
117 def forward(self, input):
118 for module in self:
--> 119 input = module(input)
120 return input
121
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
~/.local/lib/python3.8/site-packages/sentence_transformers/models/LSTM.py in forward(self, features)
30 sentence_lengths = torch.clamp(features['sentence_lengths'], min=1)
31
---> 32 packed = nn.utils.rnn.pack_padded_sequence(token_embeddings, sentence_lengths, batch_first=True, enforce_sorted=False)
33 packed = self.encoder(packed)
34 unpack = nn.utils.rnn.pad_packed_sequence(packed[0], batch_first=True)[0]
~/.local/lib/python3.8/site-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
243
244 data, batch_sizes = \
--> 245 _VF._pack_padded_sequence(input, lengths, batch_first)
246 return _packed_sequence_init(data, batch_sizes, sorted_indices, None)
247
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
Environment:
sentence-transformers==2.1.0 torch==1.8.0+cu111
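The second traceback comes from a behavior change in PyTorch: since torch 1.7, pack_padded_sequence requires its lengths argument to be a 1D CPU int64 tensor, while features['sentence_lengths'] lives on the GPU here. A minimal sketch of a local workaround, assuming you would rather subclass than patch the installed package; the forward body mirrors the LSTM.forward shown in the traceback, with only a .cpu() call added:

import torch
from torch import nn
from sentence_transformers.models import LSTM

class CPULengthsLSTM(LSTM):
    # Same forward as sentence_transformers.models.LSTM (see traceback above),
    # except sentence_lengths is moved to the CPU before packing, as required
    # by pack_padded_sequence in torch >= 1.7.
    def forward(self, features):
        token_embeddings = features['token_embeddings']
        sentence_lengths = torch.clamp(features['sentence_lengths'], min=1).cpu()

        packed = nn.utils.rnn.pack_padded_sequence(token_embeddings, sentence_lengths,
                                                   batch_first=True, enforce_sorted=False)
        packed = self.encoder(packed)
        unpack = nn.utils.rnn.pad_packed_sequence(packed[0], batch_first=True)[0]
        features.update({'token_embeddings': unpack})
        return features

Using CPULengthsLSTM in place of models.LSTM when assembling the model in training_stsbenchmark_bilstm.py should avoid the RuntimeError, at the cost of a device transfer per batch.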
@milmin and @lambdaofgod I had a similar issue with the dataloader. The issue is that SentenceTransformer overwrites the default collator (https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py#L629) to work with the InputExample class. So, that is why you are getting the error. If you want to test that your data pipeline is working, you will need to import the smart_batching_collate function and set it as your DataLoader's collate_fn.

Thanks! That helped!
working code:
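The original snippet is not reproduced here; the following is a minimal sketch of the fix, assuming model is the SentenceTransformer assembled in training_stsbenchmark_bilstm.py and that it exposes smart_batching_collate as a method, as SentenceTransformer.fit uses internally:

from torch.utils.data import DataLoader
from sentence_transformers import InputExample

# A couple of toy STS-style pairs with similarity labels in [0, 1].
train_samples = [
    InputExample(texts=["A man is eating food.", "A man is eating a piece of bread."], label=0.8),
    InputExample(texts=["A plane is taking off.", "An air plane is taking off."], label=1.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=2)

# Replace torch's default_collate with the model's own collator so that
# InputExample objects are converted into (features, labels) batches.
train_dataloader.collate_fn = model.smart_batching_collate

features, labels = next(iter(train_dataloader))  # no more TypeError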