
Error with STS dataloader

See original GitHub issue

I’m getting an error when I try to run the training_stsbenchmark_bilstm.py example.

When I try to inspect what’s in the dataloader with next(iter(train_dataloader)), I get

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'sentence_transformers.readers.InputExample.InputExample'>

Full traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-97295ce92ad5> in <module>
----> 1 for b in train_dataloader:
      2     print(b)
      3     break

~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    555     def _next_data(self):
    556         index = self._next_index()  # may raise StopIteration
--> 557         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    558         if self._pin_memory:
    559             data = _utils.pin_memory.pin_memory(data)

~/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

~/.local/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     83         return [default_collate(samples) for samples in transposed]
     84 
---> 85     raise TypeError(default_collate_err_msg_format.format(elem_type))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'sentence_transformers.readers.InputExample.InputExample'>

Running the whole script, I get the following:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-19-1f07fcdd242f> in <module>
      1 logging.info("Warmup-steps: {}".format(warmup_steps))
      2 # Train the model
----> 3 model.fit(train_objectives=[(train_dataloader, train_loss)],
      4           evaluator=evaluator,
      5           epochs=num_epochs,

~/.local/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py in fit(self, train_objectives, evaluator, epochs, steps_per_epoch, scheduler, warmup_steps, optimizer_class, optimizer_params, weight_decay, evaluation_steps, output_path, save_best_model, max_grad_norm, use_amp, callback, show_progress_bar, checkpoint_path, checkpoint_save_steps, checkpoint_save_total_limit)
    703                         skip_scheduler = scaler.get_scale() != scale_before_step
    704                     else:
--> 705                         loss_value = loss_model(features, labels)
    706                         loss_value.backward()
    707                         torch.nn.utils.clip_grad_norm_(loss_model.parameters(), max_grad_norm)

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in forward(self, sentence_features, labels)
     37 
     38     def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39         embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
     40         output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
     41         return self.loss_fct(output, labels.view(-1))

~/.local/lib/python3.8/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in <listcomp>(.0)
     37 
     38     def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39         embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
     40         output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
     41         return self.loss_fct(output, labels.view(-1))

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    117     def forward(self, input):
    118         for module in self:
--> 119             input = module(input)
    120         return input
    121 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/sentence_transformers/models/LSTM.py in forward(self, features)
     30         sentence_lengths = torch.clamp(features['sentence_lengths'], min=1)
     31 
---> 32         packed = nn.utils.rnn.pack_padded_sequence(token_embeddings, sentence_lengths, batch_first=True, enforce_sorted=False)
     33         packed = self.encoder(packed)
     34         unpack = nn.utils.rnn.pad_packed_sequence(packed[0], batch_first=True)[0]

~/.local/lib/python3.8/site-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
    243 
    244     data, batch_sizes = \
--> 245         _VF._pack_padded_sequence(input, lengths, batch_first)
    246     return _packed_sequence_init(data, batch_sizes, sorted_indices, None)
    247 

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

Environment:

sentence-transformers==2.1.0
torch==1.8.0+cu111

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
ncoop57 commented, Jan 10, 2022

@milmin and @lambdaofgod I had a similar issue with the dataloader. The issue is that SentenceTransformer overwrites the default collator (https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py#L629) to work with the InputExample class. That is why you are getting the error. If you want to test that your data pipeline is working, you will need to import the smart_batching_collate function and set it as your DataLoader’s collate_fn.
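
For reference, a minimal sketch of what that looks like (the model name and sentences below are just placeholders; the relevant part is passing collate_fn=model.smart_batching_collate):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

train_samples = [
    InputExample(texts=["A plane is taking off.", "An air plane is taking off."], label=1.0),
    InputExample(texts=["A man is playing a flute.", "A man is eating."], label=0.1),
]

train_dataloader = DataLoader(
    train_samples,
    shuffle=True,
    batch_size=2,
    collate_fn=model.smart_batching_collate,  # instead of torch's default_collate
)

# smart_batching_collate tokenizes the texts and returns (features, labels)
features, labels = next(iter(train_dataloader))
print(labels)              # tensor of the InputExample labels
print(features[0].keys())  # tokenized features for the first text of each pair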

0 reactions
Lurrobert commented, Jul 15, 2022

smart_batching_collate

Thanks! That helped!

working code:

    train_dataset,
    shuffle=True,
    batch_size=train_batch_size,
    collate_fn=model.smart_batching_collate
)
Read more comments on GitHub >
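
The comments above address the collate error; the second failure in the report (RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor) comes from PyTorch >= 1.7 requiring the lengths tensor passed to pack_padded_sequence to live on the CPU, while the traceback shows the LSTM module handing it over on the GPU. A minimal workaround sketch, not taken from this thread, that moves the lengths to the CPU before packing:

import torch
from torch import nn

# Keep a reference to the original function, then wrap it so any lengths
# tensor is moved to the CPU before packing, as torch >= 1.7 requires.
_orig_pack_padded_sequence = nn.utils.rnn.pack_padded_sequence

def _pack_padded_sequence_cpu_lengths(input, lengths, batch_first=False, enforce_sorted=True):
    if torch.is_tensor(lengths):
        lengths = lengths.cpu()
    return _orig_pack_padded_sequence(input, lengths, batch_first=batch_first, enforce_sorted=enforce_sorted)

# Apply the patch before calling model.fit(...)
nn.utils.rnn.pack_padded_sequence = _pack_padded_sequence_cpu_lengths

Editing the line shown in the LSTM.py traceback to pass sentence_lengths.cpu() has the same effect.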
