
Error with STS dataloader

See original GitHub issue

I’m getting an error when I try to run the training_stsbenchmark_bilstm.py example.

When I try to inspect what’s in the dataloader with next(iter(train_dataloader)), I get

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'sentence_transformers.readers.InputExample.InputExample'>

Full traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-97295ce92ad5> in <module>
----> 1 for b in train_dataloader:
      2     print(b)
      3     break

~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

~/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    555     def _next_data(self):
    556         index = self._next_index()  # may raise StopIteration
--> 557         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    558         if self._pin_memory:
    559             data = _utils.pin_memory.pin_memory(data)

~/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

~/.local/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     83         return [default_collate(samples) for samples in transposed]
     84 
---> 85     raise TypeError(default_collate_err_msg_format.format(elem_type))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'sentence_transformers.readers.InputExample.InputExample'>

Running the whole script, I get the following:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-19-1f07fcdd242f> in <module>
      1 logging.info("Warmup-steps: {}".format(warmup_steps))
      2 # Train the model
----> 3 model.fit(train_objectives=[(train_dataloader, train_loss)],
      4           evaluator=evaluator,
      5           epochs=num_epochs,

~/.local/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py in fit(self, train_objectives, evaluator, epochs, steps_per_epoch, scheduler, warmup_steps, optimizer_class, optimizer_params, weight_decay, evaluation_steps, output_path, save_best_model, max_grad_norm, use_amp, callback, show_progress_bar, checkpoint_path, checkpoint_save_steps, checkpoint_save_total_limit)
    703                         skip_scheduler = scaler.get_scale() != scale_before_step
    704                     else:
--> 705                         loss_value = loss_model(features, labels)
    706                         loss_value.backward()
    707                         torch.nn.utils.clip_grad_norm_(loss_model.parameters(), max_grad_norm)

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in forward(self, sentence_features, labels)
     37 
     38     def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39         embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
     40         output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
     41         return self.loss_fct(output, labels.view(-1))

~/.local/lib/python3.8/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in <listcomp>(.0)
     37 
     38     def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39         embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
     40         output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
     41         return self.loss_fct(output, labels.view(-1))

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    117     def forward(self, input):
    118         for module in self:
--> 119             input = module(input)
    120         return input
    121 

~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/.local/lib/python3.8/site-packages/sentence_transformers/models/LSTM.py in forward(self, features)
     30         sentence_lengths = torch.clamp(features['sentence_lengths'], min=1)
     31 
---> 32         packed = nn.utils.rnn.pack_padded_sequence(token_embeddings, sentence_lengths, batch_first=True, enforce_sorted=False)
     33         packed = self.encoder(packed)
     34         unpack = nn.utils.rnn.pad_packed_sequence(packed[0], batch_first=True)[0]

~/.local/lib/python3.8/site-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
    243 
    244     data, batch_sizes = \
--> 245         _VF._pack_padded_sequence(input, lengths, batch_first)
    246     return _packed_sequence_init(data, batch_sizes, sorted_indices, None)
    247 

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

Environment:

sentence-transformers==2.1.0
torch==1.8.0+cu111

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
ncoop57 commented, Jan 10, 2022

@milmin and @lambdaofgod I had a similar issue with the dataloader. The issue is that SentenceTransformer overwrites the default collator (https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/SentenceTransformer.py#L629) to work with the InputExample class. That is why you are getting the error. If you want to test that your data pipeline is working, you will need to import the smart_batching_collate function and set it as your DataLoader’s collate_fn.
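
For reference, a minimal sketch of what that looks like (the model name and sentences below are just placeholders; the relevant part is passing collate_fn=model.smart_batching_collate):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

train_samples = [
    InputExample(texts=["A plane is taking off.", "An air plane is taking off."], label=1.0),
    InputExample(texts=["A man is playing a flute.", "A man is eating."], label=0.1),
]

train_dataloader = DataLoader(
    train_samples,
    shuffle=True,
    batch_size=2,
    collate_fn=model.smart_batching_collate,  # instead of torch's default_collate
)

# smart_batching_collate tokenizes the texts and returns (features, labels)
features, labels = next(iter(train_dataloader))
print(labels)              # tensor of the InputExample labels
print(features[0].keys())  # tokenized features for the first text of each pair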

0 reactions
Lurrobert commented, Jul 15, 2022

smart_batching_collate

Thanks! That helped!

working code:

    train_dataset,
    shuffle=True,
    batch_size=train_batch_size,
    collate_fn=model.smart_batching_collate
)
Read more comments on GitHub >
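
The comments above address the collate error; the second failure in the report (RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor) comes from PyTorch >= 1.7 requiring the lengths tensor passed to pack_padded_sequence to live on the CPU, while the traceback shows the LSTM module handing it over on the GPU. A minimal workaround sketch, not taken from this thread, that moves the lengths to the CPU before packing:

import torch
from torch import nn

# Keep a reference to the original function, then wrap it so any lengths
# tensor is moved to the CPU before packing, as torch >= 1.7 requires.
_orig_pack_padded_sequence = nn.utils.rnn.pack_padded_sequence

def _pack_padded_sequence_cpu_lengths(input, lengths, batch_first=False, enforce_sorted=True):
    if torch.is_tensor(lengths):
        lengths = lengths.cpu()
    return _orig_pack_padded_sequence(input, lengths, batch_first=batch_first, enforce_sorted=enforce_sorted)

# Apply the patch before calling model.fit(...)
nn.utils.rnn.pack_padded_sequence = _pack_padded_sequence_cpu_lengths

Editing the line shown in the LSTM.py traceback to pass sentence_lengths.cpu() has the same effect.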
