
Cannot determine `batch_size` from a list of string while running `range_test()` with `val_loader`


Hey @davidtvs, I found this issue while writing an example of using this package with huggingface/transformers for #55.

Condition

  • Input data: list of strings (the Dataset returns strings)
  • Running range_test() with val_loader

Error message

---> 10 lr_finder.range_test(train_loader, val_loader=valid_loader, start_lr=1e-5, end_lr=10, num_iter=100, step_mode='linear')

1 frames

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in range_test(self, train_loader, val_loader, start_lr, end_lr, num_iter, step_mode, smooth_f, diverge_th, accumulation_steps, non_blocking_transfer)
    288             if val_loader:
    289                 loss = self._validate(
--> 290                     val_iter, non_blocking_transfer=non_blocking_transfer
    291                 )
    292 

/usr/local/lib/python3.6/dist-packages/torch_lr_finder/lr_finder.py in _validate(self, val_iter, non_blocking_transfer)
    398 
    399                 if isinstance(inputs, tuple) or isinstance(inputs, list):
--> 400                     batch_size = inputs[0].size(0)
    401                 else:
    402                     batch_size = inputs.size(0)

AttributeError: 'str' object has no attribute 'size'

Description

In the current implementation, batch_size is determined dynamically from the shape of inputs in LRFinder._validate(). In v0.2.0, L399-L402 work correctly only when the given inputs is a torch.Tensor, which is why it fails when inputs is a list of strings.
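The failing branch can be reproduced without the library. This is a minimal torch-free sketch of what L399-L402 do when the batch is a list of strings (the sample strings are made up):

```python
# A batch yielded by a DataLoader over a Dataset of raw strings
inputs = ["a premise sentence", "a hypothesis sentence"]  # hypothetical batch

# Mirrors L399-L402 of lr_finder.py (v0.2.0)
if isinstance(inputs, (tuple, list)):
    try:
        batch_size = inputs[0].size(0)  # str has no .size() method
    except AttributeError as exc:
        print(exc)  # → 'str' object has no attribute 'size'
```

The same AttributeError is raised for any Dataset whose samples are not tensors (or tuples/lists of tensors).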

Maybe it's not a usual case for a Dataset to return non-torch.Tensor values, but I think it would be easier to read the batch size from DataLoader.batch_size, since LRFinder._validate() is going to iterate over a val_loader anyway.
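As a sanity check of that idea, here is a torch-free sketch: FakeLoader is a hypothetical stand-in for torch.utils.data.DataLoader whose batch_size is fixed at construction, so it can be read without inspecting the batch contents at all:

```python
class FakeLoader:
    """Hypothetical stand-in for torch.utils.data.DataLoader."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size  # fixed at construction, like DataLoader

    def __iter__(self):
        # Yield consecutive slices of `data`, batch_size elements at a time
        for i in range(0, len(self.data), self.batch_size):
            yield self.data[i:i + self.batch_size]


loader = FakeLoader(["a", "b", "c", "d", "e"], batch_size=2)
# batch_size is known up front, regardless of the element type
print(loader.batch_size)  # → 2
```

Reading the size this way sidesteps the element type entirely, which is the point of the proposal.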

Hence I proposed a fix for this in that notebook: simply add a line batch_size = val_iter.data_loader.batch_size before entering the loop and remove the if-else statement; you can check it out here.

I'm also considering adding a batch_size property to DataLoaderIter, e.g.

class DataLoaderIter(object):
    # ...
    @property
    def batch_size(self):
        return self.data_loader.batch_size

With this property, the proposed fix can be simplified a little:

class LRFinder(object):
    def _validate(self, val_iter, non_blocking_transfer=True):
        # Set model to evaluation mode and disable gradient computation
        running_loss = 0
        self.model.eval()

        with torch.no_grad():
            for inputs, labels in val_iter:
                # Move data to the correct device
                inputs, labels = self._move_to_device(
                    inputs, labels, non_blocking=non_blocking_transfer
                )

                # Forward pass and loss computation
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                running_loss += loss.item() * val_iter.batch_size

        return running_loss / len(val_iter.dataset)

What do you think of it?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
davidtvs commented, Jul 25, 2020

Merged #58. Thanks @NaleRaphael for raising the issue and fixing it

1 reaction
davidtvs commented, Jul 22, 2020

The current code is written that way so that it can handle last batches that don't have the same batch size. Your suggestion is cleaner and works if we force drop_last=True for the validation data loader. Ideally, we would not force drop_last=True and would still support datasets that return objects without a size method. I googled a bit and couldn't find a way that isn't overcomplicated.

I'll think about this a bit more and come back tomorrow or in the next few days. But if we can't find a reasonable way of having both, I think we should make this change and document that drop_last=True is required.
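To illustrate the concern with a small worked example (the numbers are made up): a dataset of 10 samples loaded with batch_size=4 and drop_last=False yields batches of 4, 4, and 2, so weighting every batch loss by the nominal batch size counts 12 samples instead of 10:

```python
dataset_len, nominal_batch_size = 10, 4  # hypothetical numbers

# Actual size of each batch the loader would yield with drop_last=False
actual_sizes = [min(nominal_batch_size, dataset_len - i)
                for i in range(0, dataset_len, nominal_batch_size)]
print(actual_sizes)  # → [4, 4, 2]

# Weighting each batch loss by the nominal batch size overcounts the
# final partial batch...
print(sum(nominal_batch_size for _ in actual_sizes))  # → 12, not 10

# ...while weighting by the actual size matches the dataset length
print(sum(actual_sizes))  # → 10
```

With drop_last=True the partial batch is discarded and the two weightings agree, which is why the simpler fix works under that restriction.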
