
How to drop samples larger than `batch_bins` when training (in order to avoid out of memory errors)?

See original GitHub issue

Hello!

Is there a way to have `batch_bins` specify the maximum number of bins allowed in a batch (for batches of type `length` or `numel`)? If I understand correctly, the current implementation groups the samples into a number of bins that is closest to, but still greater than, `batch_bins`. I would like (i) the batch size to be capped at `batch_bins`, and (ii) any samples that are themselves larger than `batch_bins` to be removed. The motivation is that a few long audio files in the training dataset can cause the entire process to run out of memory (especially when working on a small GPU). I assume I can modify the corresponding batch samplers (`NumElementsBatchSampler` and `LengthBatchSampler`), but I was wondering whether a solution to this kind of problem already exists.

Thank you!
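To make the request concrete, here is a minimal, self-contained sketch of the desired packing behaviour. It does not touch the ESPnet sampler classes; the names (`make_capped_batches`, `utt2numel`) and the greedy packing strategy are illustrative assumptions, not the library's API. Utterances larger than `batch_bins` are dropped, and each batch's total size is capped at `batch_bins`:

```python
# Illustrative sketch only: cap batches at `batch_bins` and drop oversized samples.
# `utt2numel` maps utterance IDs to their size (e.g. number of elements).

def make_capped_batches(utt2numel: dict, batch_bins: int):
    """Greedily pack utterances into batches whose total size never
    exceeds batch_bins; utterances that alone exceed batch_bins are skipped."""
    # Drop samples that would overflow a batch by themselves.
    kept = {u: n for u, n in utt2numel.items() if n <= batch_bins}

    batches, current, current_bins = [], [], 0
    # Sort by size so similarly sized utterances end up in the same batch.
    for utt, numel in sorted(kept.items(), key=lambda kv: kv[1]):
        if current and current_bins + numel > batch_bins:
            batches.append(current)
            current, current_bins = [], 0
        current.append(utt)
        current_bins += numel
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    lengths = {"utt1": 400, "utt2": 900, "utt3": 5000, "utt4": 700}
    print(make_capped_batches(lengths, batch_bins=2000))
    # utt3 (5000 > 2000) is dropped; the rest are packed under the cap.
```

In the real samplers the sizes would presumably come from the shape files rather than a hand-built dictionary.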

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
kan-bayashi commented, Mar 19, 2021
1 reaction
kamo-naoyuki commented, Mar 18, 2021

Thanks. I’ll consider it.

This doesn’t exist on the sampler side right now. If you want to remove long samples (or samples with other problems) from your training data, you can delete their lines from the shape text file directly.
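For the shape-file approach above, a small filtering script is enough. This is a hedged sketch, assuming each line of an ESPnet2-style shape file has the form `utt_id length` or `utt_id length,dim`; the file paths and the threshold are hypothetical examples, not values from the issue:

```python
# Hedged sketch: drop over-long utterances from an ESPnet2-style shape file.
# Assumes lines look like "utt_id length" or "utt_id length,dim"; the paths
# and MAX_LENGTH below are hypothetical examples.

MAX_LENGTH = 200_000  # keep only utterances whose first dimension fits

with open("exp/asr_stats/train/speech_shape") as fin, \
        open("exp/asr_stats/train/speech_shape.filtered", "w") as fout:
    for line in fin:
        utt_id, shape = line.strip().split(maxsplit=1)
        length = int(shape.split(",")[0])  # first entry is the sample length
        if length <= MAX_LENGTH:
            fout.write(line)
```

If a recipe uses several shape files (e.g. a speech shape and a text shape), the same utterance IDs may need to be dropped from each so the keys stay consistent.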

Read more comments on GitHub >

Top Results From Across the Web

Out of memory when training on custom dataset #188 - GitHub
I was trying to train a retinanet model on some custom dataset (e.g. WIDER face) and I've encountered consistent out of memory issue...
Read more >
Batch size and GPU memory limitations in neural networks
In this article, we'll talk about batch sizing issues one may encounter while training neural networks using large batch sizes and being limited …
Read more >
Training a BERT-based model causes an OutOfMemory error ...
So, looking at the error the problem is not being able to allocate an array of [786432,1604] . If you do a simple...
Read more >
Memory considerations – Machine Learning on GPU
When it comes to memory usage, there are two main things to consider: the size of your training data and the size of...
Read more >
Arabic Speech Recognition by End-to-End, Modular Systems ...
Existence of different Arabic dialects with limited labeled data. Each dialect is a native Arabic language that is spoken, but not written, as …
Read more >
