
How to drop samples larger than `batch_bins` when training (in order to avoid out of memory errors)?

See original GitHub issue

Hello!

Is there a way to have `batch_bins` specify the maximum number of bins allowed in a batch (for batches of type `length` or `numel`)? If I understand correctly, the current implementation groups the samples into a number of bins that is closest to, but still greater than, `batch_bins`. I would like (i) the batch size to be capped at `batch_bins`, and (ii) any samples that are themselves larger than `batch_bins` to be removed. The motivation is that a few long audio files in the training dataset can cause the entire process to run out of memory (especially when working on a small GPU). I assume I can modify the corresponding batch samplers (`NumElementsBatchSampler` and `LengthBatchSampler`), but I was wondering whether a solution to this kind of problem already exists.

Thank you!
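To make the request concrete, here is a minimal, self-contained sketch of the desired packing behaviour. It does not touch the ESPnet sampler classes; the names (`make_capped_batches`, `utt2numel`) and the greedy packing strategy are illustrative assumptions, not the library's API. Utterances larger than `batch_bins` are dropped, and each batch's total size is capped at `batch_bins`:

```python
# Illustrative sketch only: cap batches at `batch_bins` and drop oversized samples.
# `utt2numel` maps utterance IDs to their size (e.g. number of elements).

def make_capped_batches(utt2numel: dict, batch_bins: int):
    """Greedily pack utterances into batches whose total size never
    exceeds batch_bins; utterances that alone exceed batch_bins are skipped."""
    # Drop samples that would overflow a batch by themselves.
    kept = {u: n for u, n in utt2numel.items() if n <= batch_bins}

    batches, current, current_bins = [], [], 0
    # Sort by size so similarly sized utterances end up in the same batch.
    for utt, numel in sorted(kept.items(), key=lambda kv: kv[1]):
        if current and current_bins + numel > batch_bins:
            batches.append(current)
            current, current_bins = [], 0
        current.append(utt)
        current_bins += numel
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    lengths = {"utt1": 400, "utt2": 900, "utt3": 5000, "utt4": 700}
    print(make_capped_batches(lengths, batch_bins=2000))
    # utt3 (5000 > 2000) is dropped; the rest are packed under the cap.
```

In the real samplers the sizes would presumably come from the shape files rather than a hand-built dictionary.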

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
kan-bayashi commented, Mar 19, 2021
1 reaction
kamo-naoyuki commented, Mar 18, 2021

Thanks. I’ll consider it.

This doesn’t exist on the sampler side right now. If you want to remove long samples (or samples with other problems) from your training data, you can delete their lines from the shape text file directly.
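For the shape-file approach above, a small filtering script is enough. This is a hedged sketch, assuming each line of an ESPnet2-style shape file has the form `utt_id length` or `utt_id length,dim`; the file paths and the threshold are hypothetical examples, not values from the issue:

```python
# Hedged sketch: drop over-long utterances from an ESPnet2-style shape file.
# Assumes lines look like "utt_id length" or "utt_id length,dim"; the paths
# and MAX_LENGTH below are hypothetical examples.

MAX_LENGTH = 200_000  # keep only utterances whose first dimension fits

with open("exp/asr_stats/train/speech_shape") as fin, \
        open("exp/asr_stats/train/speech_shape.filtered", "w") as fout:
    for line in fin:
        utt_id, shape = line.strip().split(maxsplit=1)
        length = int(shape.split(",")[0])  # first entry is the sample length
        if length <= MAX_LENGTH:
            fout.write(line)
```

If a recipe uses several shape files (e.g. a speech shape and a text shape), the same utterance IDs may need to be dropped from each so the keys stay consistent.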

Read more comments on GitHub >

Top Results From Across the Web

Out of memory when training on custom dataset #188 - GitHub
I was trying to train a retinanet model on some custom dataset (e.g. WIDER face) and I've encountered consistent out of memory issue...
Read more >
Batch size and GPU memory limitations in neural networks
In this article, we'll talk about batch sizing issues one may encounter while training neural networks using large batch sizes and being limited …
Read more >
Training a BERT-based model causes an OutOfMemory error ...
So, looking at the error the problem is not being able to allocate an array of [786432,1604] . If you do a simple...
Read more >
Memory considerations – Machine Learning on GPU
When it comes to memory usage, there are two main things to consider: the size of your training data and the size of...
Read more >
Arabic Speech Recognition by End-to-End, Modular Systems ...
Existence of different Arabic dialects with limited labeled data. Each dialect is a native Arabic language that is spoken, but not written, as …
Read more >
