DynamicBatchSampler
See original GitHub issue

Hi,

I'm trying to understand the implementation of DynamicBatchSampler. I would like to know, in `_get_boundaries_through_warping`, why you use a log-normal with `s=1` to get the quantiles and then scale linearly up to `max_batch_length`. What about using `lognorm.fit` instead?
cc: @popcornell
Thanks in advance!
Issue Analytics

- Created a year ago
- Reactions: 1
- Comments: 10 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @bofenghuang, yeah, that warping idea is on me. It resolves time resolution in a latent statistical space, rather than requiring it to be defined explicitly for every audio collection one happens to work with. I write the following just to make sure we are on the same page - from your message I assume you already know it intuitively; this is so that others can follow when reading this later in the archive.
Please take a look at our tutorial for context. The prequel to your question:

Now, how do we get these buckets with their desirable properties? Without the latent-space/warping approach, one would need to define them by hand: 0 to 0.2; 0.2 to 0.6; 0.6 to 1.2; ... or script some exponential growth. The rationale for exponential growth: most datasets are roughly log-normal in their audio durations. Warping through the log-normal projects the buckets into a space where they can be treated linearly - which is why the log-normal distribution is used.
About your question: which log-normal distribution to use is perhaps somewhat arbitrary. 😉

Here, the goal was to get an initial handle on:
- `max_batch_length`, which represents the VRAM limit
- `num_quantiles`, which represents the targeted resolution in the latent space

For this latent space, your question is about the choice of `lognorm` with `s=1`. The answer might be depressingly simple: it got the PR & tutorial out, for later discussions like this one.

My gut feeling is that a fitted distribution should play out better than an arbitrarily assumed one. Would you be up to dive into tests on this topic? It would/could also make sense to move away from the log-normal assumption and use a general fit - in the end, what matters here is that quantiles become linear in their handling through warping distributions. Another open question: for the rest of the dynamic batching, why not simply have three or five bucket types and sort the rest in, from long to short audios?
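As a sketch of the fitting alternative raised in the question - here with synthetic durations standing in for a real dataset, and `floc=0` to keep the standard two-parameter log-normal:

```python
import numpy as np
from scipy.stats import lognorm

# Synthetic stand-in for real utterance durations (seconds).
rng = np.random.default_rng(0)
durations = rng.lognormal(mean=1.0, sigma=0.7, size=1000)

# Fit a log-normal to the observed durations instead of assuming s=1,
# then place bucket boundaries at its quantiles.
s, loc, scale = lognorm.fit(durations, floc=0)
probs = np.arange(1, 10) / 10  # deciles 0.1 ... 0.9
boundaries = lognorm.ppf(probs, s, loc=loc, scale=scale)
```

A general fit (e.g. empirical quantiles via `np.quantile(durations, probs)`) would drop the log-normal assumption entirely while keeping the same quantile-warping mechanics.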
@bofenghuang neat! It's encouraging to see your enthusiasm 😄 Any choice is arbitrary; basing the selection on some facts doesn't make the choice systematic 😉
@popcornell worked a lot on the final tutorial!
As you demonstrated, `num_buckets` helps DynamicBatching do something, but it is far from what one would expect to happen (only about 10% of the created buckets are actually used; the rest remain unfilled). Therefore, I'd be curious whether it makes sense to have any distributional assumption at all - or to treat this entirely at the categorical level (have 4, 5, 6, ... bucket types to be filled, whatever they are - the limit is VRAM). What I am trying to say:

That is also what your k-means findings support: fitting distributions can help, but in the end we get a dataset, and that is the entire population we need to care about during operation. (New dataset, new task - everything back to the start.) Yet, what does the batch creation of DynamicBatching imply for the overall training? How do we test that for one and for many datasets - how do we get the guarantees that we need?

If the total number of batches created is small and padding is small - how much random permutation of the files within these batches is then possible? Or would there be only one way to draw a batch after k-means?
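For concreteness, a minimal sketch of the k-means-style categorical bucketing discussed here - a tiny 1-D k-means on durations in plain NumPy, with boundaries at the midpoints between cluster centers (the helper name is illustrative, not a SpeechBrain API):

```python
import numpy as np

def kmeans_boundaries(durations, n_buckets, iters=20):
    # Deterministic quantile-based init, then standard Lloyd iterations.
    durations = np.asarray(durations, dtype=float)
    centers = np.quantile(durations, (np.arange(n_buckets) + 0.5) / n_buckets)
    for _ in range(iters):
        # Assign each duration to its nearest center.
        assign = np.abs(durations[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n_buckets):
            if np.any(assign == k):
                centers[k] = durations[assign == k].mean()
    centers.sort()
    # Bucket boundaries are midpoints between adjacent centers.
    return (centers[:-1] + centers[1:]) / 2
```

Note that with few buckets and a fixed dataset, the resulting batch composition can indeed become nearly deterministic - which is exactly the permutation question raised above.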
What's your take: is there a benefit in offering many choices of DynamicBatching approach here? Are there "relevant" ones? // The goal is to provide useful tools that make a complex issue intuitive to handle.
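As an aside for readers: the two padding metrics mentioned in this thread (`% True samples` and `% of padding`) are complements of each other, which can be computed like this (a hypothetical helper, not SpeechBrain code):

```python
def padding_fraction(lengths):
    # Fraction of padded positions when a batch is padded to its longest item.
    total = max(lengths) * len(lengths)   # slots after padding
    true = sum(lengths)                   # genuine (unpadded) positions
    return (total - true) / total

# Lengths 2, 4, 4 padded to 4 give 12 slots, 10 true, 2 padded:
# padding_fraction([2, 4, 4]) -> 2/12, and % True samples is 10/12.
```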
`% True samples` and `% of padding` say the same thing - `% of padding` could be more useful for developing intuition here, but it is not what one thinks of first (so both were in the tutorial). About the `Sampler initialization time` - it is relevant to decompose the `Total time` to understand what is going on internally.

DynamicBatching has a few use cases that need better testing to see whether they are fulfilled:
@TParcollet @popcornell please add to the list whatever I forgot - there may be more requirements for DynamicBatching ^^"

The dimensions along which to investigate a better DynamicBatching also need to cover:

What we observe on small datasets might not hold on large ones. How about discussing a strategy next week for testing and developing DynamicBatching further?