Add bucket sampler

🚀 Feature

Motivation

The legacy BucketIterator was convenient because it could batch samples by length to minimize padding. However, its API had many disadvantages, and it did not conform to the rest of the PyTorch data{sets,loader} ecosystem. It would be nice if torchtext supported the spirit of the BucketIterator by way of a Sampler.

Pitch

A sampler that lets the user specify a maximum bucket size should be added, similar to those in torchnlp and allennlp. It could be used with existing torchtext datasets by passing it as a kwarg to the PyTorch DataLoader, so that sampling minimizes padding.
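As a rough illustration of the idea, here is a minimal sketch of a bucket batch sampler in plain Python. It is not the implementation proposed in the issue and not part of torchtext: the class name `BucketBatchSampler` and the parameters `lengths`, `batch_size`, and `bucket_size` are hypothetical. It splits the indices into buckets of at most `bucket_size`, sorts each bucket by sample length, and yields batches of indices, so samples in a batch have similar lengths. Shuffling is left out to keep the sketch deterministic.

```python
from typing import Iterator, List, Sequence


class BucketBatchSampler:
    """Yield batches of indices whose samples have similar lengths.

    Hypothetical sketch; the name and parameters are assumptions,
    not an existing torchtext API.
    """

    def __init__(self, lengths: Sequence[int], batch_size: int, bucket_size: int) -> None:
        if batch_size <= 0 or bucket_size < batch_size:
            raise ValueError("need 0 < batch_size <= bucket_size")
        self.lengths = lengths          # length of each sample, by dataset index
        self.batch_size = batch_size    # number of samples per batch
        self.bucket_size = bucket_size  # maximum number of samples per bucket

    def __iter__(self) -> Iterator[List[int]]:
        indices = list(range(len(self.lengths)))
        for start in range(0, len(indices), self.bucket_size):
            # Take one bucket and sort it by sample length.
            bucket = indices[start:start + self.bucket_size]
            bucket.sort(key=lambda i: self.lengths[i])
            # Cut the sorted bucket into consecutive batches.
            for b in range(0, len(bucket), self.batch_size):
                yield bucket[b:b + self.batch_size]

    def __len__(self) -> int:
        total = 0
        for start in range(0, len(self.lengths), self.bucket_size):
            n = min(self.bucket_size, len(self.lengths) - start)
            total += -(-n // self.batch_size)  # ceiling division
        return total
```

Because it yields lists of indices, an object like this could in principle be passed to the PyTorch DataLoader through its `batch_sampler` argument, together with a `collate_fn` that pads each batch to its own longest sample.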

Alternatives

Users who want this functionality need to implement their own samplers.

Additional context

The migration guide contains a prototype of this feature without it being a first-class part of the torchtext repo. A proposed implementation can be found here.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

2 reactions
erip commented, Jan 4, 2022

I’d definitely be interested in contributing!

1 reaction
erip commented, Jan 5, 2022

Yeah this shuffling will generate different batches

Ah, I misread the code – this makes total sense. I agree that this will have more impact than shuffling within a batch. I still think it is good to give the user an option for whether there should be any shuffling at all. Maybe just shuffle: bool = True by default?

Does it mean the samples whose length

Yes, exactly. See here. There’s probably some error checking to add here (what if lengths is the empty list after filtering? 🙀 ), but otherwise seems OK.
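The two points in this comment, an optional shuffle and a guard against an empty list after filtering, could look roughly like the sketch below. The linked code is not reproduced here, so this assumes the filtering drops samples whose length falls outside an inclusive range; the helper names `indices_within` and `order_batches` and the parameters `min_len`, `max_len`, and `seed` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def indices_within(lengths: Sequence[int], min_len: int, max_len: int) -> List[int]:
    """Return the indices of samples with min_len <= length <= max_len.

    Assumption: bounds are inclusive. Raises instead of silently
    returning an empty list, which would produce a sampler with no batches.
    """
    kept = [i for i, n in enumerate(lengths) if min_len <= n <= max_len]
    if not kept:
        raise ValueError(
            f"no samples have a length in [{min_len}, {max_len}]"
        )
    return kept


def order_batches(
    batches: Sequence[Sequence[int]],
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Optionally shuffle the order of the batches, not their contents."""
    result = [list(batch) for batch in batches]
    if shuffle:
        # A local Random instance avoids touching the global random state.
        random.Random(seed).shuffle(result)
    return result
```

With `shuffle=False` the batches come back in their original order, which can be useful for evaluation or for reproducing a run.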
