Add bucket sampler
🚀 Feature
Motivation
The legacy BucketIterator was convenient because it could batch samples by length to minimize padding. However, it had many disadvantages because of its API and its non-conformance with other parts of the PyTorch Dataset/DataLoader ecosystem. It would be nice if torchtext supported the spirit of the BucketIterator by way of a Sampler.
Pitch
A sampler with the ability to specify a maximum bucket size should be added, similar to those in torchnlp and allennlp. It could be used with existing torchtext datasets by passing it as a kwarg to the PyTorch DataLoader, so that sampling minimizes padding. A sketch of one possible shape is shown below.
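For illustration only, here is a minimal sketch of what such a sampler could look like, loosely following the torchnlp/allennlp designs. The class name BucketBatchSampler and the lengths, batch_size, and bucket_size parameters are assumptions for this sketch, not an existing torchtext API:

```python
import random

from torch.utils.data import Sampler


class BucketBatchSampler(Sampler):
    """Yield batches of indices whose samples have similar lengths.

    Indices are shuffled, grouped into buckets of `bucket_size`, sorted by
    length within each bucket, and split into batches, so each batch
    contains samples of similar length and padding is minimized.
    """

    def __init__(self, lengths, batch_size, bucket_size, shuffle=True):
        if bucket_size % batch_size != 0:
            raise ValueError("bucket_size must be a multiple of batch_size")
        self.lengths = lengths
        self.batch_size = batch_size
        self.bucket_size = bucket_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = list(range(len(self.lengths)))
        if self.shuffle:
            # Shuffle across buckets so batch composition varies per epoch.
            random.shuffle(indices)
        for start in range(0, len(indices), self.bucket_size):
            # Sort each bucket by length so adjacent samples pad similarly.
            bucket = sorted(indices[start:start + self.bucket_size],
                            key=lambda i: self.lengths[i])
            for b in range(0, len(bucket), self.batch_size):
                yield bucket[b:b + self.batch_size]

    def __len__(self):
        # ceil(num_samples / batch_size); exact because bucket_size is a
        # multiple of batch_size, so only the final bucket can be ragged.
        return -(-len(self.lengths) // self.batch_size)
```

It could then be plugged into the standard DataLoader via its batch_sampler kwarg (pad_collate here stands in for whatever padding collate function the dataset needs):

```python
loader = DataLoader(dataset,
                    batch_sampler=BucketBatchSampler(lengths, batch_size=32, bucket_size=3200),
                    collate_fn=pad_collate)
```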
Alternatives
Users who want this functionality need to implement their own samplers.
Additional context
The migration guide contains a prototype of this feature, though it is not a first-class part of the torchtext repo. A proposed implementation can be found here.
Top GitHub Comments
I’d definitely be interested in contributing!
Ah, I misread the code – this makes total sense. I agree that this will have more impact than shuffling within a batch. I still think it is good to give the user an option about whether there should be shuffling at all. Maybe just `shuffle: bool = True` by default?

Yes, exactly. See here. There's probably some error checking to add here (what if `lengths` is the empty list after filtering? 🙀), but otherwise seems OK.
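For concreteness, a minimal sketch of both points discussed in this thread, reusing the hypothetical BucketBatchSampler constructor assumed in the Pitch section above: a `shuffle: bool = True` keyword and a guard for an empty `lengths` list:

```python
from typing import List

from torch.utils.data import Sampler


class BucketBatchSampler(Sampler):
    def __init__(self, lengths: List[int], batch_size: int,
                 bucket_size: int, shuffle: bool = True):
        # Error check discussed above: fail loudly if filtering removed
        # every sample, rather than silently yielding zero batches.
        if not lengths:
            raise ValueError("`lengths` is empty after filtering; nothing to sample")
        self.lengths = lengths
        self.batch_size = batch_size
        self.bucket_size = bucket_size
        # Shuffling stays on by default but can be disabled, e.g. for
        # deterministic evaluation.
        self.shuffle = shuffle
```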