BucketIterator iterating more than the length

I have code like this:

batch_size = 100

train_dataset, test_dataset = convos.split(split_ratio=0.7)

train_iterator = torchtext.data.BucketIterator(
    train_dataset,
    batch_size=3000,
    sort_key=lambda x: torchtext.data.interleave_keys(len(x.context), len(x.response)),
    device=device
)

print("Batch size: ", batch_size)
print("Train size: ", len(train_iterator))

Batch size:  100
Train size:  1

Following this, I do

for count, batch in enumerate(train_iterator):
    print(count)

This loop doesn’t stop (I checked up to count = 3000). Since len(train_iterator) is 1, I expected it to iterate just once!

Issue Analytics

Created 5 years ago
Comments: 7 (5 by maintainers)

Top GitHub Comments

4 reactions

mttk commented, Sep 25, 2018

Yeah, I believe it is more or less agreed that the default infinite loop in Iterators is confusing.

I’d say that the expected behavior is to iterate for one epoch (so the default would be repeat=False) and to leave the infinite loop as an option for users who prefer it. This might break backward compatibility, but that is manageable.
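
To make the difference concrete, here is a minimal sketch of the two behaviors discussed in this comment. ToyIterator is a hypothetical stand-in written for illustration, not torchtext's actual implementation; only the repeat flag name is taken from the thread. It shows why len() can report 1 while a for loop keeps going: the length counts batches in one epoch, and a repeating iterator simply starts the next epoch instead of stopping.

```python
import itertools
import math


class ToyIterator:
    """Hypothetical stand-in for a batching iterator; not torchtext code."""

    def __init__(self, data, batch_size, repeat=False):
        self.data = list(data)
        self.batch_size = batch_size
        self.repeat = repeat

    def __len__(self):
        # Number of batches in ONE epoch, regardless of `repeat`.
        return math.ceil(len(self.data) / self.batch_size)

    def __iter__(self):
        while True:
            for start in range(0, len(self.data), self.batch_size):
                yield self.data[start:start + self.batch_size]
            if not self.repeat:
                return  # stop after a single epoch


examples = list(range(250))  # made-up data, smaller than the batch size

one_epoch = ToyIterator(examples, batch_size=3000, repeat=False)
endless = ToyIterator(examples, batch_size=3000, repeat=True)

batches_one_epoch = sum(1 for _ in one_epoch)
# Cap the endless iterator explicitly, otherwise this loop never finishes.
batches_capped = sum(1 for _ in itertools.islice(endless, 5))

print(len(one_epoch), batches_one_epoch)  # 1 1
print(len(endless), batches_capped)       # 1 5
```

With an iterator that repeats, the practical options are to pass repeat=False where the constructor accepts it, or to bound the loop yourself (for example with itertools.islice or a break once count reaches len(train_iterator)).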

0 reactions

zhangguanheng66 commented, May 30, 2019

I will just close this issue since a PR is attached. Thanks all for the help.