Get stuck in DynamicBucketing Sampler

I am refactoring icefall to use a lazy CutSet with the dynamic bucketing sampler everywhere, in https://github.com/k2-fsa/icefall/pull/397

For the following command run in the librispeech recipe directory using the above PR:

  ./pruned_transducer_stateless3/train.py \
    --world-size 1 \
    --num-epochs 30 \
    --start-epoch 0 \
    --exp-dir pruned_transducer_stateless3/exp \
    --full-libri 0 \
    --max-duration 100 \
    --giga-prob 0.2

the training process seems to get stuck inside the dynamic bucketing sampler.

The log output is shown in a screenshot (Screen Shot 2022-06-05 at 23 26 18).

I am using py-spy to find where it gets stuck:

watch -n 0.5 py-spy dump --pid 308949 --native

The output is

https://user-images.githubusercontent.com/5284924/172058000-fa268eca-5139-4edd-982f-7c0de189bb55.mov

10 minutes have passed but nothing changes.

PS: I am using the latest master of lhotse.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

danpovey commented, Jun 6, 2022 (1 reaction)

That was with an older Lhotse. With the latest version of Lhotse, I have verified that the number of frames does not depend on the number of workers; it is 944034.00. From lhotse cut describe data/fbank/cuts_dev-clean.json.gz (and the same for dev-other), the total duration of the validation set appears to be 5.4 + 5.1 hours; multiplying by 3600 seconds per hour and 25 frames per second, that should give 945000 frames. So the length seems correct; I suppose what may have been happening before is that the BucketingSampler was discarding some utterances.
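The estimate in the comment above can be reproduced with a few lines of Python. This is only an illustrative sketch: the durations, frame rate, and observed frame count are the figures quoted in the comment, while the constant and function names are made up here and are not part of lhotse or icefall.

```python
# Figures quoted in the comment above; names are hypothetical.
DEV_CLEAN_HOURS = 5.4      # duration reported for dev-clean
DEV_OTHER_HOURS = 5.1      # duration reported for dev-other
FRAMES_PER_SECOND = 25     # frame rate stated in the comment
OBSERVED_FRAMES = 944034   # frame count observed with the latest Lhotse


def expected_frames(hours: float, frames_per_second: int) -> int:
    """Convert a duration in hours to a frame count."""
    return round(hours * 3600 * frames_per_second)


estimate = expected_frames(DEV_CLEAN_HOURS + DEV_OTHER_HOURS, FRAMES_PER_SECOND)
gap_frames = estimate - OBSERVED_FRAMES
gap_seconds = gap_frames / FRAMES_PER_SECOND

print(f"expected frames: {estimate}")
print(f"gap: {gap_frames} frames ({gap_seconds:.2f} s)")
```

The gap is 966 frames, about 38.64 seconds. Because each duration is rounded to 0.1 hour (up to 180 seconds of rounding error per subset), a gap of that size is consistent with the observed count being complete.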

pzelasko commented, Jun 6, 2022 (0 reactions)

That’s possible; I recall merging 2 PRs fairly recently that fixed some data loss in BucketingSampler.
