"BucketingSampler does not support working with lazy CutSet" when running icefall recipes


This commit https://github.com/lhotse-speech/lhotse/commit/0dceff169c9af0c70c8eda1266640a85409617e9 seems to break running icefall/egs/librispeech/ASR/*/train.py.

I now get a ValueError (“BucketingSampler does not support working with lazy CutSet”) when running:

python3 ./pruned_transducer_stateless2/train.py --exp-dir=./pruned_transducer_stateless2/exp --world-size 1 --num-epochs 26 --full-libri 1 --max-duration 300

I am using the librispeech datasets which are prepared in icefall and I have not modified anything.

@pzelasko what's the best way forward, since I believe you added this raise condition? Thanks!

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

jtrmal commented, May 20, 2022 (2 reactions)

My understanding is that JSONL is JSON, but formatted so that there is one record per line. There might be a finer definition, but I have survived with this one so far 😃 y.
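To make the one-record-per-line idea concrete, here is a minimal sketch using only the Python standard library. The records and the `iter_jsonl` helper are made up for illustration; they are not lhotse's manifest schema or API. It shows why a line-oriented file can be read one record at a time, while a single JSON document has to be parsed as a whole.

```python
import io
import json

# Hypothetical records, for illustration only (not lhotse's real schema).
records = [
    {"id": "cut-1", "duration": 3.2},
    {"id": "cut-2", "duration": 5.0},
]

# JSON: the whole collection is one document and is parsed in one go.
json_text = json.dumps(records)

# JSONL: each line is a complete JSON value on its own.
jsonl_text = "\n".join(json.dumps(r) for r in records) + "\n"


def iter_jsonl(stream):
    """Yield one record per non-empty line, without reading the whole stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


eager = json.loads(json_text)                # every record is in memory now
lazy = iter_jsonl(io.StringIO(jsonl_text))   # nothing has been parsed yet
first = next(lazy)                           # parses only the first line
```

The generator never knows how many records remain, which is presumably the property that makes a lazily opened manifest awkward for code that needs to see the full set up front.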

On Fri, May 20, 2022 at 10:14 AM John Hughes wrote:

Thanks everyone, could you let me know when you’ve added a fix to icefall? For now I will use DynamicBucketingSampler. Also, newbie question - what is a jsonl vs json?


csukuangfj commented, May 19, 2022 (2 reactions)

I’ll try to find some time to update Icefall, would be good to get some feedback from @csukuangfj and @danpovey which option they prefer; my recommendation is DynamicBucketingSampler.

Both are fine for me. Shall we replace all “.json.gz” in icefall with “.jsonl.gz”?
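For readers wondering what such a switch involves at the file level, the following is a rough sketch of rewriting a gzipped JSON file as gzipped JSONL with the standard library. It assumes the source file holds a single top-level JSON array of objects, which may not match how every manifest is laid out, and `json_gz_to_jsonl_gz` is a hypothetical helper name, not part of lhotse or icefall. In a real recipe, lhotse's own manifest reading and writing is likely the better tool; this only illustrates the format change.

```python
import gzip
import json


def json_gz_to_jsonl_gz(src_path, dst_path):
    """Rewrite a gzipped JSON array as gzipped JSONL; return the record count.

    Assumption: the source is one top-level JSON array of objects.
    """
    # The JSON side has to be loaded in full before anything can be written.
    with gzip.open(src_path, "rt", encoding="utf-8") as src:
        records = json.load(src)

    # The JSONL side is written one record per line.
    with gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for record in records:
            dst.write(json.dumps(record) + "\n")

    return len(records)
```

A call would look like `json_gz_to_jsonl_gz("cuts.json.gz", "cuts.jsonl.gz")`, where both file names are placeholders.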


