LJspeech recipe error: exceeded the bounds of its corresponding recording

The prepare_ljspeech recipe on version lhotse==0.4.0 fails with the following error:

Traceback (most recent call last):
  File "examples/", line 9, in <module>
    ljspeech = prepare_ljspeech("LJSpeech-1.1", "manifests")
  File "/home/jan/aligntts/venv/lib/python3.8/site-packages/lhotse/recipes/", line 96, in prepare_ljspeech
    validate_recordings_and_supervisions(recording_set, supervision_set)
  File "/home/jan/aligntts/venv/lib/python3.8/site-packages/lhotse/", line 52, in validate_recordings_and_supervisions
    assert 0 <= s.start <= s.end <= r.duration, \
AssertionError: Supervision LJ001-0001: exceeded the bounds of its corresponding recording (supervision spans [0.0, 9.65501134]; recording spans [0, 9.65501133786848])

It runs fine on lhotse==0.3.0. Issue #201 seems related, since it also involves a rounding problem. I am looking into it now, but I am not sure where the problem is.

Edit: It seems that the duration of a Recording is not rounded the same way as the end of a Supervision.


   def end(self) -> Seconds:
       return round(self.start + self.duration, ndigits=8)


            id=recording_id if recording_id is not None else sph_path.stem,
            duration=sphf.format['sample_count'] / sphf.format['sample_rate'],
                        if relative_path_depth is not None and relative_path_depth > 0
                        else str(sph_path)
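
The mismatch can be reproduced with the numbers from the traceback alone. This is a minimal sketch using plain floats rather than lhotse objects, and it assumes the supervision's duration was set to the recording's unrounded duration:

```python
# Values taken from the AssertionError above (supervision LJ001-0001).
recording_duration = 9.65501133786848  # unrounded, as reported for the recording

# Assumption: the supervision covers the whole recording.
supervision_start = 0.0
supervision_duration = recording_duration

# Same rounding as in the `end` property quoted above.
supervision_end = round(supervision_start + supervision_duration, ndigits=8)

print(supervision_end)                        # 9.65501134
print(supervision_end <= recording_duration)  # False
```

Rounding to 8 digits moves the supervision end up to 9.65501134, which is slightly larger than the unrounded recording duration, so the `s.end <= r.duration` check fails.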

Is it possible that there should also be rounding in the Recording? I can make a PR for this 😃
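
A rough sketch of what I mean, not necessarily how it should be done in lhotse. `rounded_duration` is a hypothetical helper, and the sample count below is an assumption, picked so that the quotient matches the duration in the traceback at the LJSpeech sampling rate of 22050 Hz:

```python
NDIGITS = 8  # same ndigits as in the `end` property


def rounded_duration(num_samples: int, sampling_rate: int) -> float:
    """Hypothetical helper: duration in seconds, rounded like the supervision end."""
    return round(num_samples / sampling_rate, ndigits=NDIGITS)


# Assumed sample count (not taken from the issue); 212893 / 22050 is about 9.655011337868...
recording_duration = rounded_duration(num_samples=212893, sampling_rate=22050)

# Supervision spanning the whole recording, with the rounding from `end`.
supervision_start = 0.0
supervision_end = round(supervision_start + recording_duration, ndigits=NDIGITS)

# The check from validate_recordings_and_supervisions now holds.
assert 0 <= supervision_start <= supervision_end <= recording_duration
```

With both values rounded to the same number of digits, the end of the supervision and the duration of the recording are the same float, so the comparison no longer depends on the digits that were cut off.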

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

pzelasko commented, Feb 26, 2021

I thought about it but I haven’t analyzed yet how that would impact serialization/deserialization, overall speed and memory load (especially with a lot of objects around), and user experience. It could be a long-term solution though.

pzelasko commented, Feb 26, 2021

Yeah, as you can see this float rounding stuff is really delicate. Your suggestion sounds good to me.

