
LJspeech recipe error: exceeded the bounds of its corresponding recording


The prepare_ljspeech recipe in lhotse==0.4.0 fails with the following error:

Traceback (most recent call last):
  File "examples/prepare_ljspeech.py", line 9, in <module>
    ljspeech = prepare_ljspeech("LJSpeech-1.1", "manifests")
  File "/home/jan/aligntts/venv/lib/python3.8/site-packages/lhotse/recipes/ljspeech.py", line 96, in prepare_ljspeech
    validate_recordings_and_supervisions(recording_set, supervision_set)
  File "/home/jan/aligntts/venv/lib/python3.8/site-packages/lhotse/qa.py", line 52, in validate_recordings_and_supervisions
    assert 0 <= s.start <= s.end <= r.duration, \
AssertionError: Supervision LJ001-0001: exceeded the bounds of its corresponding recording (supervision spans [0.0, 9.65501134]; recording spans [0, 9.65501133786848])
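The failure can be reproduced using only the two numbers from the error message. This is a minimal sketch. It assumes that the supervision starts at 0.0 and that its duration equals the unrounded recording duration, which matches the LJSpeech case where each supervision covers its whole recording. The comparison is the same chained check as the assert in lhotse/qa.py:

```python
# Values copied from the AssertionError message above.
start = 0.0
recording_duration = 9.65501133786848

# Supervision.end rounds start + duration to 8 decimal places.
supervision_end = round(start + recording_duration, ndigits=8)

print(supervision_end)  # 9.65501134
# The 9th decimal digit is 7, so rounding goes up and the end overshoots the recording.
print(supervision_end > recording_duration)  # True
# Same chained comparison as in validate_recordings_and_supervisions.
print(0 <= start <= supervision_end <= recording_duration)  # False
```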

It runs fine on lhotse==0.3.0. Issue #201 seems related, since it is also a rounding problem. I am looking into it now, but I'm not sure where the problem is.

Edit: It seems that the duration of a Recording is not rounded the same way as the end time of a Supervision.

Supervision:

    @property
    def end(self) -> Seconds:
        return round(self.start + self.duration, ndigits=8)
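`end` is rounded to the nearest 1e-8, but the recording duration is not. The check therefore fails whenever that rounding happens to go upward. The sketch below counts how often this happens for every clip length up to 10 s at LJSpeech's 22050 Hz sampling rate. It assumes durations are computed as `num_samples / sampling_rate`, as in the Recording snippet quoted below. The helper name `count_overshooting_durations` is hypothetical:

```python
def count_overshooting_durations(sampling_rate: int, max_samples: int) -> int:
    """Hypothetical helper: count sample counts whose rounded end exceeds the raw duration."""
    overshoots = 0
    for num_samples in range(1, max_samples + 1):
        duration = num_samples / sampling_rate    # unrounded, like Recording.duration
        end = round(0.0 + duration, ndigits=8)    # rounded, like Supervision.end
        if end > duration:
            overshoots += 1
    return overshoots


sr = 22050
total = 10 * sr  # every possible clip length up to 10 seconds
n = count_overshooting_durations(sr, total)
print(f"{n} of {total} possible durations would fail validation")
```

Even the shortest lengths are affected. At 22050 Hz, 4 and 5 samples already round upward, so the failure is not specific to LJ001-0001.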

Recording:

Recording(
    id=recording_id if recording_id is not None else sph_path.stem,
    sampling_rate=sphf.format['sample_rate'],
    num_samples=sphf.format['sample_count'],
    # Unlike Supervision.end, this duration is not rounded.
    duration=sphf.format['sample_count'] / sphf.format['sample_rate'],
    sources=[
        AudioSource(
            type='file',
            channels=list(range(sphf.format['channel_count'])),
            source=(
                '/'.join(sph_path.parts[-relative_path_depth:])
                if relative_path_depth is not None and relative_path_depth > 0
                else str(sph_path)
            )
        )
    ]
)

Should the Recording duration also be rounded? I can make a PR for this 😃
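Here is a minimal sketch of that proposal. It assumes the recording duration is rounded with the same `ndigits=8` that `Supervision.end` uses. The helpers `rounded_duration` and `supervision_end` are hypothetical and not part of lhotse's API. The second helper only mirrors the property quoted above:

```python
DURATION_DIGITS = 8  # must match the ndigits used in Supervision.end


def rounded_duration(num_samples: int, sampling_rate: int) -> float:
    """Hypothetical helper: a Recording duration rounded like Supervision.end."""
    return round(num_samples / sampling_rate, ndigits=DURATION_DIGITS)


def supervision_end(start: float, duration: float) -> float:
    """Mirrors the Supervision.end property quoted above."""
    return round(start + duration, ndigits=DURATION_DIGITS)


sr = 22050
for num_samples in range(1, 10 * sr + 1):
    duration = rounded_duration(num_samples, sr)
    # Rounding an already-rounded value is a no-op, so a whole-recording
    # supervision now ends exactly at the recording's end.
    end = supervision_end(0.0, duration)
    assert 0 <= 0.0 <= end <= duration, num_samples
print("all whole-recording supervisions pass")
```

The loop only checks supervisions that start at 0.0, as in the LJSpeech recipe.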

Issue Analytics

Created 3 years ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction

pzelasko commented, Feb 26, 2021

I thought about it, but I haven’t yet analyzed how that would impact serialization/deserialization, overall speed and memory load (especially with a lot of objects around), and user experience. It could be a long-term solution, though.

1 reaction

pzelasko commented, Feb 26, 2021

Yeah, as you can see this float rounding stuff is really delicate. Your suggestion sounds good to me.
