Alignments in supervision segments

We should start supporting alignments in Lhotse. I think that the following scheme is expressive enough:


from typing import NamedTuple

Seconds = float  # time values expressed in seconds

class AlignmentItem(NamedTuple):
  symbol: str
  start: Seconds
  duration: Seconds

SupervisionSegment(
  id='...',
  start=...,
  duration=...,
  text='...',
  ...
  alignment={
    'type': 'word',
    'items': [
      ['word', 0.0, 0.5],
      ['alignment', 0.5, 0.7],
      ...
    ]
  }
)

The types would be entirely user-dependent, e.g. it could be “phones”, “subwords”, “letters”, etc. I’m not sure if any other fields are needed besides start and duration…
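As a rough illustration of the scheme above, the proposed dict could be turned into typed items along these lines. This is only a sketch: `parse_alignment` and the `end` property are hypothetical names introduced here, not part of Lhotse, and `Seconds` is assumed to be a plain float.

```python
from typing import Dict, List, NamedTuple

Seconds = float  # assumption: times are plain floats in seconds


class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds

    @property
    def end(self) -> Seconds:
        # Hypothetical convenience property; rounded to hide float noise.
        return round(self.start + self.duration, ndigits=8)


def parse_alignment(alignment: dict) -> Dict[str, List[AlignmentItem]]:
    """Hypothetical helper: map the alignment type to a list of typed items."""
    items = [AlignmentItem(*item) for item in alignment["items"]]
    return {alignment["type"]: items}


# The example items from the proposal, each given as [symbol, start, duration].
alignment = {
    "type": "word",
    "items": [["word", 0.0, 0.5], ["alignment", 0.5, 0.7]],
}
parsed = parse_alignment(alignment)
```

Keying the result by type is one way to let a single supervision carry several alignments (e.g. words and phones) side by side; whether that is desirable is left open in the proposal.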

Top GitHub Comments

pzelasko commented, May 11, 2021

Maybe we should mimic Kaldi’s approach to keep the setup consistent with other approaches. Anyway, it’s your call.

About the times: good question. I suggest staying consistent with the reference point of the supervision segment’s start, i.e. by default the alignment times reference the start of the recording, but when we create cuts, call with_offset or trim, etc., we should also modify the AlignmentItems accordingly. In cuts, the supervisions’ reference point is always the start of the cut (and a supervision can have a negative start to indicate that it started before the cut).
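The re-referencing described here can be sketched as a simple shift of every item's start time. This is a minimal illustration under assumptions: `offset_items` is a hypothetical helper, not Lhotse's actual `with_offset` implementation, and the example times are made up.

```python
from typing import List, NamedTuple

Seconds = float  # assumption: times are plain floats in seconds


class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds


def offset_items(items: List[AlignmentItem], offset: Seconds) -> List[AlignmentItem]:
    """Hypothetical helper: shift every item's start by `offset` seconds."""
    return [
        item._replace(start=round(item.start + offset, ndigits=8))
        for item in items
    ]


# Items timed relative to the start of the recording.
items = [AlignmentItem("hello", 10.0, 0.4), AlignmentItem("world", 10.5, 0.3)]

# Suppose a cut begins 10.2 s into the recording: shifting by -10.2 makes
# the times relative to the cut start. The first word now has a negative
# start, meaning it began before the cut.
in_cut = offset_items(items, offset=-10.2)
```

Durations are left untouched; only the reference point changes.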

desh2608 commented, May 11, 2021

I believe Kaldi creates segments based on punctuation boundaries (at least in the AMI recipe). I already have an option in the Lhotse recipe for specifying max-pause between words to merge them into one supervision segment, so we could just use that.
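For readers unfamiliar with the max-pause idea, it could look roughly like the following. This is a sketch only: `group_by_max_pause` is a hypothetical function written for illustration and is not the code from the Lhotse recipe.

```python
from typing import List, NamedTuple

Seconds = float  # assumption: times are plain floats in seconds


class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds


def group_by_max_pause(
    words: List[AlignmentItem], max_pause: Seconds
) -> List[List[AlignmentItem]]:
    """Hypothetical helper: start a new group whenever the silence between
    consecutive words exceeds `max_pause`. Assumes `words` is sorted by start."""
    groups: List[List[AlignmentItem]] = []
    for word in words:
        if groups:
            prev = groups[-1][-1]
            pause = word.start - (prev.start + prev.duration)
            if pause <= max_pause:
                groups[-1].append(word)
                continue
        groups.append([word])
    return groups


words = [
    AlignmentItem("a", 0.0, 0.3),
    AlignmentItem("b", 0.4, 0.3),
    AlignmentItem("c", 1.5, 0.2),
]
groups = group_by_max_pause(words, max_pause=0.5)
```

Each resulting group would become one supervision segment, with its words kept as the segment's word-level alignment.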

Are the AlignmentItem start times w.r.t. the SupervisionSegment start?
