Alignments in supervision segments

We should start supporting alignments in Lhotse. I think that the following scheme is expressive enough:


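from typing import NamedTuple

Seconds = float  # time in seconds; assumed here to be a plain float alias

class AlignmentItem(NamedTuple):
  symbol: str        # the aligned unit, e.g. a word or a phone
  start: Seconds
  duration: Seconds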

SupervisionSegment(
  id='...',
  start=...,
  duration=...,
  text='...',
  ...
  alignment={
    'type': 'word',
    'items': [
      ['word', 0.0, 0.5],
      ['alignment', 0.5, 0.7],
      ...
    ]
  }
)

The types would be entirely user-dependent, e.g. it could be “phones”, “subwords”, “letters”, etc. I’m not sure if any other fields are needed besides start and duration…
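A minimal sketch of how the proposed dictionary could be turned into typed items. The helper `parse_alignment` and the `end` property are hypothetical names introduced for illustration only, and `Seconds` is assumed to be a plain float alias; nothing here is a confirmed Lhotse API.

```python
from typing import Any, Dict, List, NamedTuple

Seconds = float  # assumption: times are plain floats, in seconds


class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds

    @property
    def end(self) -> Seconds:
        # Hypothetical convenience property, not part of the proposal.
        return round(self.start + self.duration, ndigits=8)


def parse_alignment(alignment: Dict[str, Any]) -> List[AlignmentItem]:
    """Convert {'type': ..., 'items': [[symbol, start, duration], ...]} into tuples."""
    return [AlignmentItem(*item) for item in alignment["items"]]


# The example values mirror the ones in the proposal above.
alignment = {
    "type": "word",
    "items": [
        ["word", 0.0, 0.5],
        ["alignment", 0.5, 0.7],
    ],
}
items = parse_alignment(alignment)
for item in items:
    print(alignment["type"], item.symbol, item.start, item.end)
```

Keeping the type at the dictionary level, as proposed, means every item in one list shares the same granularity, so the items themselves only need a symbol and its timing.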

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 6

Top GitHub Comments

1 reaction
pzelasko commented, May 11, 2021

Maybe we should mimic Kaldi’s approach to keep the setup consistent with other approaches. Anyway, it’s your call.

About the times: good question. I suggest being consistent with the reference point of the supervision segment’s start, i.e. by default create them relative to the start of the recording, but when we create cuts, call with_offset or trim, etc., we should also modify the AlignmentItems accordingly. In cuts, the supervisions’ reference point is always the start of the cut (and they can have a negative start to indicate that they started before the cut).
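The offset behaviour described in this comment can be sketched roughly as follows. The `with_offset` method on `AlignmentItem` and the `shift_alignment` helper are hypothetical and only illustrate the idea; the thread does not confirm how this would actually be implemented in Lhotse.

```python
from typing import List, NamedTuple


class AlignmentItem(NamedTuple):
    symbol: str
    start: float     # seconds, relative to the current reference point
    duration: float  # seconds

    def with_offset(self, offset: float) -> "AlignmentItem":
        # Hypothetical method: shift the start time, keep the duration.
        return AlignmentItem(
            self.symbol, round(self.start + offset, ndigits=8), self.duration
        )


def shift_alignment(items: List[AlignmentItem], offset: float) -> List[AlignmentItem]:
    """Move every item by the same offset, as a supervision would be moved."""
    return [item.with_offset(offset) for item in items]


# Items referenced to the start of the recording (illustrative values).
in_recording = [
    AlignmentItem("word", 10.0, 0.5),
    AlignmentItem("alignment", 10.5, 0.7),
]

# A cut that starts at 10.2 s: the reference point becomes the cut start,
# so every start time is shifted by -10.2 s.
cut_start = 10.2
in_cut = shift_alignment(in_recording, -cut_start)
for item in in_cut:
    print(item)
```

In this sketch the first item ends up with a negative start, which matches the convention mentioned above for supervisions that begin before the cut.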

0 reactions
desh2608 commented, May 11, 2021

I believe Kaldi creates segments based on punctuation boundaries (at least in the AMI recipe). I already have an option in the Lhotse recipe for specifying max-pause between words to merge them into one supervision segment, so we could just use that.

Are the AlignmentItem start times w.r.t. the SupervisionSegment start?
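As an illustration of the max-pause idea mentioned in this comment, here is a rough sketch of grouping word-level items into segments whenever the silence between consecutive words stays within a threshold. The function name `group_by_max_pause` and the example values are assumptions made for this sketch; it is not the code from the Lhotse recipe.

```python
from typing import List, NamedTuple


class AlignmentItem(NamedTuple):
    symbol: str
    start: float     # seconds
    duration: float  # seconds


def group_by_max_pause(
    words: List[AlignmentItem], max_pause: float
) -> List[List[AlignmentItem]]:
    """Group words so that consecutive words in a group are at most max_pause apart."""
    groups: List[List[AlignmentItem]] = []
    for word in sorted(words, key=lambda w: w.start):
        if groups:
            previous = groups[-1][-1]
            pause = word.start - (previous.start + previous.duration)
            if pause <= max_pause:
                groups[-1].append(word)
                continue
        # First word, or the pause was too long: start a new segment.
        groups.append([word])
    return groups


words = [
    AlignmentItem("hello", 0.0, 0.4),
    AlignmentItem("world", 0.5, 0.4),
    AlignmentItem("again", 2.0, 0.3),
]
groups = group_by_max_pause(words, max_pause=0.5)
for group in groups:
    print([w.symbol for w in group])
```

Each resulting group could then become one supervision segment, with the grouped words kept as its word-level alignment.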
