Alignments in supervision segments
We should start supporting alignments in Lhotse. I think that the following scheme is expressive enough:
class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds

SupervisionSegment(
    id='...',
    start=...,
    duration=...,
    text='...',
    ...
    alignment={
        'type': 'word',
        'items': [
            ['word', 0.0, 0.5],
            ['alignment', 0.5, 0.7],
            ...
        ]
    }
)
The types would be entirely user-dependent, e.g. it could be “phones”, “subwords”, “letters”, etc. I’m not sure if any other fields are needed besides start and duration…
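For illustration, here is a minimal, self-contained sketch of how the proposed layout could be parsed into AlignmentItem tuples. It does not use Lhotse itself: the Seconds alias, the end property, and the parse_alignment helper are assumptions made for this example, not existing API.

```python
from typing import Any, Dict, List, NamedTuple

# Assumption: Seconds is a plain float alias for a time value in seconds.
Seconds = float


class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds

    @property
    def end(self) -> Seconds:
        # Hypothetical convenience property; rounding hides float noise.
        return round(self.start + self.duration, ndigits=8)


def parse_alignment(data: Dict[str, Any]) -> Dict[str, List[AlignmentItem]]:
    """Hypothetical helper: turn {'type': ..., 'items': [...]} into
    a mapping from the alignment type to a list of AlignmentItem."""
    return {data["type"]: [AlignmentItem(*item) for item in data["items"]]}


alignment = parse_alignment(
    {"type": "word", "items": [["hello", 0.0, 0.5], ["world", 0.5, 0.7]]}
)
for item in alignment["word"]:
    print(item.symbol, item.start, item.end)
```

Keying the parsed result by type is one way to let several alignment kinds (e.g. words and phones) live side by side on the same segment, if that turns out to be needed.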
Top GitHub Comments
Maybe we should mimic Kaldi’s approach to keep the setup consistent with other approaches. Anyway, it’s your call.
About the times: good question. I suggest being consistent with the reference point of the supervision segment’s start, i.e. by default create them to reference the start of the recording, but when we create cuts / call with_offset or trim / etc. we should also modify the AlignmentItems appropriately. In cuts, the supervisions’ reference point is always the start of the cut (and they can have a negative start to indicate that they started before the cut).
I believe Kaldi creates segments based on punctuation boundaries (at least in the AMI recipe). I already have an option in the Lhotse recipe for specifying max-pause between words to merge them into one supervision segment, so we could just use that.
Are the AlignmentItem start times w.r.t. the SupervisionSegment start?
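To make the reference-point convention discussed above concrete, here is a rough sketch of shifting alignment items together with their supervision. The with_offset function below is a standalone, hypothetical helper that only borrows the name mentioned in the comments; it is not the Lhotse implementation, and the example values are made up.

```python
from typing import List, NamedTuple

# Assumption: Seconds is a plain float alias for a time value in seconds.
Seconds = float


class AlignmentItem(NamedTuple):
    symbol: str
    start: Seconds
    duration: Seconds


def with_offset(items: List[AlignmentItem], offset: Seconds) -> List[AlignmentItem]:
    """Hypothetical helper: shift every item's start by `offset` seconds.
    Durations are unchanged; rounding hides float noise."""
    return [
        item._replace(start=round(item.start + offset, ndigits=8))
        for item in items
    ]


# Word times relative to the start of the recording.
words = [AlignmentItem("hello", 0.0, 0.5), AlignmentItem("world", 0.5, 0.7)]

# A cut that begins 0.6 s into the recording: shift by -0.6 so that the
# times are relative to the start of the cut.
shifted = with_offset(words, -0.6)
for item in shifted:
    print(item)
```

Both words end up with a negative start, which under this convention means they began before the cut.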