AMI diarization

I’m trying to follow the recipe for speaker diarization on the AMI dataset (https://github.com/speechbrain/speechbrain/tree/develop/recipes/AMI/Diarization) but unfortunately without success. Here’s the output:

...
speechbrain.utils.parameter_transfer - Loading pretrained files for: embedding_model, mean_var_norm_emb
__main__ - Tuning for p-value for SC (Multiple iterations over AMI Dev set)
__main__ - Diarizing dev set
__main__ - No recording IDs found! Please check if meta_data json file is properly generated.

I have downloaded the data and set the variables in the config files accordingly, i.e.:

data_folder: .../AMI/amicorpus/
manual_annot_folder: .../AMI/ami_public_manual_1.6.2

where amicorpus looks as follows: amicorpus/EN2009d/audio/EN2009d.Mix-Headset.wav
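One quick way to confirm that the layout above is what a script would actually see is to list the Mix-Headset files under data_folder. This is a minimal sketch, not part of the recipe: the function name find_mix_headset_wavs is hypothetical, and it assumes every meeting follows the <ID>/audio/<ID>.Mix-Headset.wav pattern shown above.

```python
from pathlib import Path


def find_mix_headset_wavs(data_folder):
    """Map meeting IDs to their Mix-Headset wav files.

    Assumes the layout <data_folder>/<ID>/audio/<ID>.Mix-Headset.wav
    (taken from the example path in the post, not from the recipe code).
    """
    root = Path(data_folder)
    if not root.is_dir():
        # A wrong data_folder is the first thing to rule out.
        return {}
    found = {}
    for wav in sorted(root.glob("*/audio/*.Mix-Headset.wav")):
        # The meeting ID is the directory two levels above the wav file.
        found[wav.parent.parent.name] = wav
    return found
```

Calling it on the configured data_folder and printing the keys shows which meeting IDs are present on disk. If dev or eval meetings are absent from that list, the corresponding metadata files would be expected to come out empty.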

I’m running this using device: 'cpu'

I checked the results/…/metadata folder and found that ami_dev.Mix-Headset.subsegs.json and eval.Mix-Headset.subsegs.json are empty, while ami_train.Mix-Headset.subsegs.json contains a dict of elements like:

"EN2009d_0.0_2.99": {
  "wav": {
    "file": "/Users/jonas/Desktop/Translated/ASR/datasets/AMI/amicorpus//EN2009d/audio/EN2009d.Mix-Headset.wav",
    "duration": 2.99,
    "start": 0,
    "stop": 47840
  }
}

I would really appreciate some help! Am I missing anything?

Top GitHub Comments

nauman-daw commented, Oct 25, 2021

Hi,

Yes. “No recording IDs found! Please check if meta_data json file is properly generated.” is most likely caused by incorrect paths. Please check “<filename>.subsegs.json”, as this file will be used by your experiment.py.
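A small script can make that check concrete. The following is only a sketch under assumptions: summarize_subsegs is a hypothetical helper (not part of SpeechBrain), and it assumes each key has the form <recording ID>_<start>_<stop> with a "wav" / "file" entry, as in the ami_train example quoted in the question.

```python
import json
from pathlib import Path


def summarize_subsegs(json_path):
    """Summarize a *.subsegs.json metadata file.

    Assumes keys like "EN2009d_0.0_2.99" and values with a
    {"wav": {"file": ...}} entry, as in the example from the question.
    """
    text = Path(json_path).read_text().strip()
    # Treat a zero-byte file the same as an empty JSON object.
    entries = json.loads(text) if text else {}
    # Strip the trailing "_<start>_<stop>" to recover the recording ID.
    rec_ids = sorted({key.rsplit("_", 2)[0] for key in entries})
    # Collect referenced wav paths that do not exist on disk.
    missing = sorted(
        {
            entry["wav"]["file"]
            for entry in entries.values()
            if not Path(entry["wav"]["file"]).is_file()
        }
    )
    return {"n_subsegs": len(entries), "rec_ids": rec_ids, "missing_wavs": missing}
```

Running it on each of the three metadata files shows at a glance which split has no recording IDs and whether any referenced wav path is missing on disk.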

@LONG520520 please feel free to open a PR for this, I will check it. Even if the PR suggests some useful points on how to avoid these path errors, it will be very helpful for others.

Thank you very much!

nauman-daw commented, Jan 17, 2022

Closing this path issue for now.
