
IEMOCAP recipe - missing lengths in call to embedding module?


Hello Speechbrain team,

While trying to reproduce the results reported in https://github.com/speechbrain/speechbrain/tree/develop/recipes/IEMOCAP, I noticed that the accuracy on the test set varies with the batch size of the test-set dataloader, sometimes dropping by up to 7% when batch_size = 1.

Further investigation of the code suggests that the ‘lengths’ parameter may be missing from the call to the embedding module (train.py, line 37).

I tried changing it to embeddings = self.modules.embedding_model(feats, lengths=lens) and reran the train and test stages. The variations are smaller but still present: around a 1% difference from one test run to another (with the train stage done only once).

So I have 2 questions:

  1. Do you think the modification above is correct?
  2. What could explain the remaining fluctuations with respect to the test-set batch size?
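
To illustrate why pooling without lengths depends on batch composition, here is a minimal sketch (hypothetical tensors, not the recipe's actual pooling code) showing how zero padding dilutes a plain mean pool when utterances are batched together:

```python
import torch

# Two utterances of different lengths; batching pads the shorter one with zeros.
long_utt = torch.randn(1, 6, 3)   # (batch, time, dim), 6 real frames
short_utt = torch.ones(1, 4, 3)   # constant features, so its true mean is 1.0

# Pooling that ignores lengths: a plain mean over the time axis
pool = lambda x: x.mean(dim=1)

# batch_size = 1: the short utterance is pooled over its 4 real frames
solo = pool(short_utt)            # every entry is 1.0

# batch_size = 2: the short utterance is zero-padded to 6 frames first
padded = torch.cat([short_utt, torch.zeros(1, 2, 3)], dim=1)
batched = pool(torch.cat([long_utt, padded], dim=0))[1]

# 1.0 vs ~0.667: the padded zeros pull the mean down, so the embedding
# (and hence the prediction) changes with how the batch is assembled.
print(solo[0, 0].item(), batched[0].item())
```

The same utterance thus produces different embeddings depending on how much padding the batch forces on it, which is consistent with accuracy changing with the test-set batch size.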

Thank you in advance

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
wikong commented, Oct 18, 2021

Hi @aheba @TParcollet @mravanelli Thank you for the fix, and also for the additional work with wav2vec2. Very interesting indeed…

1 reaction
mravanelli commented, Oct 16, 2021

I think the correct way is to pass the lengths parameter to the embedding model (this way zero-padded elements are removed from the statistical pooling operation). @aheba, can we do it in your ongoing PR?
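
A sketch of what length-masked statistical pooling looks like (function name and shapes are illustrative, not SpeechBrain's actual implementation, which takes relative lengths in [0, 1] as in the recipe):

```python
import torch

def masked_stat_pooling(feats, lengths):
    """feats: (batch, time, dim); lengths: relative lengths in [0, 1]."""
    batch, time, _ = feats.shape
    # Number of real (non-padded) frames per utterance
    n_frames = (lengths * time).round().long()
    # Boolean mask marking real frames: (batch, time, 1)
    mask = (torch.arange(time).unsqueeze(0) < n_frames.unsqueeze(1))
    mask = mask.unsqueeze(-1).float()
    total = mask.sum(dim=1)  # real frame count per utterance
    # Mean and std computed only over unmasked frames
    mean = (feats * mask).sum(dim=1) / total
    var = ((feats - mean.unsqueeze(1)) ** 2 * mask).sum(dim=1) / total
    return torch.cat([mean, var.sqrt()], dim=1)  # (batch, 2 * dim)
```

With the mask applied, pooling an utterance alone or inside a padded batch yields the same statistics, which is why passing lengths removes the batch-size dependence.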
