Using ESPnet as a library, the accuracy doesn't improve.

See original GitHub issue

Describe the bug
Hi, thanks for the awesome work!
However, when I tried to implement Commonvoice ASR using ESPnet as a library, I ran into some problems.

I trained the ASR with both the Transformer and the RNN model, following the configs in the Commonvoice egs.

But during training, the loss was decreasing while the training and dev accuracy were not increasing.

Basic environments:

  • OS information: Ubuntu 18.04
  • Python version: 3.7.10
  • ESPnet version: 0.9.9
  • PyTorch version: 1.4.0

Task information:

  • Task: ASR
  • Recipe: Commonvoice_zh_TW
  • ESPnet1

Code: I followed the code in espnet/notebook/asr_library.ipynb and modified parts of it. The following is the code I implemented:

import json
import matplotlib.pyplot as plt
import kaldiio
import argparse
from espnet.bin.asr_train import get_parser
from espnet.nets.pytorch_backend.e2e_asr import E2E

with open("./dump/train_zh_TW/deltafalse/data_unigram2500.json", "r") as f:
    train_json = json.load(f)["utts"]
with open("./dump/dev_zh_TW/deltafalse/data_unigram2500.json", "r") as f:
    dev_json = json.load(f)["utts"]

parser = get_parser()
parser = E2E.add_arguments(parser)
config = parser.parse_args([
    "--config","conf/tuning/train_rnn.yaml",
    "--preprocess-conf","conf/specaug.yaml",
    "--ngpu","1",
    "--backend","pytorch",
    "--outdir","exp/python_library/results",
    "--tensorboard-dir", "tensorboard2/python_library",
    "--debugmode","1",
    "--dict", "data/zh-TW_lang_char/train_zh_TW_unigram2500_units.txt",
    "--debugdir","exp/python_library/",
    "--minibatches","0",
    "--verbose","0",
    "--resume","",
    "--train-json","dump/train_zh_TW/deltafalse/data_unigram2500.json",
    "--valid-json","dump/dev_zh_TW/deltafalse/data_unigram2500.json",
])

from espnet.utils.training.batchfy import make_batchset
use_sortagrad = config.sortagrad == -1 or config.sortagrad > 0
batch_size = config.batch_size
trainset = make_batchset(train_json, 
                         batch_size, 
                         config.maxlen_in, 
                         config.maxlen_out, 
                         config.minibatches,
                         min_batch_size=config.ngpu if config.ngpu > 1 else 1,
                         shortest_first=use_sortagrad,
                         count=config.batch_count,
                         batch_bins=config.batch_bins,
                         batch_frames_in=config.batch_frames_in,
                         batch_frames_out=config.batch_frames_out,
                         batch_frames_inout=config.batch_frames_inout,
                         iaxis=0,
                         oaxis=0,                
                        )
devset = make_batchset(dev_json, 
                       batch_size,
                       config.maxlen_in,
                       config.maxlen_out,
                       config.minibatches,
                       min_batch_size=config.ngpu if config.ngpu > 1 else 1,
                       count=config.batch_count,
                       batch_bins=config.batch_bins,
                       batch_frames_in=config.batch_frames_in,
                       batch_frames_out=config.batch_frames_out,
                       batch_frames_inout=config.batch_frames_inout,
                       iaxis=0,
                       oaxis=0)

with open("./exp/train_zh_TW_pytorch_train_transformer_specaug/results/model.json", "r") as f:
    char_list_json = json.load(f)
idim = info["input"][0]["shape"][1]
odim = info["output"][0]["shape"][1]
setattr(config, "char_list", char_list_json[2]['char_list'])
model = E2E(idim, odim, config)

import numpy
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.nn.utils.clip_grad import clip_grad_norm_
from torch.utils.data import DataLoader
from espnet.nets.pytorch_backend.transformer.optimizer import get_std_opt

def collate(minibatch):
    # DataLoader batch_size defaults to 1, so minibatch[0] is one batchset entry
    # from make_batchset: a list of (utt_id, info) pairs forming a whole minibatch
    fbanks = []
    tokens = []
    for key, info in minibatch[0]:
        fbanks.append(torch.tensor(kaldiio.load_mat(info["input"][0]["feat"])))
        tokens.append(torch.tensor([int(s) for s in info["output"][0]["tokenid"].split()]))
    ilens = torch.tensor([x.shape[0] for x in fbanks])
    return pad_sequence(fbanks, batch_first=True), ilens, pad_sequence(tokens, batch_first=True)

train_loader = DataLoader(trainset, collate_fn=collate, shuffle=True, pin_memory=True)
dev_loader = DataLoader(devset, collate_fn=collate, pin_memory=True)
model.cuda()
model_params = model.parameters()
optim = torch.optim.Adadelta(
            model_params, rho=0.95, eps=config.eps, weight_decay=config.weight_decay
        )

n_iter = len(trainset)
n_epoch = 15
total_iter = n_iter * n_epoch
train_acc = []
valid_acc = []

data = next(iter(train_loader))

for epoch in range(n_epoch):
    # training
    acc = []
    model.train()
    for data in train_loader:
        loss = model(*[d.cuda() for d in data])
        optim.zero_grad()
        loss.backward()
        acc.append(model.acc)
        norm = clip_grad_norm_(model.parameters(), 5.0)
        optim.step()
        t_r = loss.item()
    train_acc.append(numpy.mean(acc))
    # validation
    acc = []
    model.eval()
    for data in dev_loader:
        model(*[d.cuda() for d in data])
        acc.append(model.acc)
    valid_acc.append(numpy.mean(acc))
    print(f"epoch: {epoch}, train acc: {train_acc[-1]:.3f}, dev acc: {valid_acc[-1]:.3f}, loss:{t_r:.3f}")

And the training results are as follows:

epoch: 0, train acc: 0.352, dev acc: 0.373, loss:297.107
epoch: 1, train acc: 0.352, dev acc: 0.373, loss:362.381
epoch: 2, train acc: 0.352, dev acc: 0.373, loss:569.445
epoch: 3, train acc: 0.352, dev acc: 0.373, loss:606.235
epoch: 4, train acc: 0.352, dev acc: 0.373, loss:476.195
epoch: 5, train acc: 0.352, dev acc: 0.373, loss:278.052
epoch: 6, train acc: 0.352, dev acc: 0.373, loss:590.790
epoch: 7, train acc: 0.352, dev acc: 0.373, loss:375.243
epoch: 8, train acc: 0.352, dev acc: 0.373, loss:610.624
epoch: 9, train acc: 0.352, dev acc: 0.373, loss:444.297
epoch: 10, train acc: 0.352, dev acc: 0.373, loss:443.770
epoch: 11, train acc: 0.352, dev acc: 0.373, loss:548.410
epoch: 12, train acc: 0.352, dev acc: 0.373, loss:820.162
epoch: 13, train acc: 0.352, dev acc: 0.373, loss:295.590
epoch: 14, train acc: 0.352, dev acc: 0.373, loss:544.829

I don’t know why the training and dev accuracy don’t improve.

Thank you for helping me!
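
A quick sanity check, using only the objects defined above, is to decode a few target token-id sequences back to text with the char_list and compare them against the reference transcripts stored in the json; if they do not line up, the targets are misaligned with the dictionary, which alone would explain a flat accuracy. A minimal sketch:

# Sanity-check sketch: decode target token ids back to text with the char_list
# and compare against the reference transcript stored in data_unigram2500.json.
char_list = config.char_list
for utt_id, info in list(train_json.items())[:3]:
    token_ids = [int(s) for s in info["output"][0]["tokenid"].split()]
    decoded = "".join(char_list[i] for i in token_ids)
    print(utt_id)
    print("  decoded:  ", decoded)
    print("  reference:", info["output"][0]["text"])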

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

1 reaction
sw005320 commented, Dec 22, 2021

worked fine without speed perturbation

I see. Thanks for the clarification. This is very strange. I’ll discuss it.

Given that the speed perturbation issue is fixed, I think your way is very reasonable. We’ll try to help you as much as possible.
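
Since the discussion above turns on speed perturbation: recipes that apply 3-way speed perturbation during data preparation dump the perturbed copies of each utterance into the same json under prefixed utterance keys. A rough way to check whether a given dump contains them is sketched below; the "sp0.9-"/"sp1.1-" prefix convention is the usual Kaldi one and is an assumption here, so adjust it to whatever the recipe actually produced.

# Sketch: count speed-perturbed copies in the dumped training json.
# Assumes the conventional Kaldi-style "sp0.9-" / "sp1.1-" utterance-key prefixes.
import json

with open("./dump/train_zh_TW/deltafalse/data_unigram2500.json", "r") as f:
    utts = json.load(f)["utts"]

perturbed = [k for k in utts if k.startswith(("sp0.9-", "sp1.1-"))]
print(f"{len(perturbed)} of {len(utts)} utterances are speed-perturbed copies")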

0 reactions
stale[bot] commented, Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Read more comments on GitHub >
