Need help with instructions to reproduce experiments

Hi Hiro!

First, thank you for the repo. I’ve been following it for a while and have seen you implement a large number of deep learning architectures.

So far I have only watched the repo from time to time, but now I would like to see if I can reproduce some results and eventually use it with custom datasets. I tried to reproduce the LibriSpeech experiment without success and need some help with it.

I went ahead and followed the installation instructions:

# Set path to CUDA, NCCL
CUDAROOT=/usr/local/cuda
NCCL_ROOT=/usr/local/nccl

export CPATH=$NCCL_ROOT/include:$CPATH
export LD_LIBRARY_PATH=$NCCL_ROOT/lib/:$CUDAROOT/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$NCCL_ROOT/lib/:$LIBRARY_PATH
export CUDA_HOME=$CUDAROOT
export CUDA_PATH=$CUDAROOT
export CPATH=$CUDA_PATH/include:$CPATH  # for warp-rnnt

# Install miniconda, python libraries, and other tools
cd tools
make 

Kaldi complained about a few libraries, but after installing them manually the make command ran successfully. After this, a conda environment was created under my path: /mnt/kingston/github/neural_sp/tools/miniconda. I activated it with conda activate /mnt/kingston/github/neural_sp/tools/miniconda and proceeded to run:

cd examples/librispeech/s5/
sh run.sh

But got the following output:

============================================================================
                                LibriSpeech                               
============================================================================
run.sh: 14: ./path.sh: source: not found
run.sh: 34: utils/parse_options.sh: Syntax error: Bad for loop variable

Have I missed an important part of the installation process? Do you have a more detailed list of steps I should follow in order to reproduce the results? Any help would be very much appreciated, thanks.

Issue Analytics

  • State:open
  • Created 3 years ago
  • Comments:10 (3 by maintainers)

Top GitHub Comments

jiwidi commented, Dec 30, 2020 (2 reactions)

I found a PR on the repo that adds support for compute 30 (https://github.com/HawkAaron/warp-transducer/pull/76). I will give it a try and report back.

EDIT: Managed to compile it with the branch at https://github.com/ncilfone/warp-transducer/tree/3691b3fa5483e911645738a7894c48fe1f116c9b.

I also discovered that I couldn’t run the script with sh run.sh, since it produces the same error:

============================================================================
                                LibriSpeech                               
============================================================================
run.sh: 14: ./path.sh: source: not found
run.sh: 34: utils/parse_options.sh: Syntax error: Bad for loop variable

It has to be run with ./run.sh --gpu 1. This downloads all the data and does some preprocessing, but the script then stops during data preparation without reporting an error.
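The two error messages are what a POSIX shell such as dash prints when it meets bash-only syntax: `source` is a bash builtin (POSIX sh only guarantees `.`), and C-style `for ((...))` loops are a bash extension. A minimal sketch of the difference, assuming bash is installed; `demo.sh` and its `path.sh` are hypothetical stand-ins, not the repo's files:

```shell
#!/bin/sh
# Sketch: show what `sh` resolves to, then run a tiny script that uses the
# same bash-only constructs (`source`, C-style for loop) under bash.
set -eu

# On Debian/Ubuntu this typically resolves to dash rather than bash.
sh_target="$(readlink -f "$(command -v sh)" 2>/dev/null || command -v sh)"
echo "sh resolves to: ${sh_target}"

demo_dir="$(mktemp -d)"
trap 'rm -rf "${demo_dir}"' EXIT

# Hypothetical stand-in for path.sh: it only sets one variable.
echo 'DEMO_VAR=loaded' > "${demo_dir}/path.sh"

# Hypothetical stand-in for run.sh, written with bash-only syntax.
cat > "${demo_dir}/demo.sh" <<'EOF'
#!/usr/bin/env bash
source "$(dirname "$0")/path.sh"
for ((i = 0; i < 2; i++)); do
    echo "iteration ${i} ${DEMO_VAR}"
done
EOF
chmod +x "${demo_dir}/demo.sh"

# Running through bash (or ./demo.sh, which honours the shebang) works.
bash_output="$(bash "${demo_dir}/demo.sh")"
echo "${bash_output}"

# If dash is installed, the same file fails when dash interprets it.
if command -v dash >/dev/null 2>&1; then
    dash "${demo_dir}/demo.sh" 2>&1 || echo "dash rejected the script"
fi
```

Calling a script as `sh script.sh` ignores its shebang line, which is why `./run.sh` (or `bash run.sh`) behaves differently from `sh run.sh`.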

It fails on data_prep.sh:

    for part in dev-clean test-clean dev-other test-other train-clean-100 train-clean-360 train-other-500; do
        # use underscore-separated names in data directories.
        local/data_prep.sh ${data_download_path}/LibriSpeech/${part} ${data}/$(echo ${part} | sed s/-/_/g) || exit 1;
    done

Specifically on utils/validate_data_dir.sh --no-feats $dst || exit 1;

But it doesn’t give any specific output or complaint. The full run.sh output:

============================================================================
                                LibriSpeech                               
============================================================================
============================================================================
                       Data Preparation (stage:0)                          
============================================================================
local/download_and_untar.sh: data part dev-clean was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part test-clean was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part dev-other was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part test-other was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part train-clean-100 was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part train-clean-360 was already successfully extracted, nothing to do.
local/download_and_untar.sh: data part train-other-500 was already successfully extracted, nothing to do.
Downloading file '3-gram.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'3-gram.arpa.gz' already exists and appears to be complete
Downloading file '3-gram.pruned.1e-7.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'3-gram.pruned.1e-7.arpa.gz' already exists and appears to be complete
Downloading file '3-gram.pruned.3e-7.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'3-gram.pruned.3e-7.arpa.gz' already exists and appears to be complete
Downloading file '4-gram.arpa.gz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'4-gram.arpa.gz' already exists and appears to be complete
Downloading file 'g2p-model-5' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'g2p-model-5' already exists and appears to be complete
Downloading file 'librispeech-lm-corpus.tgz' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'librispeech-lm-corpus.tgz' already exists and appears to be complete
Downloading file 'librispeech-vocab.txt' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'librispeech-vocab.txt' already exists and appears to be complete
Downloading file 'librispeech-lexicon.txt' into '/mnt/kingston/asr-datasets/neural-sp//local/lm'...
'librispeech-lexicon.txt' already exists and appears to be complete
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
utils/data/get_utt2dur.sh: computed /mnt/kingston/asr-datasets/neural-sp//dev_clean/utt2dur
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require 
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be 
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be 
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
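The log ends with the usage text of utils/validate_data_dir.sh, which a script usually prints when it rejects the arguments it was given. One way to see the exact arguments after expansion is to trace the call with `bash -x`. A minimal sketch, assuming bash is installed; `validate_demo.sh` is a hypothetical stand-in and not the Kaldi script:

```shell
#!/bin/sh
# Sketch: use `bash -x` to see which arguments a script actually received
# when it exits after printing only its usage message.
set -eu

demo_dir="$(mktemp -d)"
trap 'rm -rf "${demo_dir}"' EXIT

# Hypothetical stand-in: accepts exactly one directory argument.
cat > "${demo_dir}/validate_demo.sh" <<'EOF'
#!/usr/bin/env bash
if [ $# -ne 1 ]; then
    echo "Usage: $0 <data-dir>"
    exit 1
fi
echo "validated $1"
EOF

# -x writes every command, after expansion, to stderr with a "+" prefix.
# Two arguments are passed on purpose to trigger the usage branch.
trace="$(bash -x "${demo_dir}/validate_demo.sh" first second 2>&1 || true)"
echo "${trace}"
```

In the real pipeline the equivalent would be prefixing the failing call with `bash -x`, so the trace shows the expanded value of `$dst` and any options before the usage message appears.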
hirofumi0810 commented, Dec 28, 2020 (1 reaction)

@jiwidi I’ll fix the Makefile. Please retry after the next PR.
