Toward a stable version


I think we have fixed many issues, so we can release version 1.0 (or 0.1) as a stable version. To get there, we need to finish the following:

VGG2L for PyTorch by @ShigekiKarita
AN4 recipe by me
AMI recipe
swbd recipe
fisher_swbd recipe
LM integration by @sw005320
Attention/CTC joint decoding by @takaaki-hori
End detection
Documentation by @sw005320 @kan-bayashi
Modify L.embed to avoid randomness by @takaaki-hori
Add WER scoring
Label smoothing by @takaaki-hori
Replace _ilens_to_index with np.cumsum
Refactor the main training and recognition code to be independent of the PyTorch and Chainer backends
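The _ilens_to_index item is about turning a vector of per-utterance input lengths into split points for a batch whose frames have been concatenated. A minimal NumPy sketch of the np.cumsum approach, assuming the helper's job was to produce the boundaries that np.split expects; ilens_to_split_index is a hypothetical name, not the actual ESPnet function:

```python
import numpy as np


def ilens_to_split_index(ilens):
    """Boundaries for np.split of a (sum(ilens), D) array of concatenated frames.

    Hypothetical helper: the running sum gives the end offset of each
    utterance; dropping the last one leaves the split points between them.
    """
    return np.cumsum(ilens)[:-1]


ilens = np.array([3, 5, 2])                 # frames per utterance (toy values)
feats = np.arange(10 * 2).reshape(10, 2)    # 10 frames in total, feature dim 2
parts = np.split(feats, ilens_to_split_index(ilens), axis=0)
print([p.shape for p in parts])             # [(3, 2), (5, 2), (2, 2)]
```

A single vectorized np.cumsum call replaces a hand-written Python loop over the lengths.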

If you have any other action items, please add them to this issue. After that, we can move on to more research-oriented implementations.

Issue analytics: created 6 years ago; 26 comments (24 by maintainers).

Top GitHub Comments


sw005320 commented, Jan 26, 2018

Guys, by combining the LSTMLM with joint attention/CTC decoding, we finally got CER 5.3 -> 3.8 and WER 14.7 -> 9.3 on the WSJ task!!! The nice thing is that we don't have to set the min/max length and penalty (all set to 0.0), although we might need to tune the CTC and LM weights (0.3 and 1.0, respectively; see #76). @kan-bayashi, can you try the LSTMLM and joint decoding with the TEDLIUM recipe? You can train the LSTMLM on the text data by following tools/kaldi/egs/tedlium/s5_r2/local/ted_train_lm.sh and simply running:

gunzip -c db/TEDLIUM_release2/LM/*.en.gz | sed 's/ <\/s>//g' | local/join_suffix.py | gzip -c  > ${dir}/data/text/train.txt.gz
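The decoding described above combines an attention score, a CTC score, and an LM score for each candidate token. A minimal Python sketch of that fusion, assuming the log-linear form (1 - ctc_weight) * attention + ctc_weight * CTC + lm_weight * LM with the weights quoted above (0.3 and 1.0). joint_score and the toy probabilities are hypothetical, and in real beam search the CTC term is a prefix score computed incrementally rather than a fixed number:

```python
import math


def joint_score(log_p_att, log_p_ctc, log_p_lm, ctc_weight=0.3, lm_weight=1.0):
    """Assumed log-linear fusion of attention, CTC, and LM log-probabilities."""
    return (1.0 - ctc_weight) * log_p_att + ctc_weight * log_p_ctc + lm_weight * log_p_lm


# Hypothetical (attention, CTC, LM) probabilities for two competing tokens.
candidates = {
    "a": (math.log(0.6), math.log(0.2), math.log(0.1)),
    "b": (math.log(0.4), math.log(0.7), math.log(0.3)),
}
best = max(candidates, key=lambda tok: joint_score(*candidates[tok]))
print(best)  # prints "b"
```

With attention scores alone, "a" would win; the CTC and LM terms flip the choice to "b".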


kan-bayashi commented, Feb 2, 2018

The TEDLIUM results with joint CTC decoding and LM rescoring are as follows:

Experiment: exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150
Decoding: decode_dev_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0 and decode_test_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0 (Sum/Avg rows)

| Set  | File (unit)                 | # Snt |  # Wrd | Corr |  Sub | Del | Ins |  Err | S.Err |
| dev  | result.txt (characters)     |   507 |  95429 | 91.8 |  4.2 | 4.0 | 2.7 | 10.8 |  89.3 |
| test | result.txt (characters)     |  1155 | 145066 | 92.2 |  3.7 | 4.1 | 2.4 | 10.1 |  85.3 |
| dev  | result.wrd.txt (words)      |   507 |  17783 | 83.2 | 13.7 | 3.1 | 3.0 | 19.8 |  89.3 |
| test | result.wrd.txt (words)      |  1155 |  27500 | 84.0 | 12.3 | 3.7 | 2.6 | 18.6 |  85.3 |

For the dev set: CER 12.6 -> 10.8, WER 24.8 -> 19.8. For the test set: CER 11.9 -> 10.1, WER 23.4 -> 18.6.
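In sclite output, Err is the sum of substitutions, deletions, and insertions divided by the number of reference tokens (e.g. 13.7 + 3.1 + 3.0 = 19.8 for the dev WER). CER and WER differ only in whether the tokens are characters or words. A minimal Python sketch of this kind of scoring with a unit-cost edit-distance alignment; error_counts and error_rate are hypothetical names, and sclite uses its own alignment weights, so its S/D/I split can differ from this sketch when alignments tie:

```python
def error_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) from a unit-cost alignment."""
    n, m = len(ref), len(hyp)
    # d[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    d = [[None] * (m + 1) for _ in range(n + 1)]
    d[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        d[i][0] = (i, 0, i, 0)              # delete all i reference tokens
    for j in range(1, m + 1):
        d[0][j] = (j, 0, 0, j)              # insert all j hypothesis tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, dl, ins = d[i - 1][j - 1]
            same = ref[i - 1] == hyp[j - 1]
            diag = (c + (0 if same else 1), s + (0 if same else 1), dl, ins)
            c, s, dl, ins = d[i - 1][j]
            up = (c + 1, s, dl + 1, ins)    # reference token deleted
            c, s, dl, ins = d[i][j - 1]
            left = (c + 1, s, dl, ins + 1)  # hypothesis token inserted
            d[i][j] = min(diag, up, left)
    _, subs, dels, ins = d[n][m]
    return subs, dels, ins


def error_rate(pairs):
    """Corpus-level error rate in percent: 100 * (S + D + I) / N."""
    total_errors = total_ref = 0
    for ref, hyp in pairs:
        total_errors += sum(error_counts(ref, hyp))
        total_ref += len(ref)
    return 100.0 * total_errors / total_ref


ref, hyp = "a b c d", "a x c d e"
print(error_counts(ref.split(), hyp.split()))       # (1, 0, 1)
print(error_rate([(ref.split(), hyp.split())]))     # 50.0 (word level)
print(error_rate([(list("abcd"), list("axcde"))]))  # 50.0 (character level)
```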