Try messing with buffer size?
@csukuangfj this is related to the slowness of reading ffmpeg data reported in https://github.com/k2-fsa/icefall/pull/312. I am trying to debug it by running strace on your ffmpeg commands, like this (as you):
```
for p in $(ps -u kuangfangjun | grep ffmpeg | tail -n 1 | awk '{print $1}'); do strace -T -p $p; done >& foo
```
To see which system calls are slow, we can do:
```
awk '{print $NF, $0}' < foo | sort -r | less
# output is:
<0.014715> write(1, "\355\223H<X2^<\303\n\330<.VH=\353Z\231=\203H\305=\216j\341=\366f\347="..., 1280) = 1280 <0.014715>
<0.007298> write(1, "\332\r)=p9,=>\337\t=\200\335\336<L\331\352<&\222\"=`hh=(\360\233="..., 1280) = 1280 <0.007298>
<0.000146> fstat(3, {st_mode=S_IFREG|0664, st_size=9641836, ...}) = 0 <0.000146>
<0.000099> write(2, "ffmpeg version 3.4.8-0ubuntu0.2", 31) = 31 <0.000099>
<0.000086> openat(AT_FDCWD, "/ceph-fj/fangjun/open-source-2/icefall-pruned-multi-datasets/egs/librispeech/ASR/download/GigaSpeech/audio/podcast/P0036/POD0000003582.opus", O_RDONLY) = 3 <0.000086>
<0.000055> write(1, "lW\20\272\202\372\33\272\236\t\4\272\372H\221\271\36 \313\271\200r\374\271\35#\17\272n\30\371\271"..., 1280) = 1280 <0.000055>
<0.000055> getdents(3, /* 72 entries */, 32768) = 2208 <0.000055>
<0.000052> write(2, " Copyright (c) 2000-2020 the FFm"..., 46) = 46 <0.000052>
<0.000047> write(1, "*\3314>\313\4\21>\253W\221=\270\350\205\273\244\372\256\275\334 \31\2760\27I\276^\26m\276"..., 1280) = 1280 <0.000047>
<0.000046> write(1, "0\322i:X\t\2639\246\320\220\271/uz\272\n\244\325\272\226\360\344\272p~\264\272\222G\374\271"..., 1280) = 1280 <0.000046>
<0.000045> write(1, "t\203\354=;\250\320= z\334=\337\354\262=V\304\216=\336\222\233=Ia}=q\354o="..., 1280) = 1280 <0.000045>
<0.000044> write(1, "\362\1F:\252\225\3669NL\227:Zz\3;\300\314\332:\270e\3629{dK\271\276p5\272"..., 1280) = 1280 <0.000044>
<0.000043> write(1, "n\371\23\274\232\v\n<^\30\316<R\242\3\274\36\341\344\274`\304\254<,X\263<\324D\300\274"..., 1280) = 1280 <0.000043>
<0.000043> write(1, "\355\336\323\274\27\333s=\370\302\343\273\316p\240\275\350\204\205=\205\257\305\274\326\277C\273\266G\331\274"..., 1280) = 1280 <0.000043>
<0.000042> write(2, " configuration: --prefix=/usr -"..., 1098) = 1098 <0.000042>
<0.000040> write(1, "\230^<\276\236cV\276\264ys\276\314<\213\276\326I\232\276bM\250\276\7\212\266\276\17\211\302\276"..., 1280) = 1280 <0.000040>
<0.000040> write(1, "\202n\244;\34\370\217\274vF\31\275\246\32\30\275\323\31f\274\234\231\212<\242{\345<\342\270\303<"..., 1280) = 1280 <0.000040>
<0.000039> write(1, "|W\0<\"f6\273\273Tb\274b\2550\274\4\371\364\272\32K\265<\214$\3=\neD="..., 1280) = 1280 <0.000039>
<0.000039> write(1, "z\266\357=O\365\361=\36O\351=2\241\312=+\347\204=.\221\203<\262\3554\275\322\313\266\275"..., 1280) = 1280 <0.000039>
<0.000039> write(1, "\230f\350\275\303\216\17\276\10\312&\276\232\350\26\276R\243\375\275\250\264\5\276\276\2\364\275b\350\10\276"..., 1280) = 1280 <0.000039>
<0.000038> write(2, " libavcodec 57.107.100 / 57"..., 41) = 41 <0.000038>
```
… anyway, it seems to be the case that writing takes longer than reading, i.e. ffmpeg spends longer waiting to output data than to read data. "Slow writes" generally happen once or twice per ffmpeg process, and I expect they correspond to moments when a buffer fills up AND the Python program it is writing to happens to be busy with something it is hard to wake up from. Now, the bufsize arg to subprocess.run (which is one of the generic kwargs, not specifically listed in the docs) defaults to -1, which means io.DEFAULT_BUFFER_SIZE, which seems to be 8192. However, I don't see any obvious periodicity in how long the write syscall takes that would correspond to that buffer size.
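(For reference, that default is easy to check from Python:

```python
import io

# bufsize=-1 in subprocess means "use io.DEFAULT_BUFFER_SIZE";
# this prints 8192 on a typical CPython build.
print(io.DEFAULT_BUFFER_SIZE)
```
)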
This particular ffmpeg call seems to output about 500k bytes. One way to make this a little faster might be to just add bufsize=2000000 to the subprocess.run() call in read_opus_ffmpeg(). That would buffer all the output so that ffmpeg never has to wait on the Python program that is calling it.
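A minimal sketch of what that change could look like. The command line and function body here are assumptions for illustration only; the real read_opus_ffmpeg() in lhotse handles offsets, durations, channels, and error cases that are omitted here:

```python
import subprocess
import numpy as np

def read_opus_ffmpeg(path: str, sampling_rate: int = 16000) -> np.ndarray:
    # Decode an opus file to raw float32 PCM on stdout.
    # (Hypothetical command line; the real function builds it differently.)
    cmd = [
        "ffmpeg", "-i", path,
        "-f", "f32le", "-ar", str(sampling_rate),
        "pipe:1",
    ]
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
        # The proposed fix: a buffer larger than the ~500k bytes this call
        # produces, so ffmpeg's writes never block waiting on Python.
        bufsize=2000000,
    )
    return np.frombuffer(proc.stdout, dtype=np.float32)
```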
Top GitHub Comments
Here is the training log using pre-computed features on a machine with 20 CPUs and 2 dataloader workers.
Note that it takes only about 5 minutes to process 350 batches, which is very close to the training time of the reworked model from Dan.
Great! And such a simple fix!