Decoding speed
I am using the pretrained model from https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech with EncoderDecoderASR to decode the test-clean and test-other datasets from LibriSpeech on an NVIDIA V100 GPU with a batch size of 2.
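For context, here is a minimal sketch of this kind of decoding setup with SpeechBrain's EncoderDecoderASR (the file paths and padding code below are illustrative, not taken from the actual benchmark script):

```python
import torch
import torchaudio
from speechbrain.pretrained import EncoderDecoderASR

# Load the pretrained transformer ASR + transformer LM model from HuggingFace.
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-transformer-transformerlm-librispeech",
    savedir="pretrained_models/asr-transformer-transformerlm-librispeech",
    run_opts={"device": "cuda"},
)

# Decode a batch of two utterances (the paths are placeholders).
wav1, _ = torchaudio.load("test-clean/utt1.flac")
wav2, _ = torchaudio.load("test-clean/utt2.flac")
wavs = torch.nn.utils.rnn.pad_sequence(
    [wav1.squeeze(0), wav2.squeeze(0)], batch_first=True
)
# SpeechBrain expects lengths relative to the longest utterance in the batch.
wav_lens = torch.tensor([wav1.shape[1], wav2.shape[1]], dtype=torch.float)
wav_lens = wav_lens / wav_lens.max()

hyps, _ = asr_model.transcribe_batch(wavs, wav_lens)
print(hyps)
```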
There are 1310 batches in total in test-clean, and the following shows the timestamps of the decoding process for test-clean:
2021-08-12 12:29:41,598 INFO [sp-main.py:56] Decode test-clean started
2021-08-12 12:29:41,602 INFO [sp-main.py:63] Processing 0/1310
2021-08-12 12:29:46,227 INFO [sp-main.py:63] Processing 10/1310
2021-08-12 12:29:52,860 INFO [sp-main.py:63] Processing 20/1310
2021-08-12 12:30:00,492 INFO [sp-main.py:63] Processing 30/1310
2021-08-12 12:30:08,520 INFO [sp-main.py:63] Processing 40/1310
2021-08-12 12:30:16,347 INFO [sp-main.py:63] Processing 50/1310
2021-08-12 12:30:24,998 INFO [sp-main.py:63] Processing 60/1310
2021-08-12 12:30:33,016 INFO [sp-main.py:63] Processing 70/1310
2021-08-12 12:30:41,081 INFO [sp-main.py:63] Processing 80/1310
2021-08-12 12:30:49,583 INFO [sp-main.py:63] Processing 90/1310
2021-08-12 12:30:58,717 INFO [sp-main.py:63] Processing 100/1310
... ...
2021-08-12 12:50:06,324 INFO [sp-main.py:63] Processing 650/1310
2021-08-12 12:50:44,209 INFO [sp-main.py:63] Processing 660/1310
2021-08-12 12:51:21,527 INFO [sp-main.py:63] Processing 670/1310
2021-08-12 12:51:59,841 INFO [sp-main.py:63] Processing 680/1310
2021-08-12 12:52:40,108 INFO [sp-main.py:63] Processing 690/1310
2021-08-12 12:53:21,158 INFO [sp-main.py:63] Processing 700/1310
2021-08-12 12:54:00,136 INFO [sp-main.py:63] Processing 710/1310
2021-08-12 12:54:41,609 INFO [sp-main.py:63] Processing 720/1310
The waveforms in the test-clean dataset are sorted by duration in ascending order. You can see that the processing time per 10 batches grows from about 8 seconds to about 40 seconds, and the later batches take even longer since they contain longer waveforms.
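The timestamps above come from a single logging statement inside the decoding loop (note the repeated sp-main.py:63). A minimal sketch of such a loop, with names that are illustrative rather than taken from sp-main.py:

```python
import logging
import time

# The default asctime format matches the "2021-08-12 12:29:41,598" stamps above.
logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s",
    level=logging.INFO,
)

def decode_dataset(asr_model, batches):
    """Decode (wavs, wav_lens) batches, logging progress every 10 batches."""
    start = time.time()
    for i, (wavs, wav_lens) in enumerate(batches):
        if i % 10 == 0:
            logging.info("Processing %d/%d", i, len(batches))
        asr_model.transcribe_batch(wavs, wav_lens)
    logging.info("Done in %.1f seconds", time.time() - start)
```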
Could you share information about the decoding speed of SpeechBrain with this particular pre-trained model on the test-clean dataset?
[EDITED]: The decoding time for later batches is:
2021-08-12 14:25:35,987 INFO [sp-main.py:63] Processing 1240/1310
2021-08-12 14:29:52,408 INFO [sp-main.py:63] Processing 1250/1310
2021-08-12 14:34:54,443 INFO [sp-main.py:63] Processing 1260/1310
2021-08-12 14:40:36,922 INFO [sp-main.py:63] Processing 1270/1310
2021-08-12 14:46:25,953 INFO [sp-main.py:63] Processing 1280/1310
2021-08-12 14:52:52,305 INFO [sp-main.py:63] Processing 1290/1310
2021-08-12 15:01:14,249 INFO [sp-main.py:63] Processing 1300/1310
You can see that it takes several minutes to process 10 batches of long waveforms.
Its WER is
2021-08-12 15:12:34,329 INFO [utils.py:190] [test-clean] %WER 2.52% [1323 / 52576, 176 ins, 121 del, 1026 sub ]
The decoding time for the test-clean dataset is about 2 hours and 42 minutes (from 12:29:41 to 15:12:34). According to the paper http://www.danielpovey.com/files/2015_icassp_librispeech.pdf, the test-clean dataset contains 5.4 hours of audio, so the RTF is roughly
2 hours 42 minutes / 5.4 hours = 162 minutes / 324 minutes = 0.5
The decoding log for test-other is:
2021-08-12 15:12:34,539 INFO [sp-main.py:56] Decode test-other started
2021-08-12 15:12:34,545 INFO [sp-main.py:63] Processing 0/1470
2021-08-12 15:12:39,227 INFO [sp-main.py:63] Processing 10/1470
2021-08-12 15:12:45,459 INFO [sp-main.py:63] Processing 20/1470
2021-08-12 15:12:51,369 INFO [sp-main.py:63] Processing 30/1470
2021-08-12 15:12:59,177 INFO [sp-main.py:63] Processing 40/1470
2021-08-12 15:13:06,160 INFO [sp-main.py:63] Processing 50/1470
2021-08-12 15:13:13,831 INFO [sp-main.py:63] Processing 60/1470
...
...
2021-08-12 16:51:48,488 INFO [sp-main.py:63] Processing 1380/1470
2021-08-12 16:54:58,199 INFO [sp-main.py:63] Processing 1390/1470
2021-08-12 16:58:27,433 INFO [sp-main.py:63] Processing 1400/1470
2021-08-12 17:01:48,928 INFO [sp-main.py:63] Processing 1410/1470
2021-08-12 17:05:51,573 INFO [sp-main.py:63] Processing 1420/1470
2021-08-12 17:09:57,810 INFO [sp-main.py:63] Processing 1430/1470
2021-08-12 17:14:56,628 INFO [sp-main.py:63] Processing 1440/1470
2021-08-12 17:20:34,213 INFO [sp-main.py:63] Processing 1450/1470
2021-08-12 17:27:10,278 INFO [sp-main.py:63] Processing 1460/1470
Its WER is
2021-08-12 17:37:47,898 INFO [utils.py:190] [test-other] %WER 5.94% [3107 / 52343, 405 ins, 285 del, 2417 sub ]
The test-other dataset also contains 5.4 hours of audio, and its decoding time is 2 hours and 15 minutes (from 15:12:34 to 17:37:47), so the RTF for test-other is
2 hours 15 minutes / 5.4 hours = 135 minutes / 324 minutes ≈ 0.417
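Spelled out in code, the RTF arithmetic for both test sets:

```python
def rtf(decode_minutes: float, audio_hours: float) -> float:
    """Real-time factor: decoding time divided by audio duration."""
    return decode_minutes / (audio_hours * 60)

print(rtf(162, 5.4))  # test-clean: 162 / 324 = 0.5
print(rtf(135, 5.4))  # test-other: 135 / 324 ≈ 0.417
```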
[EDITED AGAIN]: Here is the GPU memory usage with a batch size of 2:
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... On | 00000000:3E:00.0 Off | 0 |
| N/A 70C P0 164W / 250W | 27158MiB / 32510MiB | 97% Default |
+-------------------------------+----------------------+----------------------+
It causes an OOM error if I use a batch size of 10.
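Since the utterances are sorted by duration, a fixed batch size means the last batches hold the longest padded waveforms, which is where memory blows up. One common workaround is to cap the total padded audio per batch instead of fixing the batch size; a minimal sketch (the cap value is a guess, not something I have tuned):

```python
def batches_by_duration(utterances, max_samples=16000 * 120):
    """Group utterances so the padded batch stays under max_samples.

    `utterances` is a list of dicts with a "num_samples" field, assumed
    sorted by duration. With 16 kHz audio, the default cap is roughly
    two minutes of padded audio per batch.
    """
    batch, longest = [], 0
    for utt in utterances:
        longest = max(longest, utt["num_samples"])
        # Padded batch size = longest utterance * number of utterances.
        if batch and longest * (len(batch) + 1) > max_samples:
            yield batch
            batch, longest = [], utt["num_samples"]
        batch.append(utt)
    if batch:
        yield batch
```

This keeps short utterances in large batches while the longest ones end up nearly alone, which should avoid the OOM without hurting throughput on the short end.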
Top GitHub Comments
The code for reproducing this is available at https://github.com/csukuangfj/k2_decoding_benchmark/blob/master/librispeech/sp-main.py (at commit fb12cee562da1f802e4c05ebfb27d4589ff88b64).
PyTorch's intra-op parallelization can help here: https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html
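For example, when parts of the pipeline run on CPU, the thread pools can be configured explicitly:

```python
import torch

# Threads used within a single op (intra-op parallelism).
torch.set_num_threads(8)
# Threads used to run independent ops concurrently (inter-op parallelism);
# this must be called before any inter-op parallel work starts.
torch.set_num_interop_threads(4)

print(torch.get_num_threads(), torch.get_num_interop_threads())
```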