Streaming inference mode
While I was investigating how streaming works, I noticed that the encoder encodes the entire speech before it is fed to the decoder. Based on the paper (https://arxiv.org/abs/2006.14941), I thought each encoded block is fed to the decoder as soon as it is encoded, so that the decoder does not need to wait until the entire segment is encoded. As far as I understand, this is not a streaming inference mode, right? If so, how can one run the model in streaming inference mode?
My observation is based on the following lines in espnet/espnet2/bin/asr_inference.py:
# a. To device
batch = to_device(batch, device=self.device)
# b. Forward Encoder
enc, _ = self.asr_model.encode(**batch)
assert len(enc) == 1, len(enc)
# c. Passed the encoder result and the beam search
nbest_hyps = self.beam_search(
    x=enc[0], maxlenratio=self.maxlenratio, minlenratio=self.minlenratio
)
nbest_hyps = nbest_hyps[: self.nbest]
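For contrast, here is a minimal, self-contained sketch of the block-synchronous flow described in the paper, where each block is encoded and handed to the decoder as soon as it is available. The names encode_block and decode_step are hypothetical placeholders for illustration, not ESPnet functions.

import numpy as np

def encode_block(block):
    # Hypothetical stand-in for one contextual-block encoder pass.
    return block * 2.0

def decode_step(hyps, enc_block):
    # Hypothetical stand-in for extending the beam hypotheses with one block.
    return hyps + [float(enc_block.mean())]

speech = np.random.randn(48000).astype(np.float32)  # ~3 s of audio at 16 kHz
block_samples = 8000                                 # 0.5 s per block
hyps = []
for start in range(0, len(speech), block_samples):
    block = speech[start:start + block_samples]
    enc_block = encode_block(block)      # encode only this block
    hyps = decode_step(hyps, enc_block)  # decode it immediately, no waiting
print(len(hyps))  # 6 blocks processed incrementally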
Issue Analytics
- Created: 2 years ago
- Reactions: 1
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Yes, you are correct. If you want to run it in a fully streaming way, you need to modify espnet/espnet2/bin/asr_inference.py, or recreate it as asr_inference_streaming.py, and feed the input features chunk by chunk rather than as a whole. For your reference, a friend of mine is currently developing the streaming decoder. It is still under development and has not been merged yet: https://github.com/laboroai/espnet/blob/dev/real_streaming_decoder/espnet2/bin/asr_inference_streaming.py
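As a rough illustration of feeding input chunk by chunk, here is a minimal sketch built around a dummy stateful recognizer. This is not the interface of the linked asr_inference_streaming.py; the class and its accept_chunk method are hypothetical.

import numpy as np

class DummyStreamingRecognizer:
    # Stands in for a recognizer that keeps encoder/decoder state between calls.
    def __init__(self):
        self.samples_seen = 0
        self.partial_text = ""

    def accept_chunk(self, chunk, is_final=False):
        # A real implementation would run the block-wise encoder on `chunk`
        # and advance the beam search; here we only track how much audio arrived.
        self.samples_seen += len(chunk)
        suffix = " (final)" if is_final else ""
        self.partial_text = f"<{self.samples_seen} samples seen{suffix}>"
        return self.partial_text

recognizer = DummyStreamingRecognizer()
speech = np.zeros(40000, dtype=np.float32)  # 2.5 s of audio at 16 kHz
chunk_samples = 8000                        # 0.5 s per chunk
for start in range(0, len(speech), chunk_samples):
    chunk = speech[start:start + chunk_samples]
    is_final = start + chunk_samples >= len(speech)
    hyp = recognizer.accept_chunk(chunk, is_final=is_final)
print(hyp)  # "<40000 samples seen (final)>"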
Thank you for your comment. The paper is based on a system that encodes each chunk and then decodes synchronously. However, to fit the implementation into the current espnet2 pipeline, I had to aggregate all the encoded features first and then feed them into the decoder. In the decoder, i.e. self.beam_search, the encoded features are split into chunks again, so that it reproduces streaming inference. Here, self.beam_search refers to espnet/nets/batch_beam_search_online_sim.py. So it is basically a simulation of streaming inference. I also used a C++ implementation for the latency evaluation in the paper, which is fully streaming (i.e. chunk-wise), but unfortunately I cannot share that code for now.
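To make the simulation idea concrete, here is a minimal sketch under assumed, hypothetical names; only the block-splitting logic mirrors what batch_beam_search_online_sim does with the already-computed encoder output.

import numpy as np

def simulated_streaming_search(enc, block_size=16):
    # The encoder output `enc` is computed in one pass beforehand, but the
    # search only "sees" it block by block, imitating streaming conditions.
    hyps = []  # placeholder for the running partial hypotheses
    for start in range(0, enc.shape[0], block_size):
        block = enc[start:start + block_size]
        hyps = hyps + [float(block.mean())]  # stand-in for one beam-search extension
    return hyps

enc = np.random.randn(100, 256)  # (frames, encoder_dim) from a full encoder pass
print(len(simulated_streaming_search(enc)))  # 7 blocks for 100 frames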