Streaming inference mode
While I was investigating how streaming works, I noticed that the encoder encodes the entire speech before it is fed to the decoder. Based on the paper (https://arxiv.org/abs/2006.14941), I thought each encoded block is fed to the decoder as soon as it is encoded, so that the decoder does not need to wait until the entire segment is encoded. As far as I understand, this is not a streaming inference mode, right? If so, how can one run the model in streaming inference mode?
My observation is based on the following lines in espnet/espnet2/bin/asr_inference.py:
# a. To device
batch = to_device(batch, device=self.device)
# b. Forward Encoder
enc, _ = self.asr_model.encode(**batch)
assert len(enc) == 1, len(enc)
# c. Passed the encoder result and the beam search
nbest_hyps = self.beam_search(
    x=enc[0], maxlenratio=self.maxlenratio, minlenratio=self.minlenratio
)
nbest_hyps = nbest_hyps[: self.nbest]
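For contrast, here is a minimal, self-contained sketch of the block-synchronous flow described in the paper, where each block is encoded and handed to the decoder as soon as it is available. The names encode_block and decode_step are hypothetical placeholders for illustration, not ESPnet functions.

import numpy as np

def encode_block(block):
    # Hypothetical stand-in for one contextual-block encoder pass.
    return block * 2.0

def decode_step(hyps, enc_block):
    # Hypothetical stand-in for extending the beam hypotheses with one block.
    return hyps + [float(enc_block.mean())]

speech = np.random.randn(48000).astype(np.float32)  # ~3 s of audio at 16 kHz
block_samples = 8000                                 # 0.5 s per block
hyps = []
for start in range(0, len(speech), block_samples):
    block = speech[start:start + block_samples]
    enc_block = encode_block(block)      # encode only this block
    hyps = decode_step(hyps, enc_block)  # decode it immediately, no waiting
print(len(hyps))  # 6 blocks processed incrementally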
Issue Analytics
- Created: 2 years ago
- Reactions: 1
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Yes, you are correct. If you want to run it in a fully streaming way, you need to modify espnet/espnet2/bin/asr_inference.py, or recreate it as asr_inference_streaming.py, and feed the input features chunk by chunk rather than as a whole. For your reference, a friend of mine is currently developing the streaming decoder. It is still under development and has not been merged yet: https://github.com/laboroai/espnet/blob/dev/real_streaming_decoder/espnet2/bin/asr_inference_streaming.py
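As a rough illustration of feeding input chunk by chunk, here is a minimal sketch built around a dummy stateful recognizer. This is not the interface of the linked asr_inference_streaming.py; the class and its accept_chunk method are hypothetical.

import numpy as np

class DummyStreamingRecognizer:
    # Stands in for a recognizer that keeps encoder/decoder state between calls.
    def __init__(self):
        self.samples_seen = 0
        self.partial_text = ""

    def accept_chunk(self, chunk, is_final=False):
        # A real implementation would run the block-wise encoder on `chunk`
        # and advance the beam search; here we only track how much audio arrived.
        self.samples_seen += len(chunk)
        suffix = " (final)" if is_final else ""
        self.partial_text = f"<{self.samples_seen} samples seen{suffix}>"
        return self.partial_text

recognizer = DummyStreamingRecognizer()
speech = np.zeros(40000, dtype=np.float32)  # 2.5 s of audio at 16 kHz
chunk_samples = 8000                        # 0.5 s per chunk
for start in range(0, len(speech), chunk_samples):
    chunk = speech[start:start + chunk_samples]
    is_final = start + chunk_samples >= len(speech)
    hyp = recognizer.accept_chunk(chunk, is_final=is_final)
print(hyp)  # "<40000 samples seen (final)>"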
Thank you for your comment. The paper is based on a system that encodes each chunk and then decodes synchronously. However, to fit the implementation into the current espnet2 pipeline, I had to aggregate all the encoded features first and then feed them into the decoder. In the decoder, i.e. self.beam_search, the encoded features are split into chunks again, so that it reproduces streaming inference. Here, self.beam_search refers to espnet/nets/batch_beam_search_online_sim.py. So it is basically a simulation of streaming inference. I also used a C++ implementation for the latency evaluation in the paper, which is fully streaming (i.e. chunk-wise), but unfortunately I cannot share that code for now.
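To make the simulation idea concrete, here is a minimal sketch under assumed, hypothetical names; only the block-splitting logic mirrors what batch_beam_search_online_sim does with the already-computed encoder output.

import numpy as np

def simulated_streaming_search(enc, block_size=16):
    # The encoder output `enc` is computed in one pass beforehand, but the
    # search only "sees" it block by block, imitating streaming conditions.
    hyps = []  # placeholder for the running partial hypotheses
    for start in range(0, enc.shape[0], block_size):
        block = enc[start:start + block_size]
        hyps = hyps + [float(block.mean())]  # stand-in for one beam-search extension
    return hyps

enc = np.random.randn(100, 256)  # (frames, encoder_dim) from a full encoder pass
print(len(simulated_streaming_search(enc)))  # 7 blocks for 100 frames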