How to calculate frameSize when encoding?
Currently, the Encoder requires us to provide the proper frameSize
parameter:
const opus = new prism.opus.Encoder({ rate: 48000, channels: 1, frameSize: 960 });
I assume ‘required’ in the TypeScript sense, as there’s no way to omit it.
There are a few concerns:
- frame_size is the duration of the frame in samples (per channel), while prism applies a different meaning to the same name: “the frame size in bytes to use (e.g. 960 for stereo audio at 48KHz with a frame duration of 20ms)”, which I believe brings some confusion.
- And, more importantly, it’s not clear how to calculate it. Obviously, if it can be calculated automatically then it doesn’t need to be required, does it? If not, then please explain how to calculate it.
Top GitHub Comments
I agree that the meanings are confusing. The Opus documentation actually offers yet another definition for frame_size (see opus_encode): the number of samples per channel of the input signal. This is the definition used by the encoder. For example, a value of 960 means that each distinct Opus packet is formed from 960 samples of raw audio per channel.
So with that in mind, it’s sort of annoying, but “frame size” can refer to a time duration, a number of samples, or a size in bytes.
You can convert between these fairly easily as long as you have the original sample rate, as sketched below.
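As a rough sketch of those conversions (the helper names here are made up for illustration, and 16-bit signed PCM, i.e. 2 bytes per sample, is an assumption rather than anything prism documents):

// Convert a frame duration in milliseconds to samples per channel.
// e.g. 48000 Hz at 20 ms -> 960 samples per channel
function frameDurationToSamples(sampleRate, frameDurationMs) {
  return Math.round(sampleRate * (frameDurationMs / 1000));
}

// Convert the same frame to raw PCM bytes, assuming 16-bit (2-byte) samples.
// e.g. 960 samples * 2 channels * 2 bytes -> 3840 bytes of PCM per 20 ms stereo frame
function frameDurationToBytes(sampleRate, frameDurationMs, channels) {
  return frameDurationToSamples(sampleRate, frameDurationMs) * channels * 2;
}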
For example, for a sample rate of 48000Hz and a frame duration of 20ms (= 0.02 seconds), the frame size for one channel is 48000 * 0.02 = 960 samples. Therefore, we would need 960 samples per channel. For stereo audio, a single Opus packet would therefore cover 960 * 2 = 1920 samples of audio, 960 for each channel. Note that prism automatically multiplies this number by the number of channels, so you don’t need to specify 1920 yourself; 960 will do.
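Putting that together, a minimal sketch of deriving frameSize and passing it to the Encoder (the constructor options mirror the call quoted in the issue; the variable names are just illustrative):

const prism = require('prism-media');

const rate = 48000;         // sample rate of the input PCM
const channels = 2;         // stereo input
const frameDurationMs = 20; // Opus frames are 2.5, 5, 10, 20, 40 or 60 ms long
const frameSize = rate * (frameDurationMs / 1000); // 48000 * 0.02 = 960 samples per channel

const encoder = new prism.opus.Encoder({ rate, channels, frameSize });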
While you can derive the frame duration in ms from the TOC byte of an Opus packet, you can’t derive the sample rate of the audio (as far as I’m aware); that is given by the OpusHead in the audio container. Technically, this could all be inferred by the decoder if it were given the OpusHead along with the Opus packets, however the OpusHead isn’t always easily accessible to the Decoder stream. This might be something I look into for the next version of prism.

No problem 😄 Will close now that the issue is resolved.
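For completeness, on the decoding side discussed above: because the Decoder stream can’t read the OpusHead itself, the parameters have to be supplied up front. A hedged sketch, assuming the Decoder accepts the same options object as the Encoder:

// The sample rate and channel count would normally come from the OpusHead
// in the container; here they are supplied manually as an example.
const decoder = new prism.opus.Decoder({ rate: 48000, channels: 2, frameSize: 960 });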