Extract PCM audio stream for waveform
Heyya, I’m wanting to create a waveform based on a video selected by a user. We currently have this in iOS with the built-in classes and it runs in about 0.33sec, however in our implementation on Android (non-LiTr) it is taking upwards of 11.41sec (it was 18.91sec before I added some threading into it 👀).
There appears to be a lack of info around how to do this efficiently on Android. I came across LiTr and was curious if its internal code is much better than mine and could be of more use.
Some background into my current Android code. Current code is in Xamarin.Android. This means I am using C# for my code so some class/method names may appear slightly different. For this use case though, the performance issues I am seeing should not be related to not using vanilla Java+Android. In my current code I am using a MediaExtractor + MediaCodec decoder, selecting only the audio track and then in OnInputBufferAvailable queuing that input buffer on the MediaCodec, which then fires OnOutputBufferAvailable. From here I can get the output buffer and loop over it to extract the shorts for me to parse and create a waveform.
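For readers following along, the "loop over the output buffer and extract the shorts" step could look roughly like the sketch below. It is plain Java rather than C#, and `WaveformPeaks`, `peaks` and `bucketCount` are hypothetical names, not part of LiTr or the Android SDK. It assumes the buffer holds whole 16-bit PCM samples. The MediaCodec documentation describes decoded audio as PCM in native byte order, so `ByteOrder.nativeOrder()` is the likely argument on a device.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

/** Hypothetical helper for illustration; not part of LiTr or the Android SDK. */
final class WaveformPeaks {
    private WaveformPeaks() {}

    /**
     * Reduces 16-bit PCM samples to one peak (largest absolute amplitude) per bucket.
     * Assumes the bytes between position and limit are whole 16-bit samples
     * stored in the given byte order. The caller's buffer position is not moved.
     */
    static int[] peaks(ByteBuffer pcm, ByteOrder order, int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        // duplicate() shares the bytes but has its own position, limit and byte order.
        ShortBuffer samples = pcm.duplicate().order(order).asShortBuffer();
        int total = samples.remaining();
        int[] peaks = new int[bucketCount];
        for (int i = 0; i < total; i++) {
            // Map sample index i onto a bucket in [0, bucketCount).
            int bucket = (int) ((long) i * bucketCount / total);
            // Widen to int first so that -32768 becomes 32768 instead of overflowing.
            int amplitude = Math.abs((int) samples.get(i));
            if (amplitude > peaks[bucket]) {
                peaks[bucket] = amplitude;
            }
        }
        return peaks;
    }
}
```

A real decoder hands back many buffers, so the per-buffer peaks would still need to be merged, or the bucket index computed from a running sample count across buffers.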
At this point I am not even 100% sure whether what I have is the raw audio data or if it’s just AAC data (either way it generates the same waveform, so… yay?). The iOS code works very differently. You tell it what output format you want: linear PCM, 16-bit, not float, one channel. And then it fires and I parse the result data in the exact same way.
My thought here is whether I can use LiTr almost exactly as in the readme example:
mediaTransformer.transform(requestId,
sourceVideoUri,
targetVideoFilePath,
targetVideoFormat,
targetAudioFormat,
videoTransformationListener,
transformationOptions);
Where targetVideoFormat is null because I just want to pass it through and targetAudioFormat is mono 16-bit linear PCM. I didn’t get very far with that because it’s essentially transcoding the file. I’d still need to open it and parse the audio and then do my waveform. Neat if it’s still faster than 11.41sec.
Looking around for PCM I only found this comment that mentions decoding an audio frame into PCM. It’s for a different purpose but same result. So I think what I need to do is to use the TrackTransform class.
If I were to go this route would I extract the data with a Renderer, or would I somehow capture this data after the Encoder has encoded it as 16-bit linear PCM audio? Any help or guidance with extracting PCM data would be greatly appreciated.
As a bonus I actually have set up a binding library over here. After I’ve tidied it up and added some samples I’ll release it as a nuget package so any other Xamarin.Android developer can use LiTr just like they would if they included it with gradle 👍
Issue Analytics
- Created 2 years ago
- Reactions:1
- Comments:8 (6 by maintainers)
Top GitHub Comments
@izzytwosheds an update on all of this.
The repo with the full Xamarin.Android binding is up and available here, and the nuget package for any Xamarin.Android developer to use is available here. The nuget package for filters is its own separate thing and it is here.
When an update comes through to this repo all I need to do is find and replace 1.4.18 with 1.4.19 in my repo, re-run the build nuget packages script and it will fetch the updated aars and build both nugets for me to go upload. At some point I’d like to go through the entire demo app and convert it to Xamarin.Android and then try to keep it as updated as possible as it’s a very useful library.
In terms of how this helped us out. Our app generates videoLength / 4 thumbnails and then chooses the closest one to display. We could improve this by only generating what we need. The setup of FrameExtractParameters and VideoFrameExtractor actually lets us do that a lot more easily now.
For my demo video of 92 seconds (23 small thumbnails) our generation time went from 5.04sec down to 4.58sec. ~9% improvement. Not huge, but I’ll take it. This could be improved A LOT as in this current view we only actually display 3 images 😂
In terms of waveform generation we had HUGE improvements. We got it from 17.54sec down to 5.67sec. ~67% improvement. There may be more improvements to have here such as making the audio mono before we count samples, reducing bitrate/sample rate as that level of accuracy isn’t needed for our waveforms. I am sure there are other parts that can be optimised but that is what I have off the top of my head.
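The two ideas mentioned here (going mono before counting samples, and dropping sample-rate precision the waveform does not need) can be sketched in plain Java as below. `PcmReducer`, `toMono` and `keepEveryNth` are hypothetical names for illustration, not LiTr or Android APIs. The sketch assumes interleaved 16-bit samples, and the decimation is deliberately crude: it skips the low-pass filter a proper resampler would apply, which is probably acceptable for drawing a waveform but not for playback.

```java
/** Hypothetical helper for illustration; not part of LiTr or the Android SDK. */
final class PcmReducer {
    private PcmReducer() {}

    /** Averages each interleaved frame (one sample per channel) into a single mono sample. */
    static short[] toMono(short[] interleaved, int channelCount) {
        if (channelCount <= 0) {
            throw new IllegalArgumentException("channelCount must be positive");
        }
        int frames = interleaved.length / channelCount; // a trailing partial frame is dropped
        short[] mono = new short[frames];
        for (int f = 0; f < frames; f++) {
            int sum = 0; // int accumulator so adding several shorts cannot overflow
            for (int c = 0; c < channelCount; c++) {
                sum += interleaved[f * channelCount + c];
            }
            mono[f] = (short) (sum / channelCount);
        }
        return mono;
    }

    /** Keeps samples 0, step, 2*step, ... with no filtering (naive decimation). */
    static short[] keepEveryNth(short[] samples, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        short[] out = new short[(samples.length + step - 1) / step];
        for (int i = 0; i < out.length; i++) {
            out[i] = samples[i * step];
        }
        return out;
    }
}
```

For example, stereo 44.1 kHz input passed through `toMono(..., 2)` and then `keepEveryNth(..., 4)` leaves one eighth of the original samples to scan when building the waveform.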
Thanks for the effort on your end to help this all happen 👍
Update: writing PCM audio into WAV file is coming soon.
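Until that lands, a WAV file is just a 44-byte RIFF header followed by the raw PCM bytes, so a header can be built by hand. The sketch below is independent of however LiTr ends up implementing this; `WavHeader` and `build` are hypothetical names, and it assumes uncompressed integer PCM (format tag 1) with a single `fmt ` chunk and a single `data` chunk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not LiTr's implementation. */
final class WavHeader {
    private WavHeader() {}

    /** Builds the canonical 44-byte header for uncompressed PCM data of the given size. */
    static byte[] build(int sampleRate, int channelCount, int bitsPerSample, int pcmByteCount) {
        int blockAlign = channelCount * bitsPerSample / 8; // bytes per frame
        int byteRate = sampleRate * blockAlign;            // bytes per second
        // All multi-byte fields in a RIFF/WAVE file are little-endian.
        ByteBuffer h = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        h.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        h.putInt(36 + pcmByteCount);       // bytes remaining in the file after this field
        h.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        h.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        h.putInt(16);                      // size of the fmt chunk body for PCM
        h.putShort((short) 1);             // format tag 1 = integer PCM
        h.putShort((short) channelCount);
        h.putInt(sampleRate);
        h.putInt(byteRate);
        h.putShort((short) blockAlign);
        h.putShort((short) bitsPerSample);
        h.put("data".getBytes(StandardCharsets.US_ASCII));
        h.putInt(pcmByteCount);            // size of the PCM payload that follows
        return h.array();
    }
}
```

Writing the returned bytes and then the decoded PCM buffers to the same output stream should give a playable file, provided the total PCM size is known up front; otherwise the two size fields have to be patched after the last buffer is written.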