Mel-Spectrogram Image Generation Issues

Hi, I’m trying to troubleshoot an error with my mel-spectrogram image generation. The saved .png image looks like the following:

Here is the code I am using:

var sg = new SpectrogramGenerator(sampleRate, FftSize, StepSize, 0, MaxFreq);
sg.Add(audio);

var bitmap = sg.GetBitmapMel(MelBinCount, Intensity, SaveAsDb);
bitmap.Save(file + "MelSpec.png", ImageFormat.Png);

where FftSize = 2048, StepSize = 300, MaxFreq = 3000, Intensity = 5, MelBinCount = 200, and SaveAsDb = false.

I am also using the ReadWAV method, as shown in the readme, to read the .wav audio inputs.

Any help would be appreciated.

Issue Analytics

Created 2 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

jwoodtke commented, Aug 5, 2021

You were totally right @swharden! I think another problem was the buffer array was in bytes rather than floats.

However, now the mel-spectrograms seem to be missing the bass component. Here is a sample image:

[Image: pop 00013MelSpec]

Also, here is the new code for audio reads:

        private async Task<(double[] audio, int sampleRate, double length)> ReadWav(string file, double multiplier = 16000)
        {
            await using (var afr = new AudioFileReader(file))
            {
                int sampleRate = afr.WaveFormat.SampleRate;
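                // Bytes per sample is BitsPerSample / 8; the result is only an initial capacity hint for the list.
                int sampleCount = (int) (afr.Length / (afr.WaveFormat.BitsPerSample / 8));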
                int channelCount = afr.WaveFormat.Channels;
                var audio = new List<double>(sampleCount);
                var buffer = new float[sampleRate * channelCount];
                var length = afr.TotalTime;
                int samplesRead = 0;
                while ((samplesRead = await PopulateBufferArray(afr, buffer)) > 0)
                {
                    await AddAudioRange(audio, buffer.Take(samplesRead).Select(x => x * multiplier));
                }
                return (audio.ToArray(), sampleRate, length.TotalSeconds);
            }
        }

        private async Task<int> PopulateBufferArray(AudioFileReader afr, float[] buffer)
        {
            return afr.Read(buffer, 0, buffer.Length);
        }
        private async Task AddAudioRange(List<double> audio, IEnumerable<double> samplesRead)
        {
            audio.AddRange(samplesRead);
        }

jwoodtke commented, Aug 5, 2021

Never mind, I think the mel bin count I was using was too high. It’s all iterative adjustment from here on out. Thanks again for the help!
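
One way to see how a mel bin count can be too high is to compare the width of the lowest mel band with the width of a single FFT bin. The sketch below is a rough illustration only, not the library's implementation. It assumes the common HTK-style mel formula, assumes the mel bands are spaced evenly in mel between 0 Hz and MaxFreq, and uses a hypothetical sample rate of 22050 Hz because the thread does not state one. The class `MelBinCheck` and its method names are made up for this example.

```csharp
using System;

public static class MelBinCheck
{
    // HTK-style mel conversion (an assumption; the library may use a different formula).
    public static double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
    public static double MelToHz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

    // Width in Hz of one FFT bin.
    public static double FftBinWidthHz(int sampleRate, int fftSize) => (double)sampleRate / fftSize;

    // Width in Hz of the lowest mel band when the range 0..maxFreq
    // is divided into melBinCount equal steps on the mel scale.
    public static double LowestMelBandWidthHz(double maxFreq, int melBinCount)
        => MelToHz(HzToMel(maxFreq) / melBinCount);

    public static void Main()
    {
        int sampleRate = 22050;  // hypothetical: not given in the thread
        int fftSize = 2048;      // FftSize from the original post
        double maxFreq = 3000;   // MaxFreq from the original post

        double fftWidth = FftBinWidthHz(sampleRate, fftSize);

        foreach (int melBinCount in new[] { 200, 100, 50 })
        {
            double melWidth = LowestMelBandWidthHz(maxFreq, melBinCount);
            string verdict = melWidth < fftWidth
                ? "narrower than one FFT bin"
                : "at least one FFT bin wide";
            Console.WriteLine(
                $"{melBinCount} mel bins: lowest band {melWidth:F1} Hz, FFT bin {fftWidth:F1} Hz ({verdict})");
        }
    }
}
```

Under these assumptions, when the lowest mel band is narrower than one FFT bin, some low-frequency bands have little or no FFT data to draw from, which could show up as missing bass in the image. Lowering MelBinCount or increasing FftSize widens that margin.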