question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Question - About Prediction time over CPU and GPU

See original GitHub issue

I’m doing some tests for CPU and GPU environment usages for prediction (Predict.py). I’m using an audio file Audio: mp3, 44100 Hz, stereo, fltp, 192 kb/s of duration 00:03:15.29

$ ffprobe /audio/12380187.mp3
ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
  built with Apple LLVM version 9.1.0 (clang-902.0.39.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
  libpostproc    55.  1.100 / 55.  1.100
Input #0, mp3, from '/audio/12380187.mp3':
  Metadata:
    encoder         : Lavf56.40.101
  Duration: 00:03:15.29, start: 0.025057, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 192 kb/s
    Metadata:
      encoder         : Lavc56.60

On a Intel i7 - 12 core CPU the prediction time log says Completed after 0:03:19

$ time python Predict.py with cfg.full_44KHz input_path=/audio/12380187.mp3 output_path=/audio_sep/
Training full singing voice separation model, with difference output and input context (valid convolutions) and stereo input/output, and learned upsampling layer, and 44.1 KHz sampling rate
WARNING - Waveunet Prediction - No observers have been added to this run
INFO - Waveunet Prediction - Running command 'main'
INFO - Waveunet Prediction - Started
Producing source estimates for input mixture file /audio/12380187.mp3
Testing...
2018-11-20 14:54:05.306099: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Num of variables64
INFO:tensorflow:Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
INFO - tensorflow - Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
Pre-trained model restored for song prediction
INFO - Waveunet Prediction - Completed after 0:03:19

real	3m26.034s
user	13m30.420s
sys	4m40.200s

while on Intel Xeon 12 core plus 2x Nvidia GeForce GTX 1080 says Completed after 0:00:16

$ time python Predict.py with cfg.full_44KHz input_path=/audio/12380187.mp3
/usr/local/lib/python2.7/dist-packages/scikits/audiolab/soundio/play.py:48: UserWarning: Could not import alsa backend; most probably, you did not have alsa headers when building audiolab
  warnings.warn("Could not import alsa backend; most probably, "
Training full singing voice separation model, with difference output and input context (valid convolutions) and stereo input/output, and learned upsampling layer, and 44.1 KHz sampling rate
WARNING - Waveunet Prediction - No observers have been added to this run
INFO - Waveunet Prediction - Running command 'main'
INFO - Waveunet Prediction - Started
Producing source estimates for input mixture file /audio/12380187.mp3
Testing...
2018-11-21 12:34:13.829481: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-21 12:34:13.830157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.46GiB
2018-11-21 12:34:13.961794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-21 12:34:13.962562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:02:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-11-21 12:34:13.963292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2018-11-21 12:34:14.531254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-21 12:34:14.531305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 1 
2018-11-21 12:34:14.531329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N Y 
2018-11-21 12:34:14.531336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1:   Y N 
2018-11-21 12:34:14.531830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7209 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-11-21 12:34:14.589915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7543 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1)
Num of variables64
INFO:tensorflow:Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
INFO - tensorflow - Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
Pre-trained model restored for song prediction
INFO - Waveunet Prediction - Completed after 0:00:16

real	0m18.340s
user	0m15.972s
sys	0m5.528s

I’m not sure from logging if tensorflow is using both gpu devices or gpu 0 only. If I’m not wrong, most of the work is done in the Models.py here https://github.com/f90/Wave-U-Net/blob/master/Models/UnetSpectrogramSeparator.py#L39 when the computation graph is calculated. I assume that these operations go on gpu:0 in this configuration, so gpu:1 will not be used - but I’m not sure of it.

Thank you very much!

Issue Analytics

  • State:open
  • Created 5 years ago
  • Comments:19 (10 by maintainers)

github_iconTop GitHub Comments

3reactions
f90commented, Nov 21, 2018

@f90 ok just realized that this is hardcoded here

# Batch size of 1
    sep_input_shape[0] = 1
    sep_output_shape[0] = 1

    mix_context, sources = Input.get_multitrack_placeholders(sep_output_shape, model_config["num_sources"], sep_input_shape, "input")

so to change batch_size here I should change the sep_input_shape.

Be aware that changing the code there means that the internal code further down has to be adapted as well though, since it assumes that we insert one audio segment and get predictions for one back, not multiple. I am talking about the predict_track method in particular, where I am basically iterating over the input audio, taking a chunk, predicting the sources, and append the output chunk to the overall output audio. If we do that in batches, we need to collect a bunch of these segments in a loop to fill a batch, predict a batch, then append the outputs in the correct order, and then continue iterating over the input audio, until we have nothing left. However we might end up with only a few segments at the end that do not fill up a complete batch, in that case we would have to pad the batch with zeros…

If i have some time for this, and there is sufficient need indicated from all of you (leave a like/comment here to show that) then I will come around to implement that. I would keep the standard setting to a batch size of 1 though to make sure prediction still works even on small systems.

2reactions
f90commented, Dec 13, 2018

I am curious why you expect any improvements with the batched version. But if you want to experiment with it, replace the predict_track function with this version of it in the code. If it turns out better, just tell me and I can push it to the repository for everyone.

Also you have to comment out

    # Batch size of 1
    sep_input_shape[0] = 1
    sep_output_shape[0] = 1

found in predict function.

def predict_track(model_config, sess, mix_audio, mix_sr, sep_input_shape, sep_output_shape, separator_sources, mix_context):
    '''
    Outputs source estimates for a given input mixture signal mix_audio [n_frames, n_channels] and a given Tensorflow session and placeholders belonging to the prediction network.
    It iterates through the track, collecting segment-wise predictions to form the output.
    :param model_config: Model configuration dictionary
    :param sess: Tensorflow session used to run the network inference
    :param mix_audio: [n_frames, n_channels] audio signal (numpy array). Can have higher sampling rate or channels than the model supports, will be downsampled correspondingly.
    :param mix_sr: Sampling rate of mix_audio
    :param sep_input_shape: Input shape of separator ([batch_size, num_samples, num_channels])
    :param sep_output_shape: Input shape of separator ([batch_size, num_samples, num_channels])
    :param separator_sources: List of Tensorflow tensors that represent the output of the separator network
    :param mix_context: Input tensor of the network
    :return: 
    '''
    # Load mixture, convert to mono and downsample then
    assert(len(mix_audio.shape) == 2)
    if model_config["mono_downmix"]:
        mix_audio = np.mean(mix_audio, axis=1, keepdims=True)
    else:
        if mix_audio.shape[1] == 1:# Duplicate channels if input is mono but model is stereo
            mix_audio = np.tile(mix_audio, [1, 2])
    mix_audio = Utils.resample(mix_audio, mix_sr, model_config["expected_sr"])

    # Preallocate source predictions (same shape as input mixture)
    source_time_frames = mix_audio.shape[0]
    source_preds = [np.zeros(mix_audio.shape, np.float32) for _ in range(model_config["num_sources"])]

    input_time_frames = sep_input_shape[1]
    output_time_frames = sep_output_shape[1]

    # Pad mixture across time at beginning and end so that neural network can make prediction at the beginning and end of signal
    pad_time_frames = (input_time_frames - output_time_frames) / 2
    mix_audio_padded = np.pad(mix_audio, [(pad_time_frames, pad_time_frames), (0,0)], mode="constant", constant_values=0.0)

    # Iterate over mixture magnitudes, fetch network predictions
    mixes = list()
    start_end_times = list()
    for source_pos in range(0, source_time_frames, output_time_frames):
        # If this output patch would reach over the end of the source spectrogram, set it so we predict the very end of the output, then stop
        if source_pos + output_time_frames > source_time_frames:
            source_pos = source_time_frames - output_time_frames

        # Prepare mixture excerpt by selecting time interval
        mix_part = mix_audio_padded[source_pos:source_pos + input_time_frames,:]
        mixes.append(mix_part)
        start_end_times.append((source_pos, source_pos + output_time_frames))

    # Make predictions
    for mix_num in range(0, len(mixes), model_config["batch_size"]):
        if mix_num + model_config["batch_size"] < len(mixes):
            batch = np.stack(mixes[mix_num:mix_num + model_config["batch_size"]])
        else:
            batch = np.stack(mixes[mix_num:] + [np.zeros(mixes[0].shape) for _ in range(mix_num + model_config["batch_size"] - len(mixes))])

        source_parts = sess.run(separator_sources, feed_dict={mix_context: batch})

        # Save predictions
        # source_shape = [1, freq_bins, acc_mag_part.shape[2], num_chan]
        for out_num in range(mix_num, min(mix_num + model_config["batch_size"], len(mixes))):
            batch_num = out_num - mix_num
            for i in range(model_config["num_sources"]):
                source_preds[i][start_end_times[out_num][0] : start_end_times[out_num][1]] = source_parts[i][batch_num, :, :]
                
    return source_preds
Read more comments on GitHub >

github_iconTop Results From Across the Web

Prediction with GPU is much slower than with CPU?
There are hundreds of questions asking why this code runs slow in the GPU but fast in the CPU, and the answer is...
Read more >
CNN computing time on good CPU vs cheap GPU
I am currently working with Keras+Tensorflow on a desktop PC with i7, 3.6 ghz, 32 Gb RAM. Question: how good a GPU would...
Read more >
Time Series Forecasting with the NVIDIA ... - NVIDIA Developer
The NVIDIA Time Series Prediction Platform provides end-to-end GPU acceleration from training to inference for time series models. The reference ...
Read more >
Performance and Power Prediction for Concurrent Execution ...
plications on GPUs based on the CPU and GPU performance counters, ... ory bandwidth of an application on a GPU to predict the...
Read more >
Why Do I Get Different Results Each Time in Machine Learning?
Perhaps your model is making different predictions each time it is ... For more on thinking about supervised learning as a search problem, ......
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found