[BUG] NVTabular data loader for TensorFlow validation is slow
**Describe the bug**
Using the NVTabular data loader for TensorFlow for validation with the Criteo dataset is slow:
```python
validation_callback = KerasSequenceValidater(valid_dataset_tf)

history = model.fit(
    train_dataset_tf,
    epochs=EPOCHS,
    steps_per_epoch=20,
    callbacks=[validation_callback],
)
```
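For context, `train_dataset_tf` and `valid_dataset_tf` are NVTabular `KerasSequenceLoader` instances. A minimal sketch of how the two loaders might be constructed, roughly following the Criteo example; the paths, column lists, and batch size below are placeholders, not taken from the notebook:

```python
# Sketch only: TRAIN_PATHS, VALID_PATHS, BATCH_SIZE and the column lists are placeholders.
from nvtabular.loader.tensorflow import KerasSequenceLoader, KerasSequenceValidater

train_dataset_tf = KerasSequenceLoader(
    TRAIN_PATHS,                      # preprocessed parquet files
    batch_size=BATCH_SIZE,
    label_names=["label"],
    cat_names=CATEGORICAL_COLUMNS,
    cont_names=CONTINUOUS_COLUMNS,
    shuffle=True,
    parts_per_chunk=1,
)

valid_dataset_tf = KerasSequenceLoader(
    VALID_PATHS,
    batch_size=BATCH_SIZE,            # same batch size as for training
    label_names=["label"],
    cat_names=CATEGORICAL_COLUMNS,
    cont_names=CONTINUOUS_COLUMNS,
    shuffle=False,
    parts_per_chunk=1,
)
```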
- Training: 2 min for 2288 steps
- Validation: estimated 55 min for 3003 steps

Same batch size, dataset, etc. The validation dataset is 1.3x bigger, but iterating through it takes 27x more time than iterating through the training dataset.
**Steps/Code to reproduce bug**
An example is provided here: https://github.com/bschifferer/NVTabular/blob/criteo_tf_slow/examples/criteo_tensorflow_slow.ipynb
You can see that iterating over the training dataset takes on average ~1 s per 10 batches, while the validation dataset takes 4-6 s per 10 batches.
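The per-batch timings can be reproduced with a simple timing loop. The helper below is only a sketch: it assumes the loader yields `(features, labels)` batches and that `model` is the Keras model from the notebook, with inputs matching the feature dictionary:

```python
import time

def time_batches(loader, model=None, report_every=10):
    """Iterate over a data loader and print the elapsed time per `report_every` batches."""
    start = time.time()
    for i, (features, labels) in enumerate(loader):
        if model is not None:
            model(features, training=False)  # forward pass only, no weight update
        if (i + 1) % report_every == 0:
            print(f"batches {i + 2 - report_every}-{i + 1}: {time.time() - start:.2f}s")
            start = time.time()

time_batches(train_dataset_tf, model)   # ~1 s per 10 batches
time_batches(valid_dataset_tf, model)   # 4-6 s per 10 batches
```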
**Expected behavior**
The validation loop should be about as fast as the training loop.
**Additional context**
There are multiple hypotheses and tests:
- This behavior can be observed by just iterating over the dataset and executing the forward pass of the model
- If we swap the training and validation data loaders, the validation data loader is fast and the training data loader is slow. In other words, whichever data loader is iterated second is the slow one. The hypothesis is that GPU memory is not released by the first data loader and blocks the pipeline (see the sketch after this list)
- If we remove the forward pass of the model from the loop, both iterations are fast. It probably has something to do with moving the data to the TensorFlow model
- I tried using `tf.keras.backend.clear_session()` between the iterations, but it did not help
- I tried running the following between the iterations, but it resulted in an error:

  ```python
  from numba import cuda

  cuda.select_device(0)
  cuda.close()
  ```
- I tried using separate subprocesses, but it did not improve performance
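As referenced above, here is a condensed sketch of the isolation tests (loader order swap, no forward pass, clearing the Keras session). It reuses the hypothetical `time_batches` helper from the earlier sketch and is meant only to illustrate the experiments, not the exact notebook code:

```python
import tensorflow as tf

# 1) Order swap: whichever loader is iterated second is the slow one.
time_batches(valid_dataset_tf, model)   # fast when run first
time_batches(train_dataset_tf, model)   # now this one is slow

# 2) No forward pass: both loaders iterate quickly.
time_batches(train_dataset_tf)
time_batches(valid_dataset_tf)

# 3) Clearing the Keras session between the two loops did not help.
time_batches(train_dataset_tf, model)
tf.keras.backend.clear_session()
time_batches(valid_dataset_tf, model)   # still slow
```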
**Top GitHub Comments**
@jperez999 we think this is happening in the TF dataloader only. @bschifferer will confirm by testing PyTorch. Can you take a look please?
This seems to be related to insufficient host/GPU memory - closing