kedro run --pipeline is not working properly with GPU

See original GitHub issue

Description

I’m trying to run a pipeline that trains a deep learning classifier, but the child processes that run the training are not able to use the GPU.

Context

The network is implemented in PyTorch and there is an NVIDIA GPU available (both nvidia-smi and torch.cuda.is_available() indicate that). I’m also able to run a simple script that multiplies two matrices on the GPU many times, and I can watch the GPU usage in nvidia-smi while it runs. When I start Jupyter Lab from Kedro, I can also use the GPU normally.
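
A minimal sketch of that kind of standalone check (the exact script isn’t shown here, so the matrix sizes and loop count are arbitrary):

```python
# gpu_sanity_check.py -- standalone script run outside Kedro.
# Multiplies two matrices on the GPU repeatedly so the usage shows up in nvidia-smi.
import torch

def main():
    print("CUDA available:", torch.cuda.is_available())
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    for _ in range(1000):
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()
    print("Done on", device)

if __name__ == "__main__":
    main()
```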

When the pipeline starts (kedro run --pipeline <pipeline name> -t <arg>), 255 MB of GPU memory is always allocated, regardless of whether I call any of the PyTorch code (I have checked this and even commented out the imports that might pull in PyTorch code). Then, from inside the training method, printing torch.cuda.is_available() returns False.

Steps to Reproduce

  1. Create a simple PyTorch module that assigns the device to ‘cuda:0’ if a GPU is available and to ‘cpu’ otherwise (see the sketch after this list).
  2. Call that module from a node in a Kedro pipeline.
  3. Run nvidia-smi during the pipeline execution, both with and without the PyTorch module.
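
A minimal sketch of steps 1 and 2 (the file, function, and dataset names here are made up for illustration):

```python
# nodes.py -- hypothetical node module for the reproduction pipeline (step 1).
import torch

def check_gpu():
    """Assign the device to 'cuda:0' if a GPU is available, else 'cpu'."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("CUDA available inside node:", torch.cuda.is_available())
    return str(device)


# pipeline.py -- hypothetical pipeline calling the module from a node (step 2).
# In a real project this would import check_gpu from nodes.py.
from kedro.pipeline import Pipeline, node

def create_pipeline(**kwargs):
    return Pipeline([
        node(check_gpu, inputs=None, outputs="selected_device", name="check_gpu"),
    ])
```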

Environment

  • Kedro version used: 0.17.4
  • Python version used: 3.8.5
  • Operating system and version: Ubuntu 18.04.1 x86_64 GNU/Linux

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
datajoely commented, Mar 24, 2022

So it sounds like your nodes have side effects that are breaking this. I think to get this working we need to do the CUDA/PyTorch context setup outside of the nodes, in some sort of singleton; the nodes shouldn’t have any awareness of IO, they should just accept and return data.

Keen to work this one out and coach you through this, but I think we might need to approach things differently.
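
One way to read that suggestion is roughly the following (a sketch of the idea only; the module and function names are made up, and this is not code from the thread):

```python
# gpu_context.py -- hypothetical singleton-style holder for the torch device,
# so that CUDA setup happens once, outside of any node.
import torch

_device = None

def get_device():
    """Lazily initialise and cache the device the first time it is requested."""
    global _device
    if _device is None:
        _device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    return _device


# nodes.py -- nodes stay free of IO/context side effects: they accept data,
# return data, and only fetch the shared device when running the model.
# (In a real project this would import get_device from gpu_context.py.)
def train_model(train_data, parameters):
    device = get_device()
    # ... build the model, move it and the tensors to `device`, train,
    # and return the trained model / metrics as ordinary node outputs.
    ...
```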

0 reactions
merelcht commented, Oct 7, 2022

I’ll close this issue due to inactivity. Feel free to re-open it if more support is needed!

Read more comments on GitHub.

Top Results From Across the Web

  • Run a pipeline — Kedro 0.18.4 documentation - Read the Docs
    When processing a node, both SequentialRunner and ParallelRunner perform the following steps in order: Load data based on node input(s). Execute node function...
  • Protobuf compatibility error when running Kedro pipeline
    I have already tried changing the protobuf version, but I cannot find a compatible one. What can I do to solve this problem?...
  • Running Kedro… everywhere? Machine Learning Pipelines ...
    Specify resource requirements (CPU/Memory/GPU) and run the pipeline. It's one of the easiest ways of scaling up Kedro pipelines if you're using ...
  • Kedro as a Data Pipeline in 10 Minutes | by Kay Jan Wong
    Kedro allows reproducible and easy (one-line command!) running of ... in Kedro and Python examples on how to set up, configure, and run...
  • kedro-viz - PyPI
    Kedro-Viz helps visualise Kedro data and analytics pipelines. ... Kedro-Viz also allows users to view and compare different runs in the Kedro...
