kedro run --pipeline is not working properly with GPU

See original GitHub issue

Description

I’m trying to run a pipeline that trains a deep learning classifier, but the child processes that run the training are not able to use the GPU.

Context

The network is implemented in PyTorch and there is an NVIDIA GPU available (both nvidia-smi and torch.cuda.is_available() indicate that). I’m also able to run a simple script that multiplies two matrices on the GPU many times, and I can watch the GPU usage in nvidia-smi while it runs. When I start Jupyter Lab from Kedro, I can also use the GPU normally.
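
A minimal sketch of that kind of standalone check (the exact script isn’t shown here, so the matrix sizes and loop count are arbitrary):

```python
# gpu_sanity_check.py -- standalone script run outside Kedro.
# Multiplies two matrices on the GPU repeatedly so the usage shows up in nvidia-smi.
import torch

def main():
    print("CUDA available:", torch.cuda.is_available())
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    for _ in range(1000):
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()
    print("Done on", device)

if __name__ == "__main__":
    main()
```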

When the pipeline starts (kedro run --pipeline <pipeline name> -t <arg>), 255 MB of GPU memory is always allocated, regardless of whether I call any of the PyTorch code (I have checked this and even commented out the imports that might pull in PyTorch code). Then, from inside the training method, printing torch.cuda.is_available() returns False.

Steps to Reproduce

  1. Create a simple PyTorch module that assigns the device to ‘cuda:0’ if a GPU is available and to ‘cpu’ otherwise (see the sketch after this list).
  2. Call that module from a node in a Kedro pipeline.
  3. Run nvidia-smi during the pipeline execution, both with and without the PyTorch module.
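
A minimal sketch of steps 1 and 2 (the file, function, and dataset names here are made up for illustration):

```python
# nodes.py -- hypothetical node module for the reproduction pipeline (step 1).
import torch

def check_gpu():
    """Assign the device to 'cuda:0' if a GPU is available, else 'cpu'."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("CUDA available inside node:", torch.cuda.is_available())
    return str(device)


# pipeline.py -- hypothetical pipeline calling the module from a node (step 2).
# In a real project this would import check_gpu from nodes.py.
from kedro.pipeline import Pipeline, node

def create_pipeline(**kwargs):
    return Pipeline([
        node(check_gpu, inputs=None, outputs="selected_device", name="check_gpu"),
    ])
```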

Environment

  • Kedro version used: 0.17.4
  • Python version used: 3.8.5
  • Operating system and version: Ubuntu 18.04.1 x86_64 GNU/Linux

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
datajoely commented, Mar 24, 2022

So it sounds like your nodes have side effects that are breaking this. I think to get this working we need to do the CUDA/PyTorch context setup outside of the nodes, in some sort of singleton; the nodes shouldn’t have any awareness of IO, they should just accept and return data.

Keen to work this one out and coach you through this, but I think we might need to approach things differently.
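
One way to read that suggestion is roughly the following (a sketch of the idea only; the module and function names are made up, and this is not code from the thread):

```python
# gpu_context.py -- hypothetical singleton-style holder for the torch device,
# so that CUDA setup happens once, outside of any node.
import torch

_device = None

def get_device():
    """Lazily initialise and cache the device the first time it is requested."""
    global _device
    if _device is None:
        _device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    return _device


# nodes.py -- nodes stay free of IO/context side effects: they accept data,
# return data, and only fetch the shared device when running the model.
# (In a real project this would import get_device from gpu_context.py.)
def train_model(train_data, parameters):
    device = get_device()
    # ... build the model, move it and the tensors to `device`, train,
    # and return the trained model / metrics as ordinary node outputs.
    ...
```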

0 reactions
merelcht commented, Oct 7, 2022

I’ll close this issue due to inactivity. Feel free to re-open it if more support is needed!

Read more comments on GitHub.

Top Results From Across the Web

  • Run a pipeline — Kedro 0.18.4 documentation - Read the Docs
    When processing a node, both SequentialRunner and ParallelRunner perform the following steps in order: Load data based on node input(s). Execute node function...
  • Protobuf compatibility error when running Kedro pipeline
    I have already tried changing the protobuf version, but I cannot find a compatible one. What can I do to solve this problem?...
  • Running Kedro… everywhere? Machine Learning Pipelines ...
    Specify resource requirements (CPU/Memory/GPU) and run the pipeline. It's one of the easiest ways of scaling up Kedro pipelines if you're using ...
  • Kedro as a Data Pipeline in 10 Minutes | by Kay Jan Wong
    Kedro allows reproducible and easy (one-line command!) running of ... in Kedro and Python examples on how to set up, configure, and run...
  • kedro-viz - PyPI
    Kedro-Viz helps visualise Kedro data and analytics pipelines. ... Kedro-Viz also allows users to view and compare different runs in the Kedro...
