How to kill distributed processes

Hi, I am running on a single 8-GPU machine in nvidia-docker with PyTorch 1.0 and CUDA 10.

I followed the script here to run the program.

However, the distributed processes do not terminate after I press Ctrl+C. Some of the processes keep running in the background, and killing them does not terminate them either. Please help.

So, how do I properly terminate an ongoing distributed run?

Thank you,

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

57 reactions
frankang commented, Feb 18, 2019

I use this script to kill zombie processes:
kill $(ps aux | grep "train.py" | grep -v grep | awk '{print $2}')

6 reactions
XiwuChen commented, May 7, 2021

frankang wrote: I use this script to kill zombie processes:
kill $(ps aux | grep "train.py" | grep -v grep | awk '{print $2}')

If you start training with python train.py, i.e. the processes are spawned from the main function, run the following command instead:
kill $(ps aux | grep multiprocessing.spawn | grep -v grep | awk '{print $2}')
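
For readers who would rather do the same thing from Python, here is a minimal sketch of the ps/grep/awk pipeline above. It is not from the issue thread: the function names parse_matching_pids and terminate_matching are hypothetical, and it assumes a POSIX ps that accepts -A and -o. Like the shell version, it matches any command line containing the pattern, so a more specific pattern is needed if several jobs share a script name.

```python
import os
import signal
import subprocess


def parse_matching_pids(ps_output, pattern, skip_pid=None):
    """Return PIDs from `ps -o pid= -o args=` output whose command line contains pattern."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)  # "<pid> <full command line>"
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        if pattern in args and pid != skip_pid:
            pids.append(pid)
    return pids


def terminate_matching(pattern, sig=signal.SIGTERM):
    """Send sig to every process whose command line contains pattern (hypothetical helper)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    # Skip our own PID, the counterpart of `grep -v grep` in the shell version.
    pids = parse_matching_pids(out, pattern, skip_pid=os.getpid())
    signalled = []
    for pid in pids:
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except (ProcessLookupError, PermissionError):
            pass  # already exited, or owned by another user
    return signalled

# Example (not executed here): terminate_matching("train.py")
```

The parsing step is kept separate from the signalling step so the match list can be inspected before anything is killed.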

Hello. Is there a better way to kill these child processes from within the training code? We do NOT want to kill them manually. Also, if one user has two training tasks running, we first have to work out which processes belong to which task before killing them.
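
One general POSIX approach to this question, not something proposed in the thread, is to have a launcher start each worker in its own process group and signal that group when Ctrl+C arrives. The sketch below uses only the Python standard library. The names launch_workers, stop_workers and run, and the RANK environment variable, are hypothetical, and the code does not reproduce what torch.distributed.launch does internally.

```python
import os
import signal
import subprocess
import sys


def launch_workers(cmd, num_workers):
    """Start num_workers copies of cmd, each in its own session and process group."""
    procs = []
    for rank in range(num_workers):
        env = dict(os.environ, RANK=str(rank))  # RANK is an illustrative variable
        # start_new_session=True calls setsid() in the child, so its pgid equals its pid.
        procs.append(subprocess.Popen(cmd, env=env, start_new_session=True))
    return procs


def stop_workers(procs, grace_seconds=5.0):
    """SIGTERM each live worker's process group, then SIGKILL any that outlive the grace period."""
    for p in procs:
        if p.poll() is None:
            try:
                os.killpg(p.pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # group vanished between poll() and killpg()
    for p in procs:
        try:
            p.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            try:
                os.killpg(p.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass
            p.wait()


def run(cmd, num_workers):
    """Launch the workers, wait for them, and clean up on Ctrl+C or any other exit."""
    procs = launch_workers(cmd, num_workers)
    try:
        for p in procs:
            p.wait()
    except KeyboardInterrupt:
        pass  # fall through to cleanup
    finally:
        stop_workers(procs)
    return [p.returncode for p in procs]

# Example (not executed here): run([sys.executable, "train.py"], num_workers=8)
```

Because each worker is its own group leader, the launcher only signals processes it started, which also keeps two training tasks of the same user apart.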


Top Results From Across the Web

How to kill distributed processes · Issue #487 - GitHub
Let me know whether it works. It works for me. One click ctrl-c trigger destroy, and second click ctrl-c if you don't want...

Kill PyTorch Distributed Training Processes - Lei Mao
After hitting Ctrl + C, one process is killed and we still have 7 processes left. In order to release these resources...

How to shut down all processes with 'Ctrl + C' when using ...
@pritamdamania87 Yes, I use python -m torch.distributed.launch to run my code. And with Ctrl+C to shut down the training, some processes are ...

KILL (Transact-SQL) - SQL Server - Microsoft Learn
The KILL command can be used to resolve in-doubt distributed transactions. These transactions are unresolved distributed transactions that occur ...

How can I kill a process in Linux when kill -9 fails?
I tried kill 1234 ; I tried kill -9 1234 I tried kill -KILL 1234 - no effect. Tried Ctrl+C and Ctrl+Z in...
