Broken pipe when training a model on CPU

Hi,

I followed the instructions in README.md to train an A2C agent in the DoorKey environment on Ubuntu 18.04 with 8 CPUs (Python 3.7.3), using the following command:

python scripts/train.py --algo a2c --env MiniGrid-DoorKey-5x5-v0 --model DoorKey --save-interval 10 --frames 80000

Training went well initially but ended with a BrokenPipeError exception that crashed the training process. The error message is copied below. According to scripts/train.py, the above command runs with 16 processes by default. At first I thought the error was caused by the training starting too many processes, but the same exception occurred even with --procs=6. Only with --procs=1 did training complete successfully. Is there any special setting I need to enable multi-process training?
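For readers unfamiliar with how multi-process environments like this are usually wired, here is a minimal sketch (not torch_ac's actual code): each worker process owns one environment and talks to the parent over a `multiprocessing` pipe, and the parent shuts the workers down in an explicit, idempotent `close()` that tolerates workers that have already exited. The class `TinyParallelEnv`, the `worker` function, the toy counter "environment", and the `"step"`/`"close"` command names are all hypothetical, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp


def worker(conn, start):
    """Hypothetical worker: owns one toy 'environment' (a counter)
    and answers commands sent over its end of the pipe."""
    state = start
    while True:
        try:
            cmd, data = conn.recv()
        except EOFError:  # no more data can arrive on this pipe: stop
            break
        if cmd == "step":
            state += data
            conn.send(state)
        elif cmd == "close":
            break
    conn.close()


class TinyParallelEnv:
    """Hypothetical stand-in for a ParallelEnv-style wrapper (not torch_ac's code)."""

    def __init__(self, procs):
        ctx = mp.get_context("fork")  # assumption: POSIX system such as Ubuntu
        self.conns, self.processes = [], []
        for i in range(procs):
            local, remote = ctx.Pipe()
            p = ctx.Process(target=worker, args=(remote, i), daemon=True)
            p.start()
            remote.close()  # the parent keeps only its own end of each pipe
            self.conns.append(local)
            self.processes.append(p)
        self.closed = False

    def step(self, action):
        # Send the same action to every worker, then collect one reply each.
        for conn in self.conns:
            conn.send(("step", action))
        return [conn.recv() for conn in self.conns]

    def close(self):
        """Explicit, idempotent shutdown that tolerates workers that already exited."""
        if getattr(self, "closed", True):  # already closed, or __init__ never finished
            return
        self.closed = True
        for conn in self.conns:
            try:
                conn.send(("close", None))
            except OSError:  # BrokenPipeError is a subclass of OSError
                pass  # the worker is already gone; nothing left to tell it
            conn.close()
        for p in self.processes:
            p.join(timeout=1)

    def __del__(self):
        # Delegate to close() instead of sending on the pipes unconditionally.
        self.close()
```

With three workers, `TinyParallelEnv(procs=3).step(1)` returns `[1, 2, 3]`. The design choice here is to call `close()` explicitly when training ends rather than relying on `__del__`, which may run only at interpreter shutdown, after `multiprocessing` has already terminated daemon workers.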

(I just realized that the error originates in torch_ac.)

Error Message

Exception ignored in: <function ParallelEnv.__del__ at 0x7f2df3411a60>
Traceback (most recent call last):
  File "~/torch-ac/torch_ac/utils/penv.py", line 41, in __del__
  File "~/anaconda3/lib/python3.7/multiprocessing/connection.py", line 206, in send
  File "~/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
  File "~/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
BrokenPipeError: [Errno 32] Broken pipe
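The traceback ends in `Connection.send` called from `ParallelEnv.__del__`. Python reports exceptions raised inside `__del__` as "Exception ignored in ..." instead of propagating them. The sketch below, which uses only the standard library and assumes a POSIX system, reproduces the same error class: sending on a `multiprocessing` pipe whose other end has already been closed, which is what a parent sees when a worker process has exited before being told to stop. The function name `send_after_peer_closed` is hypothetical.

```python
from multiprocessing import Pipe


def send_after_peer_closed():
    """Return the exception raised when sending on a pipe whose other end is closed."""
    local, remote = Pipe()
    remote.close()  # stands in for a worker process that has already exited
    try:
        local.send(("close", None))  # the kind of message a teardown might send
    except BrokenPipeError as exc:
        return exc
    finally:
        local.close()
    return None
```

On Linux, the returned exception is a `BrokenPipeError` with errno 32, matching the last line of the traceback above.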

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

oceank commented, Aug 31, 2020 (1 reaction)

Hi, @lcswillems,

I ran the code on Ubuntu 18.04 with Python 3.7.3, without a GPU. I can't tell yet where in training the error is triggered. I will check.

lcswillems commented, Aug 17, 2021 (0 reactions)

I am closing this issue because I think I have fixed it. @oceank, if I haven't, please tell me and I will reopen the issue.

