Strange issue with subprocess from within loky

(Apologies if this has been answered before, but a search of existing issues didn’t turn up any obvious references.)

I’m using loky to run parallel analysis of a video clip. So far, the library has been absolutely amazing for working with OpenCV. Recently, I added an additional job to perform some specialized audio analysis of the clip in parallel alongside the video analysis. Unfortunately, and bizarrely, the audio analysis portion seems to fail when run under loky for certain files, but operates normally when run directly in the initial process.

The audio analysis invokes ffmpeg via a subprocess.Popen call and ingests the audio data found via a PIPE. When run under loky (and only on a subset of video files), the pipe formed from stdout from ffmpeg appears to return no data. At first glance, I thought this might be an issue with ffmpeg, but the stderr output appears to be the same, and the empty stdout pipe leads me to believe something is being mis-wired when loky starts the new process.

I’ve distilled the issue down to the attached file (taken from the audio lib I’m using). When the function testreading is passed the path of a sufficiently large video file, it prints the following when run under loky:

<_io.BufferedReader name=0>
<_io.BufferedReader name=5>
<_io.BufferedReader name=5>: 1024
<_io.BufferedReader name=5>: 1024
<_io.BufferedReader name=5>: 1024
<_io.BufferedReader name=0>: 0
<_io.BufferedReader name=5>: 82
<_io.BufferedReader name=5>: 0

As you can see, the buffered reader for stdout (fd 0) immediately returns with 0 bytes found, while stderr works as expected.
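The `name=0` on the stdout reader suggests that file descriptor 0 was free in the worker when the pipe was created, since a POSIX system hands out the lowest unused descriptor. A minimal sketch for checking that hypothesis, assuming a POSIX system, is shown here; `describe_standard_fds` is a hypothetical helper written for illustration and is not part of loky or the attached file.

```python
import os


def describe_standard_fds():
    """Report whether fds 0, 1 and 2 are open in the current process.

    Hypothetical diagnostic helper: os.fstat raises OSError (EBADF)
    when the descriptor is not open.
    """
    status = {}
    for fd in (0, 1, 2):
        try:
            os.fstat(fd)
            status[fd] = "open"
        except OSError:
            status[fd] = "closed"
    return status


if __name__ == "__main__":
    # Run directly here; to compare, submit the same function to a worker.
    print(describe_standard_fds())
```

Calling the function directly and then submitting it to the executor (for example through `loky.get_reusable_executor().submit(...)`) should show whether the standard descriptors differ between the initial process and the worker.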

When not run under loky, both readers return the expected data.

Code to reproduce the issue (uploaded as .txt due to GitHub limitations): lokyissue.txt

OS: Ubuntu 18.04
Loky version: 2.6.0
Python version: 3.6.9

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6

Top GitHub Comments

josephschorr commented, Aug 10, 2020

In case anyone is interested: it turns out that when loky spawns a new process, stdin is not assigned anything (or is assigned None). As a result, when ffmpeg is launched via subprocess, it is fed garbage data on stdin. Setting stdin to PIPE appears to have solved the problem.
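The workaround described in this comment might look roughly like the sketch below. It is not the code from the attached file: `read_child_output` is a hypothetical helper, and the command passed to it stands in for the real ffmpeg invocation, which is not shown in the issue. The only change that matters is passing `stdin=subprocess.PIPE` explicitly so the child does not inherit whatever the worker has on descriptor 0.

```python
import subprocess
import sys


def read_child_output(cmd):
    """Run cmd and return (stdout_bytes, stderr_bytes).

    Hypothetical helper: the child gets its own stdin pipe instead of
    inheriting the worker's stdin.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,   # explicit stdin, as the comment suggests
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() closes the stdin pipe and drains both output pipes,
    # which avoids blocking on a full stderr while reading stdout.
    out, err = proc.communicate()
    return out, err


if __name__ == "__main__":
    # Stand-in child process; replace with the actual ffmpeg command line.
    demo = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 4096)"]
    out, err = read_child_output(demo)
    print(len(out), len(err))
```

If the child never needs input, `stdin=subprocess.DEVNULL` is another option that also gives it a well-defined stdin; whether it resolves this particular issue was not reported in the thread.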

jubick1337 commented, Aug 1, 2022

This actually causes a lot of trouble, because it’s not clear where the problem is coming from. I was parsing audio data using joblib, and some of the audio files (just a fraction) came back with 0 length. There is no such problem with other backends.
