Memory file -> numpy -> memory file

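Thanks for the great library. As the title says, I want to decode a video held in memory as raw bytes into numpy arrays, process it, and then encode the result back into raw video bytes. I managed to pass the bytes through ffmpeg as mp4 using the approach from the comment linked below.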

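import subprocess
import ffmpeg

decode = (
    ffmpeg
    .input('pipe:')
    .output('pipe:')
    .get_args()
)
# get_args() omits the executable name, so prepend it when launching
p = subprocess.Popen(['ffmpeg'] + decode, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
output_data = p.communicate(input=input_data)[0]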

https://github.com/kkroening/ffmpeg-python/issues/49#issuecomment-355677082

That works when no per-frame processing is needed, but I want to process each frame, as in your tensorflow stream example.

So I tried using two processes, but it doesn't work: nothing gets processed. The test below is for video (bytes) => numpy => video (file):

def start_encode_process():
    logger.info('Starting ffmpeg process1')
    args = (
        ffmpeg
        .input('pipe:')
        .output('pipe:', format='rawvideo', pix_fmt='rgb24')
        .compile()
    )
    return subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

def start_decode_process(out_filename, width, height):
    logger.info('Starting ffmpeg process2')
    args = (
        ffmpeg
        .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height))
        .output(out_filename, pix_fmt='yuv420p')
        .overwrite_output()
        .compile()
    )
    return subprocess.Popen(args, stdin=subprocess.PIPE)

process1.stdin.write(video_byte)
while True:
    in_frame = read_frame(process1, width, height)
    out_frame = process_frame_simple(in_frame)
    write_frame(process2, out_frame)
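
The question does not show `read_frame` or `write_frame`, and the loop above has no exit condition. The sketch below is one plausible shape for them, assuming they follow the pattern of ffmpeg-python's tensorflow stream example and that frames are `rgb24`. To stay dependency-free it passes frames around as `bytes`; the numpy conversion is noted in a comment. `frame_size_bytes`, `pump_frames`, and the fake in-memory "processes" are hypothetical names used only for illustration. It does not address the blocking behaviour discussed in the reply further down.

```python
import io
from types import SimpleNamespace


def frame_size_bytes(width, height, channels=3):
    """rgb24 stores one byte per channel, so a frame is width*height*3 bytes."""
    return width * height * channels


def read_frame(process, width, height):
    """Read exactly one raw frame from process.stdout; None means end of stream."""
    size = frame_size_bytes(width, height)
    data = process.stdout.read(size)
    if len(data) == 0:
        return None
    if len(data) != size:
        raise ValueError('truncated frame: got %d of %d bytes' % (len(data), size))
    # With numpy: np.frombuffer(data, np.uint8).reshape([height, width, 3])
    return data


def write_frame(process, frame):
    """Write one raw frame to process.stdin (with numpy: frame.tobytes())."""
    process.stdin.write(frame)


def pump_frames(process1, process2, width, height, process_frame):
    """Copy frames from process1 to process2 until process1 reaches EOF."""
    count = 0
    while True:
        in_frame = read_frame(process1, width, height)
        if in_frame is None:
            break
        write_frame(process2, process_frame(in_frame))
        count += 1
    return count


# Fake "processes" backed by in-memory buffers, only to exercise the loop.
fake_in = SimpleNamespace(stdout=io.BytesIO(bytes(range(48))))
fake_out = SimpleNamespace(stdin=io.BytesIO())
frames_copied = pump_frames(fake_in, fake_out, 4, 2, lambda frame: frame[::-1])
```

Returning `None` on an empty read gives the loop a clean way to stop once the decoder closes its output.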

I think the first problem is stdin. The raw-byte example above uses the communicate method to feed stdin, but that doesn't suit this case because I need to process the video frame by frame. Do you have any ideas?

Thanks for reading.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

kkroening commented, Dec 22, 2018 (5 reactions)

Because you’re reading and writing individual pipe file descriptors with a single python process, you’re encountering a deadlock: the process1.stdin.write line blocks and waits until process1 has finished reading all the data that’s written. Process1 reads a bit of data, does some work, then writes to its output stream. However, because nothing is reading from process1’s output stream, process1 blocks and waits as soon as that pipe’s buffer fills up. So then both your python process and process1 are blocked, and no progress is made.

The reason it’s a problem in this example but fine in the tensorflow example is that each of the ffmpeg processes in the tensorflow example only uses one pipe, whereas process1 here has both an input and output pipe (and only a single python thread).

This is a common issue when working with blocking pipes, regardless of choice of language.

Here are a few options I can think of off the top of my head:

Option 1: Use threads (or python multiprocessing, gevent, etc) so that one thread is responsible for pumping data into process1, and another is responsible for pumping data out of process1 and into process2. Both threads must be okay with being blocked.

Option 2: Don’t use a stdin pipe for process1. If you don’t need to feed data to process1 from the same python process/thread that’s doing the in-memory numpy processing and can avoid doing so, then it gets a lot simpler because you avoid this deadlock scenario.

Option 3: Use non-blocking IO. Basically the process1.stdin.write gets replaced with a call that doesn’t block so that you avoid the deadlock. (Error handling and system-specific quirks can be really annoying here though, so YMMV; from my experience this ends up being the most complex/error-prone solution unless you have something like gevent do it for you, but in that case see option 1.)

Option 4: Run only one ffmpeg process at a time and use subprocess's Popen.communicate. The communicate method gets around deadlock issues with running a child process with both stdin+stdout pipes, but you’ll need to have all the input data available beforehand, and then you’ll have to process the output of process1 all at once, meaning the entire thing has to fit in memory. The processed data is then fed into process2. If you only have a few seconds of video and don’t need it to run in realtime then this might be feasible (and pretty simple), otherwise it’s completely impractical.
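
As a rough illustration of Option 1, here is a minimal sketch using only the standard library: a background thread feeds the child's stdin while the main thread reads fixed-size frames from its stdout. To keep it runnable without ffmpeg, `PASSTHROUGH_CMD` is a stand-in child that simply copies stdin to stdout; in the real pipeline it would presumably be the argument list returned by `.compile()`. `feed_stdin`, `run_pipeline`, and `invert` are hypothetical names, and `invert` is placeholder processing.

```python
import subprocess
import sys
import threading

# Stand-in for the ffmpeg decode command: a child that copies stdin to stdout.
PASSTHROUGH_CMD = [
    sys.executable, "-c",
    "import shutil, sys; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)",
]


def feed_stdin(proc, data):
    """Writer thread body: push all input into the child, then signal EOF."""
    proc.stdin.write(data)
    proc.stdin.close()


def run_pipeline(cmd, input_bytes, frame_size, process_frame):
    """Feed input_bytes to cmd while reading its output one frame at a time."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    writer = threading.Thread(target=feed_stdin, args=(proc, input_bytes))
    writer.start()  # may block inside write(), but not on the reading thread

    frames = []
    while True:
        chunk = proc.stdout.read(frame_size)
        if len(chunk) < frame_size:  # EOF, or a truncated trailing frame
            break
        frames.append(process_frame(chunk))

    writer.join()
    proc.stdout.close()
    proc.wait()
    return frames


def invert(frame):
    """Placeholder per-frame processing: invert every byte."""
    return bytes(255 - b for b in frame)
```

Because the writer lives on its own thread, it can sit blocked in `write()` while the main thread keeps draining stdout, so neither side waits on the other indefinitely. In the two-process setup from the question, the loop body would write each processed frame to process2's stdin instead of collecting it in a list.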

Some related, potentially useful search terms / research topics:

  • “Unix pipe deadlocks”
  • “Subprocess communicate deadlock”
  • “Non-blocking IO python”
  • etc

Sangkwun commented, Dec 28, 2018 (1 reaction)

Sorry, the frame rate problem was caused by my own mistake. I read stdout twice; one read was for testing, and I forgot to delete it. Anyway, there is no problem with the frames or the frame rate!
