Resource leak in multiprocessing
Hi, this is a follow-up to https://github.com/libvips/pyvips/issues/73
I implemented my patch extraction based on a process pool, as discussed there. The processes each create a VIPS image from the path (to a large file) and then extract patches from it.
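import numpy as np
import pyvips

def process_job(args):
    # imap_unordered passes each (i, path) tuple as a single argument
    i, large_file_path = args
    vips_img = pyvips.Image.new_from_file(large_file_path)
    # extract stuff from it (x, y, width, height are defined elsewhere)
    area = vips_img.extract_area(x, y, width, height)
    a = np.ndarray(area.write_to_buffer(), ...)
    # ...

# pool is a multiprocessing.Pool created elsewhere
iterable = ((i, 'large_image.tif') for i in range(1000))
pool.imap_unordered(process_job, iterable)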
This function is called many times for each image, because I use the pool to submit jobs to the worker processes, and each job extracts only a few patches.
Now, after running this for about 25 different images, the extraction silently fails: the array just gets filled with zeros or random values, probably because the underlying buffer is somehow not allocated. I can reproduce the problem, but it takes about an hour, and I haven’t found a way to shorten that (just extracting from many images works fine; it seems to be an issue only when the job function is also called many, many times).
This makes me think that this is a synchronization problem, but I can’t figure out why, and the fact that I can reproduce it is strange as well (it fails at almost the same point every time).
So maybe it’s an issue with certain resources not being freed as they should?
I’ve looked into the debug logs (with Python logging set to debug). They are very long, of course, but nothing strange seems to happen at the point where the extraction silently fails. The only way I can tell that it fails at all is that, from then on, a warning is issued every time the job is called (no resolution info for TIFF image …), which I think has been removed in recent versions, whereas for the previous images this warning was only issued once per process start. So something must be different in the underlying access to the images, but I can’t figure out what.
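To tell whether a repeated warning comes from one worker or from all of them, it can help to tag every log record with the process that emitted it. The following is a minimal sketch using only the standard library; `LOG_FORMAT`, `init_worker_logging` and `make_pool` are hypothetical names chosen for illustration, and it assumes the library of interest logs through Python's `logging` module.

```python
import logging
import multiprocessing as mp

# Include the process name and PID so each line can be traced to a worker.
LOG_FORMAT = "%(asctime)s %(processName)s[%(process)d] %(name)s %(levelname)s: %(message)s"


def init_worker_logging(level=logging.DEBUG):
    """Pool initializer: (re)configure root logging inside a worker process."""
    # force=True (Python 3.8+) replaces any handlers inherited from the parent.
    logging.basicConfig(level=level, format=LOG_FORMAT, force=True)


def make_pool(processes=4):
    """Create a pool whose workers each run init_worker_logging once at start."""
    return mp.Pool(processes=processes, initializer=init_worker_logging)
```

With this in place, a warning that appears once per worker start shows up with a handful of distinct PIDs, while a warning issued on every job repeats under the same PIDs.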
If you have a suggestion for how to improve my program (possibly sharing the VIPS images as a global inherited by the processes instead of sending the path and creating the image every time, etc.), I would be grateful, too, of course!
Issue Analytics
- Created 5 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Thank you for the advice, everything works as expected when starting vips in the child processes only, so I will close this issue.
Oop, accidental close.