astropy.io.fits not returning memory
First, I'm sorry, I know nothing about memory… I am using macOS 10.14.5 on a MacBook Pro 15" 2018.
While I was reducing 150 FITS files (each ~70 MB), I realized the pipeline I used got ~2-3 times slower as time went on. I've experienced tens of such cases but couldn't find any possibilities… Then 2 weeks ago, just by chance, I was looking at the memory usage and found that the memory was going "full".
I asked one of my colleagues about it, and the friend has actually experienced it almost every day, but couldn't find any solution other than `except OSError: gc.collect()`. The friend uses Windows and Linux. But it seemed like my Mac does not raise any `OSError`, so that did not help. Also, `gc.collect()` at the end of the for loop did not return memory.
Then I found from the astropy FAQ that I should do `del hdu.data` and `gc.collect()`. But as I mentioned above, the following simple code didn't seem to return any memory:
```python
from pathlib import Path
from astropy.io import fits
import gc

TOPPATH = Path('./reduced')
allfits = list(TOPPATH.glob("*.fits"))
allfits.sort()

for fpath in allfits:
    hdul = fits.open(fpath)
    data = hdul[0].data
    test = data + 1
    hdul.close()
    del data
    del test
    del hdul[0].data
    # gc.collect()  # tested putting it at many different places in the code
```
or, following exactly what is in the FAQ:
```python
for fpath in allfits:
    with fits.open(fpath) as hdul:
        for hdu in hdul:
            data = hdu.data
            test = data + 1
            del data
            del hdu.data
            del test
```
When I run these, memory usage quickly increases from 40% to 100%. Because the above FAQ says "In some extreme cases files are opened and closed fast enough", I put `time.sleep(xxx)` in the middle of the for loop for testing, but it didn't help. I also tried resetting the variables to `None` (`test = None`, `data = None`, `hdul[0].data = None`) at the end of the loop, as well as using `CCDData.read` instead of `fits.open`, etc., but found no hope.
Just for temporary usage, therefore, I let my Mac do the job without any `gc.collect()` or `del`, because it worked, although it was painfully slow. Then, to compare the differences between two algorithms I developed, I had to run a similar reduction pipeline on the identical files, so I used two Jupyter Notebook kernels to run the two nearly identical pipelines at the same time. Since the number of files to be processed (and the memory usage) doubled, it got much slower as time went on. Then it suddenly gave a "too many files open" error in the middle of the processing, and Jupyter Notebook, Jupyter Lab, etc. never worked again. I spent maybe 3-4 hours googling, got hopeless, and had to remove Anaconda and clean-install it… It took almost a full working day for me…
I don't know what happened with Jupyter, but this shock was strong enough for me to become… cautious. As I don't know much about computers, I can't imagine what the fundamental problem is, even after reading the above FAQ. I believe there should be a way to solve this issue, but it's maybe just far beyond my ability. I hope a workaround is made explicitly available to astropy users so that people like me become more careful.
Issue Analytics
- State: closed
- Created: 4 years ago
- Comments: 31 (23 by maintainers)
Top GitHub Comments
> It is not doing nothing. The "main" file handle is closed, but not the memmap one. If the references are deleted before `.close()`, then the memmap is closed as well.

Oh, sorry, I was a bit confused.
`memmap=False` solved the original issue I was having; I had confused it with a different one I was having. Sorry. Closing the issue.