
astropy.io.fits not returning memory

See original GitHub issue

First, I’m sorry — I know nothing about memory… I am using macOS 10.14.5 on a MacBook Pro 15" (2018).

While reducing 150 FITS files (each ~70 MB), I realized the pipeline I used got ~2–3 times slower as it ran. I have experienced tens of such cases but couldn’t find any explanation… Then, two weeks ago, just by chance, I looked at the memory usage and found that the memory was filling up completely.

I asked one of my colleagues about it, and he has actually experienced it almost every day but found no solution other than catching the error with except OSError: gc.collect(). He uses Windows and Linux; my Mac does not seem to raise any OSError, so that did not help. Calling gc.collect() at the end of the for loop did not return memory either.

Then I found from the astropy FAQ that I should do del hdu.data and gc.collect(). But as I mentioned above, the following simple code did not seem to return any memory:

from pathlib import Path
from astropy.io import fits
import gc

TOPPATH = Path('./reduced')
allfits = sorted(TOPPATH.glob("*.fits"))

for fpath in allfits:
    hdul = fits.open(fpath)
    data = hdul[0].data
    test = data + 1
    hdul.close()
    del data
    del test
    del hdul[0].data
    # gc.collect()  # tested putting it at many different places in the loop

or, following exactly what is in the FAQ:

for fpath in allfits:
    with fits.open(fpath) as hdul:
        for hdu in hdul:
            data = hdu.data
            test = data + 1
            del data
            del hdu.data
            del test

When I run either version, memory usage quickly climbs from 40% to 100%. Because the FAQ says “In some extreme cases files are opened and closed fast enough”, I put time.sleep(xxx) inside the for loop for testing, but it didn’t help. I also tried resetting the variables to None (test = None, data = None, hdul[0].data = None) at the end of the loop, as well as using CCDData.read instead of fits.open, etc., but found no hope.

As a temporary measure, I let my Mac do the job without any gc.collect() or del, because that worked, although painfully slowly. Then, to compare two algorithms I developed, I had to run similar reduction pipelines on identical files, so I used two Jupyter Notebook kernels to run the two nearly identical pipelines at the same time. Since the number of files being processed (and the memory usage) doubled, it got much slower over time. Then it suddenly raised a “too many files open” error in the middle of processing, and Jupyter Notebook, Jupyter Lab, etc. never worked again. After maybe 3–4 hours of hopeless googling, I had to remove Anaconda and clean-install it… It cost me almost a full working day.
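The “too many files open” error is consistent with how memmapped arrays behave: with astropy’s default memmap=True, any reference to hdu.data that survives past fits.open keeps the underlying memory map — and therefore a file descriptor — alive. A minimal sketch of that effect (assuming astropy and NumPy are installed; the file names are hypothetical, made up for the demonstration):

```python
import numpy as np
from astropy.io import fits

# Write a few small FITS files to demonstrate with (hypothetical names).
for i in range(3):
    fits.writeto(f"demo{i}.fits", np.ones((4, 4), dtype="float32"),
                 overwrite=True)

leaked = []
for i in range(3):
    with fits.open(f"demo{i}.fits") as hdul:  # memmap=True is the default
        leaked.append(hdul[0].data)           # reference outlives the block

# Each array in `leaked` is still backed by an open memory map, so the
# underlying file descriptors have not been released; with thousands of
# files in a loop this eventually hits the OS file-handle limit.
print(all(float(arr.sum()) == 16.0 for arr in leaked))
```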

I don’t know what happened with Jupyter, but this shock was strong enough to make me… cautious. As I don’t know much about computers, I can’t imagine what the fundamental problem is, even after reading the FAQ above. I believe there must be a way to solve this issue, but it is maybe just far beyond my ability. I hope a workaround is made explicitly available to astropy users, so that people like me can be more careful.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:31 (23 by maintainers)

Top GitHub Comments

2 reactions
saimn commented, Aug 5, 2019

> The line hdul.close() in the code above is doing nothing, because you have open references to a memmapped array, i.e. data.

It is not doing nothing: the “main” file handle is closed, but not the memmap one. If the references are deleted before .close(), the memmap is closed as well.
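In other words, the fix for the loop above is one of ordering: drop every reference to the memmapped array before calling .close(). A minimal sketch of that ordering (assuming astropy and NumPy; the file name is hypothetical):

```python
import numpy as np
from astropy.io import fits

# A small throwaway file for the demonstration (hypothetical name).
fits.writeto("example.fits", np.zeros((8, 8), dtype="float32"),
             overwrite=True)

hdul = fits.open("example.fits")   # memmap=True is the default
data = hdul[0].data                # touching .data creates the memory map
result = data + 1                  # do some work

total = float(result.sum())        # keep only what you need, as a copy/scalar

# Delete every reference to the memmapped array *before* closing, so that
# close() can release the memory map along with the main file handle.
del data, result
del hdul[0].data
hdul.close()

print(total)  # 64.0
```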

1 reaction
ysBach commented, Feb 13, 2020

Oh sorry, I was a bit confused. memmap=False solved the original issue I was having; I had mixed it up with a different problem I was having. Sorry.

Closing the issue.
