
Add column to fits file without loading fits file into memory?


In #6649 I brought up an issue with adding columns using the somewhat misleading add_cols() function. I received very helpful assistance, with the recommendation that I use fits.BinTableHDU.from_columns(f[1].columns + column) (which, as an aside, must actually be f[1].columns.columns, as f[1].columns returns a ColDefs object) or fits.BinTableHDU.from_columns(f[1].columns.add_col(column)). Both approaches seem to work only by first loading the columns from f[1] into memory and then constructing a whole new HDU table there.

I happen to be running astropy on a dataset large enough that loading all of the columns into memory is not practical. When I run this code, memory usage spikes as the columns are loaded and the program crashes:

  File "/scr/depot0/csh4/py/codes/ver1/corrset/builder.py", line 217, in add_column
    hdu[1] = fits.BinTableHDU.from_columns(hdu[1].columns.add_col(col))

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/table.py", line 126, in from_columns
    data = FITS_rec.from_columns(coldefs, nrows=nrows, fill=fill)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/fitsrec.py", line 330, in from_columns
    data = np.recarray(nrows, dtype=columns.dtype, buf=raw_data).view(cls)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/fitsrec.py", line 244, in __array_finalize__
    self._coldefs = ColDefs(self)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 1189, in __init__
    self._init_from_array(input)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 1249, in _init_from_array
    dim=dim)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 583, in __init__
    array = self._convert_to_valid_data_type(array)

  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 1100, in _convert_to_valid_data_type
    return np.where(array == 0, ord('F'), ord('T'))

MemoryError

Is there a way to modify an HDU table in place, without having to reload the whole thing? Thanks.

P.S. I can make a minimal working example on request, but it would vary from system to system. My current code works for a smaller data set, so I’m not sure an example is necessary. Perhaps a more useful diagnostic would be a timestamped plot of the RAM usage over time, which I can make if someone wants.

Other info: conda 4.3.25, astropy 2.0.2, python 3.6.1, numpy 1.13.1, spyder 3.2.1

My OS is Springdale Linux Release 6.9 (Pisa), GNOME 2.28.2, Kernel Linux 2.6.32-696.10.1.el6.x86_64, with 16 GB RAM

Small data set that worked just fine: 1.6 GB. Large data set that had problems: 3.7 GB.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

adrn commented, Oct 19, 2017 (1 reaction)

👋 Cassandra

I haven’t tested this, but one thought is to create a memory-mapped file with the right column structure but full of temporary / empty values, and then fill it with the values from the file you actually want. Traveling today but will try to come up with a minimal demo of what I mean. (I’ll be back in Peyton on Friday if you want to chat about this)
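To make the idea above concrete, here is a minimal sketch using plain numpy rather than astropy. It is not the demo promised in the comment, and it makes some loud assumptions: the table is treated as a raw file of fixed-width records (which is how rows of a FITS binary table are laid out on disk), and the FITS header and 2880-byte block padding are ignored entirely. The function name add_column_memmapped, the column names, and the file names are all made up for illustration. The point is only that both the old and new tables stay on disk, and rows are copied a chunk at a time.

```python
import os
import tempfile

import numpy as np


def add_column_memmapped(src_path, dst_path, old_dtype, nrows,
                         new_name, new_format, new_values,
                         chunk_rows=100_000):
    """Copy a fixed-width record file to dst_path with one extra column.

    Hypothetical helper: it works on raw records only and does not
    read or write a FITS header.
    """
    # New row layout: the existing fields followed by the added one.
    fields = [(name, old_dtype[name]) for name in old_dtype.names]
    new_dtype = np.dtype(fields + [(new_name, new_format)])

    # Both arrays are backed by files, so only the pages touched by the
    # current chunk need to be resident in RAM.
    src = np.memmap(src_path, dtype=old_dtype, mode="r", shape=(nrows,))
    dst = np.memmap(dst_path, dtype=new_dtype, mode="w+", shape=(nrows,))

    for start in range(0, nrows, chunk_rows):
        stop = min(start + chunk_rows, nrows)
        # Copy the existing columns for this block of rows.
        for name in old_dtype.names:
            dst[name][start:stop] = src[name][start:stop]
        # Fill in the new column for the same rows.
        dst[new_name][start:stop] = new_values[start:stop]

    dst.flush()  # push pending writes out to dst_path
    return new_dtype


# Tiny demonstration with made-up data; a real table would be far larger.
old_dtype = np.dtype([("ra", ">f8"), ("flag", "?")])
nrows = 10
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "old.bin")
dst_path = os.path.join(workdir, "new.bin")

table = np.zeros(nrows, dtype=old_dtype)
table["ra"] = np.arange(nrows)
table["flag"] = np.arange(nrows) % 2 == 0
table.tofile(src_path)

new_dtype = add_column_memmapped(
    src_path, dst_path, old_dtype, nrows,
    "mag", ">f4", np.linspace(20.0, 21.0, nrows), chunk_rows=4,
)
result = np.fromfile(dst_path, dtype=new_dtype)
```

Turning the output into a valid FITS file would still require writing a matching header in front of the records and padding the data to a multiple of 2880 bytes, which this sketch does not attempt.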

astropy-bot[bot] commented, Apr 25, 2019 (0 reactions)

I’m going to close this issue as per my previous message, but if you feel that this issue should stay open, then feel free to re-open and remove the Close? label.

If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here


