Add column to fits file without loading fits file into memory?
In #6649 I brought up an issue with adding columns using the somewhat misleading `add_col()` function. I received very helpful assistance, with the recommendation that I use
```python
fits.BinTableHDU.from_columns(f[1].columns + column)
```

(which as an aside must actually be `f[1].columns.columns`, as `f[1].columns` returns a `ColDefs` object)

or

```python
fits.BinTableHDU.from_columns(f[1].columns.add_col(column))
```
These functions seem to work only by loading the columns from `f[1]` into memory first, and then constructing a whole new HDU table there.
I happen to be running astropy on a large enough dataset such that loading all of the columns into memory is actually not practical. When I run this code, my memory spikes as the columns are loaded and the program crashes:
```
  File "/scr/depot0/csh4/py/codes/ver1/corrset/builder.py", line 217, in add_column
    hdu[1] = fits.BinTableHDU.from_columns(hdu[1].columns.add_col(col))
  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/hdu/table.py", line 126, in from_columns
    data = FITS_rec.from_columns(coldefs, nrows=nrows, fill=fill)
  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/fitsrec.py", line 330, in from_columns
    data = np.recarray(nrows, dtype=columns.dtype, buf=raw_data).view(cls)
  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/fitsrec.py", line 244, in __array_finalize__
    self._coldefs = ColDefs(self)
  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 1189, in __init__
    self._init_from_array(input)
  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 1249, in _init_from_array
    dim=dim)
  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 583, in __init__
    array = self._convert_to_valid_data_type(array)
  File "/scr/depot0/csh4/anaconda3/lib/python3.6/site-packages/astropy/io/fits/column.py", line 1100, in _convert_to_valid_data_type
    return np.where(array == 0, ord('F'), ord('T'))
MemoryError
```
Is there a way to modify an HDU table in place, without having to reload the whole thing? Thanks.
P.S. I can make a minimum working example on request, but it would vary from system to system. My current code works for a smaller data set, so I'm not sure an example is necessary. Perhaps a more useful diagnostic would be a timestamped plot of the RAM usage over time, which I can make if someone wants.
Other info: conda 4.3.25, astropy 2.0.2, Python 3.6.1, NumPy 1.13.1, Spyder 3.2.1
My OS is Springdale Linux release 6.9 (Pisa), GNOME 2.28.2, kernel Linux 2.6.32-696.10.1.el6.x86_64, with 16 GB RAM.
Small data set that worked just fine: 1.6 GB. Large data set that had problems: 3.7 GB.
Created 6 years ago · Comments: 10 (7 by maintainers)
👋 Cassandra
I haven’t tested this, but one thought is to create a memory-mapped file with the right column structure but full of temporary / empty values, and then fill it with the values from the file you actually want. Traveling today but will try to come up with a minimal demo of what I mean. (I’ll be back in Peyton on Friday if you want to chat about this)
I’m going to close this issue as per my previous message, but if you feel that this issue should stay open, then feel free to re-open and remove the Close? label.
If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here