Bug in creating a fits Column
Description
While processing an extremely large amount of data, my code attempts to create a fits.Column object from that data, which fails in a numpy call made by astropy.io.fits.
A simplified reproducer of the issue is:
import numpy as np
from astropy.io import fits
dat = np.zeros(2150000000)  # 2.15e9 float64 zeros
fmt = f"{len(dat)}B"  # "2150000000B": a repeat count of 2150000000 unsigned bytes
column = fits.Column(array=dat, format=fmt)
This produces the error:
Traceback (most recent call last):
  File "/Users/wjamieson/projects/astropy/test.py", line 7, in <module>
    column = fits.Column(array=dat, format=fmt)
  File "/Users/wjamieson/projects/astropy/astropy/io/fits/column.py", line 663, in __init__
    array = self._convert_to_valid_data_type(array)
  File "/Users/wjamieson/projects/astropy/astropy/io/fits/column.py", line 1341, in _convert_to_valid_data_type
    dtype = np.dtype(numpy_format).base
ValueError: invalid shape in fixed-type tuple: dimension does not fit into a C int.
The part where astropy is failing is here: https://github.com/astropy/astropy/blob/c375a43e8878d304d22381cac7bce4947c23ae25/astropy/io/fits/column.py#L1318-L1341
Note that if dat is slightly smaller, this succeeds, but a larger dat causes the failure.
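The error text points at a C int, so the cutoff is presumably the largest signed 32-bit integer, 2**31 - 1 = 2,147,483,647, which 2,150,000,000 just exceeds. This is an inference from the error message, not a documented threshold; the sketch below only checks that arithmetic, assuming a 32-bit C int as on common 64-bit platforms (the helper name fits_in_c_int is hypothetical):

```python
# Assumption: the "C int" in the error is a signed 32-bit integer, as on
# common 64-bit platforms, so its largest value is 2**31 - 1.
C_INT_MAX = 2**31 - 1  # 2,147,483,647


def fits_in_c_int(n: int) -> bool:
    """True if a non-negative n can be stored in a signed 32-bit C int."""
    return n <= C_INT_MAX


for n in (2_150_000_000, 2_140_000_000, 215_000_000):
    print(f"{n:>13,}: {'fits' if fits_in_c_int(n) else 'too large'}")
```

Under this assumption, 2,150,000,000 is "too large" while a length just below 2,147,483,647 still fits, which matches the "slightly smaller succeeds" observation.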
Digging into the astropy code shows that my inputs essentially result in the numpy call:
np.dtype("=2150000000u1")
This produces the error. However, if we remove one zero from the coefficient and run
np.dtype("=215000000u1")
it succeeds. Moreover, the astropy code immediately calls .base on the result of the above, which, for the purposes of my code, should always result in uint8. Indeed, this is reflected in the comment: https://github.com/astropy/astropy/blob/c375a43e8878d304d22381cac7bce4947c23ae25/astropy/io/fits/column.py#L1338-L1340
This makes me believe that there is no need to preserve the shape coefficient in the numpy_format string passed to np.dtype in this code, which means the above should legitimately succeed.
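The idea above can be sketched as follows: the shaped dtype's .base is just uint8, so building the dtype from the format with its repeat count stripped gives the same result without ever passing the large count to np.dtype. The helper base_dtype_from_format and its regular expression are hypothetical illustrations, not astropy's API or its actual fix, and they assume formats of the form "[byteorder][repeat]typecode" such as "=2150000000u1". This only sidesteps the np.dtype error; it does not make a column with such a large repeat count otherwise practical.

```python
import re

import numpy as np

# What the current code computes: a subarray dtype, of which only .base is used.
shaped = np.dtype("=215000000u1")
print(shaped.shape, shaped.base)  # (215000000,) uint8

# Hypothetical pattern: optional byte-order char, optional repeat count,
# then a typecode that starts with a non-digit (e.g. "u1", "f8").
_FORMAT_RE = re.compile(r"([<>=|]?)(\d*)(\D.*)")


def base_dtype_from_format(numpy_format: str) -> np.dtype:
    """Hypothetical helper: per-value dtype of a format, ignoring the repeat count."""
    match = _FORMAT_RE.fullmatch(numpy_format)
    if match is None:
        raise ValueError(f"unsupported format string: {numpy_format!r}")
    byteorder, _repeat, typecode = match.groups()
    # The repeat count is dropped, so it never has to fit in a C int;
    # .base guards against any shape still present in the typecode.
    return np.dtype(byteorder + typecode).base


print(base_dtype_from_format("=2150000000u1"))  # uint8
```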
System Details
macOS-10.16-x86_64-i386-64bit
Python 3.10.5 (main, Jul 18 2022, 12:56:28) [Clang 12.0.0 (clang-1200.0.32.29)]
Numpy 1.23.1
pyerfa 2.0.0.1
astropy 5.1
Scipy 1.8.1
Matplotlib 3.5.1
Top GitHub Comments
Are you sure this is what you want? By using format=f"{len(dat)}B" you are specifying that each element of the column is an array of 2150000000 values, which is not possible because of a fundamental limitation in Numpy (same issue as #1840). Also, here you are trying to build a column of N elements where each element would be an array of N elements; even when removing a zero, I doubt you will have enough memory. I don't know what the asdf code is, so it is difficult to be more specific, but maybe we could discuss that on Slack?