Bug in creating a fits Column

Description

While processing an extremely large amount of data, my code attempts to create a fits.Column object from that data. This fails in a numpy call made by astropy.io.fits.

A simplified reproducer of the issue is:

import numpy as np
from astropy.io import fits

dat = np.zeros(2150000000)
fmt = f"{len(dat)}B"
column = fits.Column(array=dat, format=fmt)

This produces the error:

Traceback (most recent call last):
  File "/Users/wjamieson/projects/astropy/test.py", line 7, in <module>
    column = fits.Column(array=dat, format=fmt)
  File "/Users/wjamieson/projects/astropy/astropy/io/fits/column.py", line 663, in __init__
    array = self._convert_to_valid_data_type(array)
  File "/Users/wjamieson/projects/astropy/astropy/io/fits/column.py", line 1341, in _convert_to_valid_data_type
    dtype = np.dtype(numpy_format).base
ValueError: invalid shape in fixed-type tuple: dimension does not fit into a C int.

The part where astropy is failing is here: https://github.com/astropy/astropy/blob/c375a43e8878d304d22381cac7bce4947c23ae25/astropy/io/fits/column.py#L1318-L1341

Note that if dat is slightly smaller this succeeds, but any larger dat causes the failure.
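
The threshold is consistent with the error message, which says the dimension must fit into a C int. A minimal sketch of that bound, assuming ctypes reports the same int width that NumPy was built with; `fits_in_c_int` is a hypothetical helper for illustration, not a NumPy or astropy function:

```python
import ctypes

# Largest value a signed C int can hold on this platform
# (2147483647 when int is 32 bits wide).
C_INT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_int) - 1) - 1


def fits_in_c_int(repeat_count: int) -> bool:
    """Hypothetical helper: True if repeat_count is representable as a signed C int."""
    return 0 <= repeat_count <= C_INT_MAX


print(fits_in_c_int(C_INT_MAX))      # the largest length that stays within the bound
print(fits_in_c_int(2_150_000_000))  # len(dat) from the reproducer
```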

Diving into the astropy code shows that, essentially, my inputs result in the numpy call:

np.dtype("=2150000000u1")

This produces the error. However, if we remove one zero from the coefficient and run

np.dtype("=215000000u1")

it succeeds. Moreover, the astropy code immediately calls .base on the result of the above, which for the purposes of my code should always result in uint8. Indeed, this is reflected in the comment: https://github.com/astropy/astropy/blob/c375a43e8878d304d22381cac7bce4947c23ae25/astropy/io/fits/column.py#L1338-L1340

This makes me believe that there is no need to preserve the shape coefficient in the numpy_format string passed to np.dtype in this code, meaning that the above should legitimately succeed.
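
A rough sketch of that idea, assuming the format string has the shape `[byteorder][repeat]<type code>` as in the calls above. `base_dtype_without_repeat` is a hypothetical helper written for illustration; it is not part of astropy and is not necessarily how the issue was resolved:

```python
import re

import numpy as np


def base_dtype_without_repeat(numpy_format: str) -> np.dtype:
    """Hypothetical helper: drop the repeat count and build only the base dtype."""
    # Optional byte-order character, optional repeat count, then the type code.
    match = re.fullmatch(r"([=<>|]?)(\d*)(\D.*)", numpy_format)
    if match is None:
        raise ValueError(f"unrecognised format string: {numpy_format!r}")
    byteorder, _repeat, type_code = match.groups()
    return np.dtype(byteorder + type_code)


# For a repeat count that NumPy accepts, .base already discards the shape.
via_base = np.dtype("=215000000u1").base

# The helper never hands the repeat count to NumPy, so its size does not matter.
small = base_dtype_without_repeat("=215000000u1")
large = base_dtype_without_repeat("=2150000000u1")

print(via_base, small, large)
```

Since only `.base` is used afterwards, skipping the repeat count avoids constructing a subarray dtype at all.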

System Details

macOS-10.16-x86_64-i386-64bit
Python 3.10.5 (main, Jul 18 2022, 12:56:28) [Clang 12.0.0 (clang-1200.0.32.29)]
Numpy 1.23.1
pyerfa 2.0.0.1
astropy 5.1
Scipy 1.8.1
Matplotlib 3.5.1

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

saimn commented, Sep 28, 2022 (1 reaction)
fmt = f"{len(dat)}B"
column = fits.Column(array=dat, format=fmt)

Are you sure this is what you want? By using format=f"{len(dat)}B" you are specifying that each element of the column is an array of 2150000000 values, which is not possible because of a fundamental limitation in NumPy (same issue as #1840). Also, here you are trying to build a column of N elements where each element would be an array of N elements. Even after removing a zero, I doubt you will have enough memory:

In [4]: import numpy as np
   ...: from astropy.io import fits
   ...: 
   ...: dat = np.zeros(215000000)
   ...: fmt = f"{len(dat)}B"
   ...: column = fits.Column(array=dat, format=fmt, name='a')

In [5]: hdu = fits.BinTableHDU.from_columns([column])
...
MemoryError: Unable to allocate 41.1 PiB for an array with shape (46225000000000000,) and data type uint8
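
The size in that MemoryError follows from the format: N rows, each holding N one-byte values. A quick check of the arithmetic, using only the numbers from the session above:

```python
# N rows, each an array of N uint8 values (1 byte each).
n = 215_000_000
total_bytes = n * n

# Convert to pebibytes (1 PiB = 2**50 bytes).
pib = total_bytes / 2**50

print(total_bytes)
print(f"{pib:.1f} PiB")
```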

saimn commented, Sep 29, 2022 (0 reactions)

I don’t know what the asdf code is doing, so it is difficult to be more specific, but maybe we can discuss that on Slack?
