ECSV array/object-valued column support

The ECSV format is, as far as I know, defined by APE6. That lists the permitted datatypes as bool, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64, float128, complex64, complex128, complex256, and string. On the topic of array-valued columns (which it calls “Multidimensional columns”), it says:

Multidimensional columns are not supported in version 0.9 of the ECSV format.

None of the available text data formats supports multidimensional columns with more than one element per row. Although in many cases having such data would indicate using a binary storage format, there is utility in supporting this for cases where the column shape is “reasonable”, perhaps with no more than about 10 elements.

(Parenthetically: VOTable, listed as one of the comparison formats earlier in the document, does support array-valued columns, though it’s not really a text format.)

However, if you ask astropy.table to write a table with array-like columns (lists, tuples, numpy arrays) to ECSV format, it will output ECSV tables using the undocumented datatype object. Such columns produce string values when read back in. See the following code:

from astropy.table import Table

# Create table column data
column_float  = [10., 20., 30.]
column_string = ['xx', 'yy', 'zz']
column_list   = [[2.0, 3.0, 4.0], [5.0], [8.5, 1.1]]

# Create Astropy table and get info
table = Table([column_float, column_string, column_list],
              names=('num_col', 'txt_col', 'list_col'),)

# Write ecsv table with multiple elements per cell
table.write('tsimp.ecsv', format='ascii.ecsv', overwrite=True, delimiter=',')

# Read it back in
tr = Table.read('tsimp.ecsv', format='ascii.ecsv')
print([type(c) for c in tr[0]])
print(tr)

which writes the following output:

[<class 'numpy.float64'>, <class 'numpy.str_'>, <class 'str'>]
num_col txt_col     list_col   
------- ------- ---------------
   10.0      xx [2.0, 3.0, 4.0]
   20.0      yy           [5.0]
   30.0      zz      [8.5, 1.1]

and produces the following ECSV file:

# %ECSV 0.9
# ---
# datatype:
# - {name: num_col, datatype: float64}
# - {name: txt_col, datatype: string}
# - {name: list_col, datatype: object}
# delimiter: ','
# schema: astropy-2.0
num_col,txt_col,list_col
10.0,xx,"[2.0, 3.0, 4.0]"
20.0,yy,[5.0]
30.0,zz,"[8.5, 1.1]"
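Since the list cells come back as plain strings, a reader can decode them by hand in the meantime. The following is only a minimal sketch, assuming every cell holds a flat list of numbers written as JSON-compatible text, as in the file above. `decode_list_cell` is a hypothetical helper, not part of astropy or of the ECSV specification.

```python
import json


def decode_list_cell(cell):
    """Turn a cell such as '[2.0, 3.0, 4.0]' back into a list of floats.

    Assumption: the cell text is valid JSON holding a flat list of numbers.
    Lists of strings written with single quotes would not parse this way.
    """
    values = json.loads(cell)
    if not isinstance(values, list):
        raise ValueError(f"expected a list, got {type(values).__name__}")
    return [float(v) for v in values]


# The three list_col cells from the example file, as they are read back.
cells = ["[2.0, 3.0, 4.0]", "[5.0]", "[8.5, 1.1]"]
decoded = [decode_list_cell(c) for c in cells]
print(decoded)  # [[2.0, 3.0, 4.0], [5.0], [8.5, 1.1]]
```

This works for variable-length rows, but it depends on the textual form of the cells rather than on any declared metadata, so it is a workaround and not a substitute for documented array support.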

Is this intended behaviour? At present my ECSV reader (Java, STIL/TOPCAT) fails to read such tables because they declare an unknown datatype, object. I’d say that’s an ECSV output bug, given the datatypes documented in APE6.

However, as it happens this got noticed during some discussion within DPAC (the Gaia data processing consortium) about whether array values could be stored in ECSV tables, which is an enhancement requested by some DPAC members for use with both Astropy and TOPCAT. The suggestion was that the existing behaviour could be extended with some additional metadata (new datatype or array flag) to provide array-valued column support in ECSV.

On a related topic, I see #11155, which is pursuing a different approach to serializing array values into ECSV. That may not be suitable for the Gaia use case, since (a) it only supports fixed-size array values and (b) from the metadata in the example file it seems to be intended as an astropy-specific convention.

So: fix object datatype output? Extend it to support array-valued columns? Or should I just patch my ECSV parser to cope with undocumented datatypes?
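To make the third option concrete, here is a rough sketch of a parser that tolerates undocumented datatypes by leaving their cells as strings. It is written in Python for illustration only and does not reflect how STIL/TOPCAT or astropy are implemented. The names `CONVERTERS` and `converter_for` are hypothetical, and the `bool` handling assumes the cells are written as `True`/`False`.

```python
def parse_bool(text):
    # Assumption: booleans appear in the file as the text "True" or "False".
    return text == "True"


# Datatypes listed in APE6, mapped to simplified plain-Python converters.
CONVERTERS = {"bool": parse_bool, "string": str}
CONVERTERS.update({name: int for name in (
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64")})
CONVERTERS.update({name: float for name in (
    "float16", "float32", "float64", "float128")})
CONVERTERS.update({name: complex for name in (
    "complex64", "complex128", "complex256")})


def converter_for(datatype):
    """Return the converter for a declared datatype, or str if it is unknown."""
    return CONVERTERS.get(datatype, str)


# Column declarations and first data row from the example file.
header = [("num_col", "float64"), ("txt_col", "string"), ("list_col", "object")]
row = ["10.0", "xx", "[2.0, 3.0, 4.0]"]

parsed = [converter_for(dtype)(cell) for (_, dtype), cell in zip(header, row)]
print(parsed)  # [10.0, 'xx', '[2.0, 3.0, 4.0]']
```

The fallback keeps such files readable, but the array structure is still lost, which is why the first two options matter for the Gaia use case.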

Top GitHub Comments

taldcroft commented, Apr 11, 2021

@nstarman - Thanks! Right now my plan is to get to this for the 4.3 release but I’ll let you know by this week if that seems unrealistic and I might need some help. If you are looking for a useful challenge, another set of eyes on #11127 would be great. 😄

taldcroft commented, Apr 17, 2021

See #11569.