ECSV array/object-valued column support

The ECSV format is, as far as I know, defined by APE6. That document lists the permitted datatypes as bool, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64, float128, complex64, complex128, complex256, and string, and on the topic of array-valued columns (which it calls “Multidimensional columns”) it says:

Multidimensional columns are not supported in version 0.9 of the ECSV format.

None of the available text data formats supports multidimensional columns with more than one element per row. Although in many cases having such data would indicate using a binary storage format, there is utility in supporting this for cases where the column shape is “reasonable”, perhaps with no more than about 10 elements.

(Parenthetically: VOTable, listed as one of the comparison formats earlier in the document, does support array-valued columns, though it’s not really a text format.)

However, if you ask astropy.table to write a table with array-like columns (lists, tuples, numpy arrays) to ECSV format, it writes them using the undocumented datatype “object”, and such columns come back as string values when the file is read in again. See the following code:

from astropy.table import Table

# Create table column data
column_float  = [10., 20., 30.]
column_string = ['xx', 'yy', 'zz']
column_list   = [[2.0, 3.0, 4.0], [5.0], [8.5, 1.1]]

# Create the Astropy table
table = Table([column_float, column_string, column_list],
              names=('num_col', 'txt_col', 'list_col'),)

# Write ecsv table with multiple elements per cell
table.write('tsimp.ecsv', format='ascii.ecsv', overwrite=True, delimiter=',')

# Read it back in
tr = Table.read('tsimp.ecsv', format='ascii.ecsv')
print([type(c) for c in tr[0]])
print(tr)

which prints the following output:

[<class 'numpy.float64'>, <class 'numpy.str_'>, <class 'str'>]
num_col txt_col     list_col   
------- ------- ---------------
   10.0      xx [2.0, 3.0, 4.0]
   20.0      yy           [5.0]
   30.0      zz      [8.5, 1.1]
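Because list_col comes back as plain strings, one stopgap on the reading side is to parse each cell as a Python literal. A minimal sketch, assuming every cell holds a list literal like those printed above; decode_object_cells is a hypothetical helper, not part of astropy:

```python
import ast


def decode_object_cells(cells):
    """Turn string cells such as '[2.0, 3.0, 4.0]' back into lists of floats.

    Hypothetical helper: assumes each cell is a Python list literal, as in
    the list_col output above. ast.literal_eval only evaluates literals,
    so no code embedded in a cell is ever executed.
    """
    decoded = []
    for cell in cells:
        value = ast.literal_eval(cell)
        if not isinstance(value, list):
            raise ValueError(f"expected a list literal, got {cell!r}")
        decoded.append([float(x) for x in value])
    return decoded


# The list_col strings as they come back from Table.read
list_col = ["[2.0, 3.0, 4.0]", "[5.0]", "[8.5, 1.1]"]
print(decode_object_cells(list_col))
# [[2.0, 3.0, 4.0], [5.0], [8.5, 1.1]]
```

This only works because the writer happened to use the list repr; nothing in the file's metadata says how an “object” cell is encoded, which is the heart of the problem.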

and produces the following ECSV file:

# %ECSV 0.9
# ---
# datatype:
# - {name: num_col, datatype: float64}
# - {name: txt_col, datatype: string}
# - {name: list_col, datatype: object}
# delimiter: ','
# schema: astropy-2.0
num_col,txt_col,list_col
10.0,xx,"[2.0, 3.0, 4.0]"
20.0,yy,[5.0]
30.0,zz,"[8.5, 1.1]"

Is this intended behaviour? At present my ECSV reader (Java, in STIL/TOPCAT) fails to read such tables because of the unknown datatype “object”. Given the documentation in APE6, I’d say that’s an ECSV output bug.

However, as it happens, this was noticed during a discussion within DPAC (the Gaia data processing consortium) about whether array values could be stored in ECSV tables, an enhancement some DPAC members have requested for use with both Astropy and TOPCAT. The suggestion was that the existing behaviour could be extended with some additional metadata (a new datatype or an array flag) to provide array-valued column support in ECSV.
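To make the “array flag” idea concrete, here is a minimal sketch that stores each variable-length float64 array as a JSON array in a single cell and marks the column in the header. The arraysize: '*' key is purely a hypothetical illustration; it is not part of APE6 or any published ECSV version, and write_array_column and read_array_column are hypothetical helpers:

```python
import csv
import io
import json

# Hypothetical header: "arraysize: '*'" is an illustrative flag for a
# variable-length array column, not part of APE6 or any ECSV version.
HEADER = [
    "# %ECSV 0.9",
    "# ---",
    "# datatype:",
    "# - {name: list_col, datatype: float64, arraysize: '*'}",
]


def write_array_column(rows):
    """Write one variable-length float64 array column, one JSON array per cell."""
    buf = io.StringIO()
    buf.write("\n".join(HEADER) + "\n")
    writer = csv.writer(buf, lineterminator="\n")  # quotes cells containing ','
    writer.writerow(["list_col"])
    for values in rows:
        writer.writerow([json.dumps([float(v) for v in values])])
    return buf.getvalue()


def read_array_column(text):
    """Skip '#' header lines and the column-name line, then decode each cell."""
    data_lines = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.reader(data_lines[1:])
    return [[float(v) for v in json.loads(cell)] for (cell,) in reader]


rows = [[2.0, 3.0, 4.0], [5.0], [8.5, 1.1]]
text = write_array_column(rows)
print(text)
assert read_array_column(text) == rows
```

For these values the data lines come out in the same form as the list_col lines in the file above, so the change would mostly be in the metadata: the header would declare the element type and the array nature explicitly instead of “object”.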

On a related topic, I see #11155, which is pursuing a different approach to serializing array values into ECSV. That may not be suitable for the Gaia use case, since (a) it only supports fixed-size array values and (b) from the metadata in the example file it seems to be intended as an astropy-specific convention.
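The fixed-size restriction matters for the example table: list_col is ragged (3, 1 and 2 elements), so a fixed-size-only scheme cannot hold it without padding. A small check, with fixed_shape as a hypothetical helper:

```python
def fixed_shape(column):
    """Return the common row length if every row has the same length, else None.

    Returns None for ragged columns (or an empty column); only when this
    returns an int could a fixed-size array scheme store the column as-is.
    """
    lengths = {len(row) for row in column}
    return lengths.pop() if len(lengths) == 1 else None


column_list = [[2.0, 3.0, 4.0], [5.0], [8.5, 1.1]]  # list_col from the example
column_xyz = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # hypothetical fixed-size data

print(fixed_shape(column_list))  # None: rows have 3, 1 and 2 elements
print(fixed_shape(column_xyz))   # 3
```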

So: fix object datatype output? Extend it to support array-valued columns? Or should I just patch my ECSV parser to cope with undocumented datatypes?
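If the answer turns out to be “patch the parser”, one tolerant option is to read any datatype outside the APE6 list as string, which matches what astropy itself returns for list_col. A minimal sketch in Python (the reader in question is Java, so this only illustrates the logic). The regex-based header parsing is a simplification that assumes flow-style entries exactly like those in the example file; a real reader should parse the YAML header properly. column_types is a hypothetical helper:

```python
import re
import warnings

# Datatypes permitted by APE6 (ECSV 0.9), as listed in the issue text.
APE6_DATATYPES = {
    "bool", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64", "float128",
    "complex64", "complex128", "complex256", "string",
}

# Simplification: only matches entries like "# - {name: x, datatype: y}".
ENTRY = re.compile(r"^# - \{name: (?P<name>[^,]+), datatype: (?P<datatype>[^,}]+)")


def column_types(header_lines):
    """Map column name -> datatype, substituting 'string' for unknown types."""
    types = {}
    for line in header_lines:
        m = ENTRY.match(line)
        if not m:
            continue  # not a column entry (e.g. delimiter or schema line)
        name, datatype = m.group("name"), m.group("datatype").strip()
        if datatype not in APE6_DATATYPES:
            warnings.warn(f"column {name!r} has undocumented datatype "
                          f"{datatype!r}; reading it as string")
            datatype = "string"
        types[name] = datatype
    return types


header = """\
# %ECSV 0.9
# ---
# datatype:
# - {name: num_col, datatype: float64}
# - {name: txt_col, datatype: string}
# - {name: list_col, datatype: object}
# delimiter: ','
# schema: astropy-2.0""".splitlines()

print(column_types(header))
# {'num_col': 'float64', 'txt_col': 'string', 'list_col': 'string'}
# (plus a UserWarning about list_col on stderr)
```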

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 25 (25 by maintainers)

Top GitHub Comments

taldcroft commented, Apr 11, 2021 (1 reaction)

@nstarman - Thanks! Right now my plan is to get to this for the 4.3 release but I’ll let you know by this week if that seems unrealistic and I might need some help. If you are looking for a useful challenge, another set of eyes on #11127 would be great. 😄

taldcroft commented, Apr 17, 2021

See #11569.