ECSV reader fails on datatype=object column created with astropy < 4.3

Description

Reading the following ECSV table

# %ECSV 0.9
# ---
# datatype:
# - {name: source, datatype: object}
# - {name: obstime, datatype: string}
# - {name: ra, datatype: float64}
# - {name: dec, datatype: float64}
# - {name: pa_v3, datatype: float64}
# schema: astropy-2.0
source obstime ra dec pa_v3
None 2016-01-18T15:25:23.776 133.3599835511599 -14.886980131752205 -161.3183607231063

which was written out by the current release version of astropy, now fails when read with the current astropy dev version:

In [1]: import astropy

In [2]: print(astropy.__version__)
4.3.dev1662+gdf850cb7c

In [4]: from astropy.table import Table

In [5]: table = Table.read("jwst/lib/tests/data/v1_calc_truth.ecsv")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-2e4749b2facf> in <module>
----> 1 table = Table.read("jwst/lib/tests/data/v1_calc_truth.ecsv")

~/miniconda3/envs/dev/lib/python3.9/site-packages/astropy/table/connect.py in __call__(self, *args, **kwargs)
     59         descriptions = kwargs.pop('descriptions', None)
     60 
---> 61         out = registry.read(cls, *args, **kwargs)
     62 
     63         # For some readers (e.g., ascii.ecsv), the returned `out` class is not

~/miniconda3/envs/dev/lib/python3.9/site-packages/astropy/io/registry.py in read(cls, format, cache, *args, **kwargs)
    525 
    526         reader = get_reader(format, cls)
--> 527         data = reader(*args, **kwargs)
    528 
    529         if not isinstance(data, cls):

~/miniconda3/envs/dev/lib/python3.9/site-packages/astropy/io/ascii/connect.py in io_read(format, filename, **kwargs)
     16         format = re.sub(r'^ascii\.', '', format)
     17         kwargs['format'] = format
---> 18     return read(filename, **kwargs)
     19 
     20 

~/miniconda3/envs/dev/lib/python3.9/site-packages/astropy/io/ascii/ui.py in read(table, guess, **kwargs)
    367         else:
    368             reader = get_reader(**new_kwargs)
--> 369             dat = reader.read(table)
    370             _read_trace.append({'kwargs': copy.deepcopy(new_kwargs),
    371                                 'Reader': reader.__class__,

~/miniconda3/envs/dev/lib/python3.9/site-packages/astropy/io/ascii/core.py in read(self, table)
   1336 
   1337         # Get the table column definitions
-> 1338         self.header.get_cols(self.lines)
   1339 
   1340         # Make sure columns are valid

~/miniconda3/envs/dev/lib/python3.9/site-packages/astropy/io/ascii/ecsv.py in get_cols(self, lines)
    184             col.dtype = header_cols[col.name]['datatype']
    185             if col.dtype not in ECSV_DATATYPES:
--> 186                 raise ValueError(f'datatype {col.dtype!r} of column {col.name!r} '
    187                                  f'is not in allowed values {ECSV_DATATYPES}')
    188 

ValueError: datatype 'object' of column 'source' is not in allowed values ('bool', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float16', 'float32', 'float64', 'float128', 'string')

Expected behavior

Under the release version (astropy 4.2.1), the table reads in without issue.

This looks like an issue where numpy object types can no longer be serialized, and of course None is an object for numpy.

A workaround is to edit the ECSV file header by hand, changing the column datatype from object to string, after which everything works. But this is a backwards incompatibility in the ECSV reader, and it means that many (some? few? 😂) tables written out by the current release of astropy can no longer be read in version 4.3.
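The hand edit described above can also be scripted. The following is a minimal sketch, not code from astropy or from this issue: `patch_ecsv_header` is a hypothetical helper name, and it assumes the header uses the literal `datatype: object` form shown in the example table.

```python
import re

# Matches "datatype: object" inside an ECSV YAML header line.
_OBJECT_DATATYPE = re.compile(r"datatype:\s*object\b")


def patch_ecsv_header(text):
    """Return ECSV text with 'datatype: object' changed to 'datatype: string'.

    Hypothetical helper for illustration. Only header lines (those starting
    with '#') are touched, so data rows containing the same words are kept.
    """
    patched = []
    for line in text.splitlines(keepends=True):
        if line.startswith("#"):
            line = _OBJECT_DATATYPE.sub("datatype: string", line)
        patched.append(line)
    return "".join(patched)
```

The patched text can then be written back to disk or, assuming the ASCII reader accepts table content as a newline-separated string, passed directly to `Table.read(..., format="ascii.ecsv")` in place of the file name.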

Perhaps casting to string and raising a big DeprecationWarning might be better? Or something else? Clearly, serializing arbitrary Python objects is not possible in the ECSV format, but it was allowed before (and still is) in release versions.
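To make the suggestion concrete, here is a rough sketch of what a cast-and-warn fallback could look like. It is not astropy's implementation: `resolve_datatype` is a hypothetical function, and the tuple of allowed values is copied from the error message in the traceback above.

```python
import warnings

# Allowed ECSV datatypes, copied from the ValueError message in this issue.
ECSV_DATATYPES = (
    "bool", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64", "float128", "string",
)


def resolve_datatype(name, datatype):
    """Hypothetical helper: accept legacy 'object' columns as 'string'.

    Valid datatypes pass through unchanged, 'object' is mapped to 'string'
    with a DeprecationWarning, and anything else still raises ValueError.
    """
    if datatype in ECSV_DATATYPES:
        return datatype
    if datatype == "object":
        warnings.warn(
            f"column {name!r} has legacy datatype 'object'; "
            "reading it as 'string'",
            DeprecationWarning,
            stacklevel=2,
        )
        return "string"
    raise ValueError(
        f"datatype {datatype!r} of column {name!r} "
        f"is not in allowed values {ECSV_DATATYPES}"
    )
```

With this approach, files written by older releases would keep loading, while genuinely unknown datatypes would still fail loudly.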

The break seems to have happened in https://github.com/astropy/astropy/commit/e807dbff9a5c72bdc42d18c7d6712aae69a0bddc (EDIT: #11569)

System Details

>>> import platform; print(platform.platform())
macOS-10.15.7-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.9.5 (default, May 18 2021, 12:31:01) 
[Clang 10.0.0 ]
>>> import numpy; print("Numpy", numpy.__version__)
Numpy 1.20.3
>>> import astropy; print("astropy", astropy.__version__)
astropy 4.3.dev1662+gdf850cb7c
>>> import scipy; print("Scipy", scipy.__version__)
Scipy 1.6.3
>>> import matplotlib; print("Matplotlib", matplotlib.__version__)
Matplotlib 3.4.2

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

privong commented, Aug 16, 2021

Hi all, I just wanted to chime in on this. I encountered this issue as well, but nothing jumped out at me from checking the astropy release notes. So I’d advocate for making this issue more visible to users.

As a complement to the workaround code provided by @taldcroft, this shell command worked for me to fix incorrectly written ECSV files (tested on macOS):

find . -name '*.ecsv' -print0 | xargs -0 sed -i '' -e 's/datatype: object/datatype: string/g'
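The `sed -i ''` form is specific to the BSD sed shipped with macOS. For other platforms, a rough Python equivalent might look like the sketch below. `fix_ecsv_files` is a hypothetical name, and unlike the sed command it deliberately restricts the replacement to header lines starting with `#`.

```python
from pathlib import Path


def fix_ecsv_files(root="."):
    """Rewrite 'datatype: object' as 'datatype: string' in ECSV headers.

    Hypothetical helper: walks `root` recursively, edits matching files in
    place, and returns the list of paths that were actually changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.ecsv")):
        # newline="" keeps the file's original line endings intact.
        with open(path, encoding="utf-8", newline="") as fh:
            original = fh.read()
        fixed = "".join(
            line.replace("datatype: object", "datatype: string")
            if line.startswith("#") else line
            for line in original.splitlines(keepends=True)
        )
        if fixed != original:
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(fixed)
            changed.append(path)
    return changed
```

Calling `fix_ecsv_files(".")` from the top of a data directory mirrors the `find . -name '*.ecsv'` invocation. It is worth running on a copy or under version control first, since the files are modified in place.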
