All masked columns are incompatible with writing to votable

astropy 3.2.1, numpy 1.16.4

I think the following snippet used to work fine although I cannot pinpoint the version that worked.

from astroquery.gaia import Gaia
r = Gaia.launch_job("select top 1000 * from gaiadr2.gaia_source;").get_results()
r.write('test.vot', format='votable', overwrite=True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/projects/notebook/problem.py in <module>
      5 from astroquery.gaia import Gaia
      6 r = Gaia.launch_job("select top 1000 * from gaiadr2.gaia_source;").get_results()
----> 7 r.write('test.vot', format='votable', overwrite=True)
      8 print(r['epoch_photometry_url'])
      9

~/miniconda3/lib/python3.7/site-packages/astropy/table/connect.py in __call__(self, *args, **kwargs)
    112         instance = self._instance
    113         with serialize_method_as(instance, serialize_method):
--> 114             registry.write(instance, *args, **kwargs)

~/miniconda3/lib/python3.7/site-packages/astropy/io/registry.py in write(data, format, *args, **kwargs)
    564
    565     writer = get_writer(format, data.__class__)
--> 566     writer(data, *args, **kwargs)
    567
    568

~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/connect.py in write_table_votable(input, output, table_id, overwrite, tabledata_format)
    155
    156     # Create a new VOTable file
--> 157     table_file = from_table(input, table_id=table_id)
    158
    159     # Write out file

~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/table.py in from_table(table, table_id)
    332     votable : `~astropy.io.votable.tree.VOTableFile` instance
    333     """
--> 334     return tree.VOTableFile.from_table(table, table_id=table_id)
    335
    336

~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py in from_table(cls, table, table_id)
   3632         votable_file = cls()
   3633         resource = Resource()
-> 3634         votable = Table.from_table(votable_file, table)
   3635         if table_id is not None:
   3636             votable.ID = table_id

~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py in from_table(cls, votable, table)
   2891         for colname in table.colnames:
   2892             column = table[colname]
-> 2893             new_table.fields.append(Field.from_table_column(votable, column))
   2894
   2895         if table.mask is None:

~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py in from_table_column(cls, votable, column)
   1552             kwargs['unit'] = column.info.unit
   1553         kwargs['name'] = column.info.name
-> 1554         result = converters.table_column_to_votable_datatype(column)
   1555         kwargs.update(result)
   1556

~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/converters.py in table_column_to_votable_datatype(column)
   1422                 return {'datatype': 'unicodeChar', 'arraysize': '*'}
   1423         elif isinstance(column[0], np.ndarray):
-> 1424             dtype, shape = _all_matching_dtype(column)
   1425             if dtype is not False:
   1426                 result = numpy_to_votable_dtype(dtype, shape)

~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/converters.py in _all_matching_dtype(column)
   1330     first_shape = ()
   1331     for x in column:
-> 1332         if not isinstance(x, np.ndarray) or len(x) == 0:
   1333             continue
   1334

TypeError: len() of unsized object

The writer is choking on the 'epoch_photometry_url' column, which has dtype object and is entirely masked.

In [6]: r['epoch_photometry_url']
Out[6]:
<MaskedColumn name='epoch_photometry_url' dtype='object' description='epoch photometry url' length=1000>
 --
 --
 --
 --
 --
 --
 --
 --
 --
 --
 --
 --
 --
 --

Reproducing the gist of the problem:

import numpy as np
from astropy.table import Table, MaskedColumn

# All masked, empty string as object: this fails.
# same case as in the above snippet.
try:
    t = Table({'c1':MaskedColumn(
        data=np.array([''.encode()]*5).astype(object),
        mask=np.ones(5).astype(bool))})
    t.write('test1.vot', format='votable', overwrite=True)
except TypeError:
    print("TypeError raised.")

try:
    # All masked, non-empty string as object: this fails.
    t = Table({'c1':MaskedColumn(
        data=np.array(['abc'.encode()]*5).astype(object),
        mask=np.ones(5).astype(bool))})
    t.write('test1.vot', format='votable', overwrite=True)
except TypeError:
    print("TypeError raised.")


# Partially masked, empty string as object: this works.
t = Table({'c1':MaskedColumn(
    data=np.array([''.encode()]*5).astype(object),
    mask=np.array([0,0,0,1,1]).astype(bool))})
t.write('test1.vot', format='votable', overwrite=True)


# All unmasked, empty string as object: this works.
t = Table({'c1':MaskedColumn(
    data=np.array([''.encode()]*5).astype(object),
    mask=np.zeros(5).astype(bool))})
t.write('test1.vot', format='votable', overwrite=True)

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

tomdonaldson commented, Jul 25, 2019 (2 reactions)

The issue appears to be with the handling of null string values.  In the example given in the original report (trimmed to only get a couple rows):

from astroquery.gaia import Gaia
r = Gaia.launch_job("select top 2 * from gaiadr2.gaia_source;").get_results()
r.write('test.vot', format='votable', overwrite=True)

the epoch_photometry_url column, which should contain strings, has values that are numpy.ma.core.MaskedConstant with dtype = float64 and shape = ().  Those values trip up the writer logic which assumes that string values will have a length.
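
To make the failure concrete, here is a minimal sketch that uses only numpy. It assumes that iterating over the all-masked column yields `numpy.ma.masked`, as described above; the function name `writer_style_check` is hypothetical and only mirrors the condition shown at line 1332 of `converters.py` in the traceback.

```python
import numpy as np

# The masked constant is a zero-dimensional masked array.
masked = np.ma.masked
print(type(masked).__name__, masked.dtype, masked.shape)

def writer_style_check(x):
    # Same condition as the one in the traceback. A masked constant
    # passes the isinstance test, so len() is called on a 0-d array.
    return not isinstance(x, np.ndarray) or len(x) == 0

try:
    writer_style_check(masked)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Because the masked constant is an `ndarray` subclass with shape `()`, the `isinstance` guard does not short-circuit and `len()` raises the `TypeError` seen in the report.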

The table has those masked values due to a quirk in the votable reader, which is used under the hood by astroquery.gaia to read the results from the server.  In this case, the data provider marked the epoch_photometry_url values as null by using the null mask capability of BINARY2 serialization.  The reader then populates the astropy Table with masked values for any “null” value from the votable.  This behavior is appropriate for numeric FIELDs, but arguably not for strings.

This problem affects only astropy tables created from VOTables that use BINARY2 serialization, have some empty string values, and choose to mark them as null. Prior to VOTable 1.3, there was no way to specify a null string, i.e., there was no way to distinguish an empty string from a null string. (By string, I mean a FIELD with datatype="char" and arraysize=[a number or *].) With VOTable 1.3, the BINARY2 serialization option was added specifically to provide a mask bit for each cell to indicate whether or not the cell is a null value. The other serializations still don’t allow distinguishing between null and empty string.

I don’t see any code reason why this would be a new problem. More likely the data provider changed how the results were being sent.

One option to address this would be to harden the writer so that it can deal with these masked values on output, writing an empty string.  That’s OK, but doesn’t address the other uses of those masked values.  I can’t think of any cases where a client would need or prefer masked values over empty strings, and can think of many cases where it complicates things.
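
As a rough illustration of what hardening the writer could look like, the sketch below guards the length check against zero-dimensional values. The helper name `is_empty_or_scalar` is hypothetical and is not part of astropy; this is only one possible way to write the guard, not the change that was actually made.

```python
import numpy as np

def is_empty_or_scalar(x):
    """Hypothetical guard: True if x carries no array elements to inspect."""
    if not isinstance(x, np.ndarray):
        return True
    if x.ndim == 0:
        # Covers np.ma.masked, which is a 0-d array without a length.
        return True
    return len(x) == 0

values = [np.ma.masked, np.array([]), np.array([1.0, 2.0]), "abc"]
skipped = [is_empty_or_scalar(v) for v in values]
print(skipped)
```

Checking `ndim` before calling `len()` lets masked constants be skipped in the same way as empty arrays.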

Since clients already have to code for the possibility of empty strings, I think it makes sense to keep consistent behavior across all serializations, representing these values as empty strings in the astropy tables.

I’ll submit a PR that implements that suggestion.
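
The idea of representing null strings as empty strings can be sketched with a plain numpy masked array. This is an illustration under the assumption that the column holds bytes objects, as in the reproduction above; it is not the code from the PR.

```python
import numpy as np

# An object-dtype masked array standing in for a string column
# in which two cells were marked as null.
column = np.ma.array(
    [b"abc", b"", b"xyz"],
    mask=[False, True, True],
    dtype=object,
)

# Replace every masked cell with an empty bytes string.
filled = column.filled(b"")
print(filled.tolist())
```

After filling, every element has a length, so code that assumes string values are sized no longer encounters a masked constant.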

tomdonaldson commented, Jul 27, 2019 (1 reaction)

I was uncomfortable going as far as doing away with the MaskedColumn entirely. They are used pervasively regardless of what serialization format was used, and I haven’t fully learned why yet. Instead of risking wider consequences for now, I’ll just prevent those string MaskedColumns from having masked values.
