All masked columns are incompatible with writing to votable
astropy: 3.2.1, numpy: 1.16.4
I think the following snippet used to work fine although I cannot pinpoint the version that worked.
from astroquery.gaia import Gaia
r = Gaia.launch_job("select top 1000 * from gaiadr2.gaia_source;").get_results()
r.write('test.vot', format='votable', overwrite=True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/projects/notebook/problem.py in <module>
5 from astroquery.gaia import Gaia
6 r = Gaia.launch_job("select top 1000 * from gaiadr2.gaia_source;").get_results()
----> 7 r.write('test.vot', format='votable', overwrite=True)
8 print(r['epoch_photometry_url'])
9
~/miniconda3/lib/python3.7/site-packages/astropy/table/connect.py in __call__(self, *args, **kwargs)
112 instance = self._instance
113 with serialize_method_as(instance, serialize_method):
--> 114 registry.write(instance, *args, **kwargs)
~/miniconda3/lib/python3.7/site-packages/astropy/io/registry.py in write(data, format, *args, **kwargs)
564
565 writer = get_writer(format, data.__class__)
--> 566 writer(data, *args, **kwargs)
567
568
~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/connect.py in write_table_votable(input, output, table_id, overwrite, tabledata_format)
155
156 # Create a new VOTable file
--> 157 table_file = from_table(input, table_id=table_id)
158
159 # Write out file
~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/table.py in from_table(table, table_id)
332 votable : `~astropy.io.votable.tree.VOTableFile` instance
333 """
--> 334 return tree.VOTableFile.from_table(table, table_id=table_id)
335
336
~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py in from_table(cls, table, table_id)
3632 votable_file = cls()
3633 resource = Resource()
-> 3634 votable = Table.from_table(votable_file, table)
3635 if table_id is not None:
3636 votable.ID = table_id
~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py in from_table(cls, votable, table)
2891 for colname in table.colnames:
2892 column = table[colname]
-> 2893 new_table.fields.append(Field.from_table_column(votable, column))
2894
2895 if table.mask is None:
~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/tree.py in from_table_column(cls, votable, column)
1552 kwargs['unit'] = column.info.unit
1553 kwargs['name'] = column.info.name
-> 1554 result = converters.table_column_to_votable_datatype(column)
1555 kwargs.update(result)
1556
~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/converters.py in table_column_to_votable_datatype(column)
1422 return {'datatype': 'unicodeChar', 'arraysize': '*'}
1423 elif isinstance(column[0], np.ndarray):
-> 1424 dtype, shape = _all_matching_dtype(column)
1425 if dtype is not False:
1426 result = numpy_to_votable_dtype(dtype, shape)
~/miniconda3/lib/python3.7/site-packages/astropy/io/votable/converters.py in _all_matching_dtype(column)
1330 first_shape = ()
1331 for x in column:
-> 1332 if not isinstance(x, np.ndarray) or len(x) == 0:
1333 continue
1334
TypeError: len() of unsized object
This chokes on the ‘epoch_photometry_url’ column, which is entirely masked and has dtype object.
In [6]: r['epoch_photometry_url']
Out[6]:
<MaskedColumn name='epoch_photometry_url' dtype='object' description='epoch photometry url' length=1000>
--
--
--
--
--
--
--
--
--
--
--
--
--
--
Reproducing the gist of the problem:
import numpy as np
from astropy.table import Table, MaskedColumn

# All masked, empty string as object: this fails.
# Same case as in the snippet above.
try:
    t = Table({'c1': MaskedColumn(
        data=np.array([''.encode()] * 5).astype(object),
        mask=np.ones(5).astype(bool))})
    t.write('test1.vot', format='votable', overwrite=True)
except TypeError:
    print("TypeError raised.")

# All masked, non-empty string as object: this fails.
try:
    t = Table({'c1': MaskedColumn(
        data=np.array(['abc'.encode()] * 5).astype(object),
        mask=np.ones(5).astype(bool))})
    t.write('test1.vot', format='votable', overwrite=True)
except TypeError:
    print("TypeError raised.")

# Partially masked, empty string as object: this works.
t = Table({'c1': MaskedColumn(
    data=np.array([''.encode()] * 5).astype(object),
    mask=np.array([0, 0, 0, 1, 1]).astype(bool))})
t.write('test1.vot', format='votable', overwrite=True)

# All unmasked, empty string as object: this works.
t = Table({'c1': MaskedColumn(
    data=np.array([''.encode()] * 5).astype(object),
    mask=np.zeros(5).astype(bool))})
t.write('test1.vot', format='votable', overwrite=True)
Top GitHub Comments
The issue appears to be with the handling of null string values. In the example given in the original report (trimmed to return only a couple of rows), the epoch_photometry_url column, which should contain strings, has values that are numpy.ma.core.MaskedConstant with dtype = float64 and shape = (). Those values trip up the writer logic, which assumes that string values have a length.
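The failure described above can be seen in isolation with numpy alone. The following is a minimal sketch, not the astropy code path itself: it only shows that the masked constant passes the isinstance check on line 1332 of the traceback and then fails on len().

```python
import numpy as np

value = np.ma.masked  # what indexing a fully masked cell returns

# The masked constant is an ndarray subclass, so the first half of
# `not isinstance(x, np.ndarray) or len(x) == 0` is False and len() runs.
print(isinstance(value, np.ndarray))
print(value.dtype, value.shape)

try:
    len(value)
    len_failed = False
except TypeError as exc:
    # A zero-dimensional array has no length.
    len_failed = True
    print("TypeError:", exc)
```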
The table has those masked values due to a quirk in the votable reader, which is used under the hood by astroquery.gaia to read the results from the server. In this case, the data provider marked the epoch_photometry_url values as null by using the null mask capability of BINARY2 serialization. The reader then populates the astropy Table with masked values for any “null” value from the votable. This behavior is appropriate for numeric FIELDs, but arguably not for strings.
This problem affects only astropy tables created from VOTables that use BINARY2 serialization, have some empty string values, and choose to mark them as null. Prior to VOTable 1.3, there was no way to specify a null string, i.e., there was no way to distinguish an empty string from a null string. (By string, I mean a FIELD with datatype=“char” and arraysize=[a number or *].) With VOTable 1.3, the BINARY2 serialization option was added specifically to provide a mask bit for each cell to indicate whether or not the cell is a null value. The other serializations still don’t allow distinguishing between null and empty string.
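To make the mask-bit idea concrete, here is a rough sketch of decoding per-row null flags. It assumes, as I understand the VOTable 1.3 description of BINARY2, that each row starts with one flag bit per FIELD, packed most significant bit first and padded to whole bytes. The function name decode_null_flags is hypothetical and is not part of astropy.

```python
import math


def decode_null_flags(row_prefix, n_fields):
    """Return one boolean per field; True means the cell is flagged as null.

    Assumes flags are packed most significant bit first and padded to
    a whole number of bytes at the start of the row.
    """
    n_bytes = math.ceil(n_fields / 8)
    flags = []
    for byte in row_prefix[:n_bytes]:
        for bit in range(7, -1, -1):
            flags.append(bool((byte >> bit) & 1))
    return flags[:n_fields]


# Three fields; the first and third are flagged as null (binary 1010 0000).
print(decode_null_flags(b"\xa0", 3))
```

With a layout like this, a string cell can be flagged as null independently of its content, which is what lets a provider distinguish null from empty.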
I don’t see any code reason why this would be a new problem. More likely the data provider changed how the results were being sent.
One option to address this would be to harden the writer so that it can deal with these masked values on output, writing an empty string. That’s OK, but doesn’t address the other uses of those masked values. I can’t think of any cases where a client would need or prefer masked values over empty strings, and can think of many cases where it complicates things.
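As an illustration of what hardening the writer could look like, here is a small sketch of a guard that skips masked or zero-dimensional values before calling len(). The helper names has_usable_length and first_array_shape are hypothetical; this is not the actual astropy code or the eventual fix.

```python
import numpy as np


def has_usable_length(x):
    """Hypothetical guard: True only for unmasked, non-empty arrays."""
    if x is np.ma.masked:
        return False
    if not isinstance(x, np.ndarray) or x.ndim == 0:
        return False
    return len(x) > 0


def first_array_shape(column):
    """Return the shape of the first usable element, or None if there is none."""
    for x in column:
        if not has_usable_length(x):
            continue
        return x.shape
    return None


# A fully masked column no longer raises; it simply yields no shape.
print(first_array_shape([np.ma.masked, np.ma.masked]))
```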
Since clients already have to code for the possibility of empty strings, I think it makes sense to keep consistent behavior across all serializations, representing these values as empty strings in the astropy tables.
I’ll submit a PR that implements that suggestion.
I was uncomfortable going as far as doing away with MaskedColumns entirely. They are used pervasively regardless of which serialization format was used, and I haven’t fully learned why yet. Rather than risk wider consequences for now, I’ll just prevent those string MaskedColumns from having masked values.
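Until a fix is available, a client-side workaround in the same spirit is to replace masked string cells with empty values before writing. The sketch below uses a plain numpy masked array with made-up data to show the idea; it is an assumption that the same approach carries over unchanged to a given astropy column.

```python
import numpy as np

# Made-up object-dtype data; the first and last cells are masked (null).
urls = np.ma.MaskedArray(
    np.array([b"x", b"abc", b"y"], dtype=object),
    mask=[True, False, True],
)

# filled() returns a plain ndarray with masked cells replaced by the fill value.
filled = urls.filled(b"")
print(filled)
```

astropy's MaskedColumn also provides a filled() method, so something like r['epoch_photometry_url'].filled('') may serve the same purpose on the table from the original report, though I have not verified that against this exact case.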