
Data loss during Table.add_row 'int64' converted to 'float64'


I load a CSV file as a table in astropy; the first column is a source ID with dtype int64. When I create a second table from it using a for loop and add_row, the source ID column in the new table has dtype float64 instead of int64. Casting it back with astype(np.int64) then gives values whose last 3 digits are wrong. I can restructure my code to avoid this for now, but I wanted to report it. I think this is similar to #7741 (fixed in #7747), or perhaps to #5950.

import numpy as np
from astropy.table import Table

t = Table.read('file.csv')
t2 = Table(names=t.colnames)

for row in t:
    t2.add_row(row)
t2  # the source ID column is now float64
t2['source_id'] = t2['source_id'].astype(np.int64)
t2  # the source ID column is int64 but wrong

e.g. 5278042881477383040 becomes 5278042881477383656 (the last 3 digits always seem to be 656 as well)
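The wrong trailing digits are what a round trip through float64 would be expected to produce for an integer this large. The following is a minimal sketch in plain Python, not astropy code: it assumes only that the ID passes through a 64-bit float at some point, and it uses Python's built-in float (an IEEE 754 double, the same format as numpy's float64).

```python
# Illustrative only: why an int64 of this size cannot survive a float64 round trip.
source_id = 5278042881477383040  # the ID quoted in the report

# Python's float is an IEEE 754 double, the same format as numpy's float64.
as_float = float(source_id)
roundtrip = int(as_float)

# float64 has a 53-bit significand, so between 2**62 and 2**63 only
# multiples of 2**(62 - 52) = 1024 are representable.
spacing = 2 ** (62 - 52)

print(roundtrip == source_id)      # False: the low digits were rounded away
print(roundtrip % spacing)         # 0: the result sits on the 1024-wide grid
print(abs(roundtrip - source_id))  # at most spacing // 2
```

Any integer above 2**53 is at risk of this rounding, so casting back with astype(np.int64) cannot recover the original digits once the column has been stored as float64.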

System Details:
Linux-5.0.0-31-generic-x86_64-with-debian-buster-sid
Python 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
Numpy 1.17.2
Scipy 1.2.1
astropy 3.1.2

Issue Analytics

Created 4 years ago
Comments: 11 (8 by maintainers)

Top GitHub Comments

Will-Cooper commented, Oct 15, 2019

Hi, yes, I found a quick workaround using ECSV for when there are lots of columns (or, as in the attached example test_astropy.txt, only one column) and you know what datatype each column should have:

import numpy as np
from astropy.table import Table

t = Table.read('test_astropy.txt', format='ascii')  # one column table of just source ID with int64 dtype

ecsv_content = '''# %ECSV 0.9
# ---
# datatype:
'''
for name in t.colnames:
    # source_id must stay an integer; every other column is read as float64
    datatype = 'int64' if name == 'source_id' else 'float64'
    ecsv_content += f'# - {{name: {name}, datatype: {datatype}}}\n'
ecsv_content += ' '.join(t.colnames)  # header line with the column names

t2 = Table.read(ecsv_content, format='ascii.ecsv')  # empty one column table of source ID with int64 dtype

for row in t:
    t2.add_row(row)
t2  # full column table of source ID with int64 type and correct values

Thanks! Perhaps astropy could issue a warning, or offer an option to build the table the way I was originally attempting? It’s only really an issue when there are hundreds of columns with non-float dtypes, which makes creating the second table awkward, especially as the int64 -> float64 type conversion doesn’t throw an error.

Will-Cooper commented, Oct 15, 2019

Hi, I am now using astropy version 4.0.dev26091 (apologies, I am new to GitHub). While making the attached, sanitised example test_astropy.txt, I found that the cause is the column dtype defaulting to float64, which is then not updated to int64 when add_row is called. I don’t know a way of making it default to int64 without passing it data or a list of dtypes (which is awkward when there are hundreds of columns).

import numpy as np
from astropy.table import Table
t = Table.read('test_astropy.txt', format='ascii.no_header')  # the input file type does not matter here
t  # a one column table with the int64 dtype
t2 = Table(names=['source_id']) 
t2  # this is one empty column with a dtype of float64
for row in t:
    t2.add_row(row)
t2  # this is one full column still with dtype of float64
t2['source_id'] = t2['source_id'].astype(np.int64)
t2  # one column table with int64 dtype but the wrong last 3 digits

I also realised that if the table had contained a column of strings, this exact technique would throw a ValueError, because it tries to convert a string to a float. Perhaps creating a table from names alone and then feeding it rows from another table is not an intended use? To support it, the first call to add_row would have to update the dtypes of the recipient table when that table is empty (if it is not empty, the resulting type-conversion errors are fair enough).
