Pandas casting int64 to float64, misrepresenting value

I have the following data being returned by Presto (single column, 6 rows):

[(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]

Due to the missing data (None), Pandas infers the type as float64, converting the value to a wrong id:

>>> import pandas as pd
>>> column_names = ['organization_lyft_id']
>>> data = [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]
>>> df = pd.DataFrame(list(data), columns=column_names).infer_objects()  # SupersetDataFrame
>>> print(df)
   organization_lyft_id
0                   NaN
1          1.239162e+18
2                   NaN
3                   NaN
4                   NaN
5                   NaN
>>> print(df.dtypes)
organization_lyft_id    float64
dtype: object

The number then shows up as 1239162456494753800 in SQL Lab.

Here’s the Pandas documentation on this:

… pandas primarily uses NaN to represent missing data. Because NaN is a float, this forces an array of integers with any missing values to become floating point. In some cases, this may not matter much. But if your integer column is, say, an identifier, casting to float can be problematic. Some integers cannot even be represented as floating point numbers. (emphasis mine)
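The rounding can be reproduced without pandas. The sketch below is a plain-Python check, assuming only that a float64 column stores the same value as Python's built-in `float` (both are IEEE 754 doubles with a 53-bit significand). The variable names are illustrative.

```python
# Plain-Python check: a float64 keeps only 53 significant bits, so large
# integers are rounded to the nearest representable value.
original = 1239162456494753670

as_float = float(original)     # what a float64 column would store
round_tripped = int(as_float)  # exact integer value of that float

print(round_tripped)             # 1239162456494753792
print(round_tripped - original)  # 122

# Above 2**53, consecutive integers start to collapse onto the same float.
print(float(2**53) == float(2**53 + 1))  # True
```

The stored value is 1239162456494753792. The 1239162456494753800 seen in SQL Lab is likely the shortest decimal string that still round-trips to that same float.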

Note that if the missing data is filtered out, the value is inferred as an int64 and shows up correctly in SQL Lab:

>>> column_names = ['organization_lyft_id']
>>> data = [(1239162456494753670,)]
>>> df = pd.DataFrame(list(data), columns=column_names).infer_objects()  # SupersetDataFrame
>>> print(df)
   organization_lyft_id
0   1239162456494753670
>>> print(df.dtypes)
organization_lyft_id    int64
dtype: object

The solution is to build a dtype argument from the cursor description and pass it when creating the Pandas data frame. I’m working on a fix for this.
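As a rough illustration of that idea (not the actual fix), the sketch below picks a dtype per column from a DB-API style description. The `description` value, the `"bigint"` type code, the `TYPE_CODE_TO_DTYPE` mapping and the `build_frame` helper are all hypothetical. It also assumes a pandas version that provides the nullable `Int64` dtype, which can hold missing values without falling back to float.

```python
import pandas as pd

# Hypothetical DB-API style description: (name, type_code, ...) per column.
# The "bigint" type code string is an assumption for illustration.
description = [("organization_lyft_id", "bigint", None, None, None, None, True)]
data = [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]

# Hypothetical mapping from database type codes to pandas dtypes.
# "Int64" (capital I) is the nullable integer dtype, unlike NumPy's "int64".
TYPE_CODE_TO_DTYPE = {"bigint": "Int64"}


def build_frame(rows, description):
    """Build a DataFrame column by column, using an explicit dtype when known."""
    columns = {}
    for position, (name, type_code, *_rest) in enumerate(description):
        values = [row[position] for row in rows]
        dtype = TYPE_CODE_TO_DTYPE.get(type_code)  # None lets pandas infer
        columns[name] = pd.Series(values, dtype=dtype)
    return pd.DataFrame(columns)


df = build_frame(data, description)
print(df.dtypes)
print(df["organization_lyft_id"].iloc[1])
```

With this approach the missing rows are kept as missing values and the identifier stays an exact integer instead of being rounded through float64.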

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
issue-label-bot[bot] commented, Sep 14, 2019

Issue-Label Bot is automatically applying the label #bug to this issue, with a confidence of 0.57. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

0 reactions
john-bodley commented, Dec 3, 2019

We noticed an issue with the NumPy reshaping logic in SQL Lab. Here labels is an ARRAY<STRING> and renders correctly if multiple columns are selected, but it is incorrectly reshaped if it’s the only column.

[Screenshot: Screen Shot 2019-12-03 at 9:46:23 AM]

[Screenshot: Screen Shot 2019-12-03 at 9:45:49 AM]
