read_csv returns different float values for same number

Code Sample, a copy-pastable example if possible

test.csv

-15.361
-15.361000

>>> import pandas as pd
>>> x = pd.read_csv('test.csv', header=None)
>>> x.loc[0, 0] == x.loc[1, 0]
False

Problem description / Expected output

The expected output of the code above is

>>> x.loc[0, 0] == x.loc[1, 0]
True

We should expect both -15.361 and -15.361000 to be converted to the same np.float64 representation. However, they are converted to different float values that differ only in the last bit of their floating-point representation. For some reason, -15.361 is converted incorrectly to 0xC02EB8D4FDF3B645, whereas -15.361000 is converted correctly to 0xC02EB8D4FDF3B646.

For completeness, here are some more comparisons: x.loc[1, 0] is equal (==) to np.float64('-15.361'), np.float64('-15.361000'), and float('-15.361000'). x.loc[0, 0] is not equal to any of those.
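
The bit patterns quoted above can be checked without pandas. The following is a minimal sketch using only the standard library; the helper names float_bits and bits_to_float are made up for this illustration and are not part of pandas or NumPy.

```python
import struct


def float_bits(value: float) -> str:
    """Return the IEEE 754 binary64 bit pattern of value as 16 hex digits."""
    return struct.pack(">d", value).hex().upper()


def bits_to_float(hex_bits: str) -> float:
    """Rebuild a float from a 16-hex-digit binary64 bit pattern."""
    return struct.unpack(">d", bytes.fromhex(hex_bits))[0]


# Python's own parser gives the same value for both spellings.
correct = float("-15.361")
also_correct = float("-15.361000")

# The value the issue reports for the first CSV row, rebuilt from its bits.
off_by_one = bits_to_float("C02EB8D4FDF3B645")

print(float_bits(correct))        # C02EB8D4FDF3B646
print(float_bits(also_correct))   # C02EB8D4FDF3B646
print(correct == off_by_one)      # False
print(abs(correct - off_by_one))  # one unit in the last place, 2**-49
```

The two values are adjacent doubles, so any comparison with == fails even though the difference is only about 1.8e-15.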

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-73-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.2.4
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

1 reaction
chris-b1 commented, Aug 2, 2017

Trying to get exact equality out of floating-point values is generally a losing battle, doubly so with a lossy format like CSV. Use one of the float_precision options if it's important.
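
As a rough illustration of that suggestion, the sketch below reads the same two values with float_precision="round_trip", a documented option of pandas.read_csv for the C engine. It uses an in-memory string in place of test.csv, which is an assumption made to keep the example self-contained. What the default converter does with this input may vary between pandas versions, so only the round-trip case is shown.

```python
import io

import pandas as pd

# Same contents as test.csv in the report, held in memory for the example.
csv_text = "-15.361\n-15.361000\n"

# "round_trip" asks the C parser to use the round-trip string-to-float converter.
x = pd.read_csv(io.StringIO(csv_text), header=None, float_precision="round_trip")

same = x.loc[0, 0] == x.loc[1, 0]
matches_python = x.loc[0, 0] == float("-15.361")

print(same)            # both rows parse to the same float64
print(matches_python)  # and that value matches Python's float()
```

The round-trip converter is typically slower than the default one, so it is a trade of speed for exactness. Where exact bit equality is not required, a tolerance-based check such as math.isclose or numpy.isclose avoids the problem altogether.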

0 reactions
chrisyeh96 commented, Sep 14, 2020

Thanks @jreback! Glad this finally is resolved 👍
