
question: Problem in output big float numbers

I have a weird situation when trying to get output from the pandas to_excel function with the xlsxwriter engine.

I read an xlsx Excel file containing the number “21631706.9893399”, but when I write it to a new xlsx Excel file the output is “21631706.98934”. I tested with openpyxl and xlsxwriter and got the same results. However, if I write an xls Excel file with the xlwt engine, it gives me the right answer. Is there a way to read and write the same float number with these engines? I suspect there are some float limitations in the xlsx format, but in the xlsx input the number is fine. By the way, this is not a question about number formatting: the output number is actually different from the original input.

My code:

import pandas as pd

df = pd.read_excel(r'input\sample.xlsx')

df.to_excel(r'output\excel xlsx - xlsxwriter.xlsx', engine='xlsxwriter', index=False)
df.to_excel(r'output\excel xlsx - openpyxl.xlsx', engine='openpyxl', index=False)
df.to_excel(r'output\excel xls - xlwt.xls', engine='xlwt', index=False)

GitHub Example

Dependencies:

python = "3.7.4"
pandas = "1.3.5"
XlsxWriter = "3.0.3"
openpyxl = "3.0.9"
xlwt = "1.3.0"


Top GitHub Comments


jmcnamara commented, May 2, 2022

The issue that you are seeing is just a symptom of how floating-point numbers behave, or more specifically how IEEE 754 floating-point numbers behave.

Excel and Python (without higher-precision libraries) both use IEEE 754 “double” floating-point numbers, which have a general precision of about 15 significant decimal digits. The number in your test case has more than 15 digits (not counting the decimal point) and as a result it gets rounded to a display or storage precision when it is read or written.
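
Python exposes these limits through sys.float_info. As a minimal sketch (survives_at_15_digits is a hypothetical helper made up for this illustration), a decimal string with 15 significant digits survives a parse-and-format round trip at 15 digits, while the 17-digit value from the input file does not:

```python
import sys

# IEEE 754 double: a 53-bit significand (mant_dig), and 15 decimal digits
# (dig) that are guaranteed to survive decimal -> float -> decimal.
print(sys.float_info.mant_dig)  # 53
print(sys.float_info.dig)       # 15


def survives_at_15_digits(text: str) -> bool:
    """True if the decimal string comes back unchanged after parsing it to a
    float and formatting it again with 15 significant digits."""
    return f"{float(text):.15g}" == text


print(survives_at_15_digits("21631706.9893399"))    # True  (15 digits)
print(survives_at_15_digits("21631706.989339948"))  # False (17 digits)
```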

For example the number stored in the input file is actually 21631706.989339948:

$ unzip input/sample.xlsx -d input_file
...

$ xmllint --format input_file/xl/worksheets/sheet1.xml | grep -C 2 216
    <row r="2" spans="1:1" x14ac:dyDescent="0.25">
      <c r="A2">
        <v>21631706.989339948</v>
      </c>
    </row>

(On Windows you can verify this by changing the file extension to .zip, unzipping it, and examining the XML file.)
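
The same check can also be done from Python on any platform with the standard-library zipfile and xml.etree.ElementTree modules. This is a minimal sketch: raw_cell_values is a hypothetical helper (not part of pandas, XlsxWriter, or openpyxl), and it assumes the data lives in xl/worksheets/sheet1.xml. It returns the raw text of numeric cells, before any conversion to float:

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML elements (<sheetData>, <row>, <c>, <v>).
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def raw_cell_values(xlsx, sheet_xml="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: raw <v> text} for the numeric cells of one sheet.

    The text is returned unparsed, so it shows exactly the digits stored in
    the file. Cells whose t attribute is not "n" (e.g. t="s" shared strings,
    where <v> holds an index rather than a value) are skipped.
    """
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read(sheet_xml))
    values = {}
    for cell in root.iterfind("m:sheetData/m:row/m:c", NS):
        v = cell.find("m:v", NS)
        if cell.get("t", "n") == "n" and v is not None:
            values[cell.get("r")] = v.text
    return values
```

For the sample file, raw_cell_values(r'input\sample.xlsx') should include the A2 text 21631706.989339948 seen in the xmllint output, provided the sheet layout matches that assumption.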

This number has 17 significant digits, so it cannot be represented without loss of precision in an IEEE 754 double. Excel reads and displays it as the number 21631706.9893399 (as you say above).
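
A short illustration using Python's decimal module: converting a float to Decimal is exact, so it reveals the binary value that the 17-digit text actually turns into, and formatting with 15 significant digits gives the value Excel displays:

```python
from decimal import Decimal

stored = "21631706.989339948"   # text of the <v> element in the input file
x = float(stored)               # parsed to the nearest IEEE 754 double

# Decimal(float) converts exactly, exposing the binary value held in memory.
print(Decimal(x))                     # 21631706.98933994770050048828125
print(Decimal(x) == Decimal(stored))  # False: the decimal is not exactly representable

# Formatting to 15 significant digits mirrors what Excel displays.
print(f"{x:.15g}")                    # 21631706.9893399
```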

Writing this number back out as text is also subject to a loss of precision beyond 15 digits, which is what happens here:

$ unzip output/excel\ xlsx\ -\ xlsxwriter.xlsx -d xlsxwriter_output
...

$ xmllint --format xlsxwriter_output/xl/worksheets/sheet1.xml | grep -C 2 216
    <row r="2" spans="1:1">
      <c r="A2">
        <v>21631706.98933995</v>
      </c>
    </row>
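
The xlsxwriter output above has 16 significant digits. The sketch below uses only Python's own float formatting (it does not reproduce XlsxWriter's internals) to show why 16 digits is not enough for this particular value while 17 is, and how the changed value then displays:

```python
x = float("21631706.989339948")  # value read from the input file

as_16 = f"{x:.16g}"  # 16 significant digits, as in the xlsxwriter output above
as_17 = f"{x:.17g}"  # 17 significant digits are always enough for a double

print(as_16)                           # 21631706.98933995
print(float(as_16) == x)               # False: it parses to the next double up
print(float(as_16) - x == 2.0 ** -28)  # True: one unit in the last place at this magnitude
print(as_17, float(as_17) == x)        # 21631706.989339948 True
print(repr(x))                         # 21631706.989339948 (shortest string that round-trips)

# Shown to 15 significant digits, the changed value matches the question:
print(f"{float(as_16):.15g}")          # 21631706.98934
```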

The xls file behaves differently because it is a binary format: the 64 bits that Python holds in memory for the IEEE 754 double are written to and read from the file unchanged. This makes it appear more consistent, but it doesn't mean that it is more precise, since the underlying representation of the double is the same.
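
A minimal sketch with Python's struct module (an illustration of the general idea, not xlwt's actual code) of why a binary round trip looks lossless while leaving the precision unchanged:

```python
import struct

x = float("21631706.989339948")

# A binary format can store the double's 8 bytes verbatim ("<d" = little-endian
# IEEE 754 double), so writing and reading back reproduces exactly the same bits.
raw = struct.pack("<d", x)
restored = struct.unpack("<d", raw)[0]
print(len(raw), restored == x)  # 8 True

# The restored value is still the same double, so it is no more precise:
# shown to 15 significant digits it looks just like the xlsx input.
print(f"{restored:.15g}")       # 21631706.9893399
```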

So, in summary, this behaviour is a consequence of handling floating-point numbers beyond the precision of an IEEE 754 double.


jmcnamara commented, May 2, 2022

Thanks for the detailed question. I saw your question on Stack Overflow and intended to have a look at it later.

Let me dig into it and get back to you.
