question: Problem in output big float numbers
See original GitHub issueProblem in output big float numbers
I have some a weird situation trying to get the output in to_excel pandas function with xlsxwriter engine.
I tried to read a xlsx excel file with the number “21631706.9893399”, but when a try to write in a new xlsx excel file gives me the output “21631706.98934”. I tested with openpyxl and xlsxwriter but I got the same results, however if I tried to write a xls excel with xlwt engine it gives me the right answer, is there a way to read and write the same float number with these engines? I suspect there some float limitations in xlsx file, but in the xlsx input the number is fine. By the way, it is not a question about formating number, the ouput number is actually different from the original input.
My code:
import pandas as pd
df = pd.read_excel(r'input\sample.xlsx')
df.to_excel(r'output\excel xlsx - xlsxwriter.xlsx', engine='xlsxwriter', index=False)
df.to_excel(r'output\excel xlsx - openpyxl.xlsx', engine='openpyxl', index=False)
df.to_excel(r'output\excel xls - xlwt.xls', engine='xlwt', index=False)
Dependencies:
python = “3.7.4” pandas = “1.3.5” XlsxWriter = “3.0.3” openpyxl = “3.0.9” xlwt = “1.3.0”
Issue Analytics
- State:
- Created a year ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
The issue that you are seeing is just a symptom of how floating point numbers behave, or more specifically how IEEE754 floating point numbers behave.
Excel and Python (without higher precision libraries) both use IEEE754 “double” floating point numbers which have a general precision of 15 digits. The number in your test case has more than 15 digits (not counting the decimal place) and as a result it gets rounded to a display or storage precision when it is read or written.
For example the number stored in the input file is actually 21631706.989339948:
(On Windows you can verify this by changing the file extension to *.zip, unzipping it, and examining the xml file.)
This number has 17 digits so it cannot be represented without loss of precision in a IEE754 float. Excel reads and displays this as number 21631706.9893399 (as you say above).
Writing this number back out as a text representation is also subject to a loss of precision in the 15+ digits which is what happens:
The xls file behaves differently because it is a binary format and the 64bits that Python has in memory for the IEEE754 double is read/written in same way from/to the file format. This means that it appears more consistent but it doesn’t mean that it is more precise since the underlying representation of the double is the same.
So in summary, this behaviour is a function of handling floating point numbers beyond the range of IEE754 double’s precision.
Thanks for the detailed question. I saw your question on StackOverflow and I intended to have a look at it later.
Let me dig into it and get back to you.