CLN: consider deprecating convert_floats from read_excel

As the docs explain, the binary spreadsheet formats (xls, xlsb) store all numbers as floats, so by default pandas converts floats to integers when no information is lost (1.0 --> 1). “You can pass convert_float=False to disable this behavior, which may give a slight performance improvement.” I tested this on four file types with a spreadsheet of ~440,000 cells and recorded the best time out of 10 repetitions for each:

File type   convert_floats=True   convert_floats=False   Speed-up
xls         1.081                 1.036                  4.2%
xlsb        3.413                 3.357                  1.6%
ods         27.798                27.770                 0.1%
xlsx        5.182                 5.189                  -0.1%
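
For readers who want to repeat a measurement like this, here is a minimal sketch of a "best of 10" timing helper and of the speed-up arithmetic. The helper names are hypothetical, and the speed-up formula is inferred from the numbers in the table rather than stated in the issue. The benchmark script and spreadsheet used for the original measurements are not shown in the issue, so only the arithmetic is reproduced here.

```python
import timeit


def best_time(func, repeats=10):
    """Return the fastest of `repeats` single runs of func, in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeats))


def speed_up_percent(t_true, t_false):
    """Share of the conversion-on time that is saved by turning conversion off.

    Assumption: this formula is inferred from the table, not given in the issue.
    """
    return (t_true - t_false) / t_true * 100


# Timings copied from the table: (conversion on, conversion off)
timings = {
    "xls": (1.081, 1.036),
    "xlsb": (3.413, 3.357),
    "ods": (27.798, 27.770),
    "xlsx": (5.182, 5.189),
}

for file_type, (t_true, t_false) in timings.items():
    print(f"{file_type}: {speed_up_percent(t_true, t_false):.1f}%")
```

To time a real read, `best_time` could be given a zero-argument callable that wraps the `read_excel` call of interest.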

convert_floats was probably written for the benefit of .xls files, but the benefit is minor. The .xlsx files even incur a slight penalty, because openpyxl already converts floats to ints where possible, so pandas has to convert them back to floats when convert_floats=False.

Since .xlsx files are now the most common spreadsheet format (citation: google search), and convert_floats only exists for performance, is it time to remove convert_floats? The spreadsheet engines would keep the behaviour of convert_floats=True and the argument would be deprecated. This change would simplify all four engines, and if anybody really needs their ints as floats, they can always specify a dtype. Note: this possible deprecation came up in https://github.com/pandas-dev/pandas/issues/8212#issuecomment-54804297 before dtype was finalized in read_excel.

I can work on this if the community likes the idea.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
ahawryluk commented, Jul 23, 2021

Hi @italo-turing, I think you’ve already found the best alternative if you’re constrained to .xlsx as input. If you’re free to change your input file to .csv, then the mixed-type (object) column will load with both ints and floats.

0 reactions
italo-turing commented, Jul 13, 2021

I have a .xlsx file that contains a column of mostly strings but with the occasional number. I want those numbers to be read as shown in Excel (integers as integers, floats as floats). Before this deprecation, read_excel read those numbers as integers when appropriate, but now they are always read as floats. Specifying {col: str} doesn’t help; the numbers still get parsed as floats. So, for example, a cell showing 121 in Excel is read as 121.0.

My current solution is to manually iterate through that column later and figure those cases out. Is there a better alternative here?
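
The manual pass described above might look something like the following sketch. The function name and the sample values are hypothetical. It assumes the column has already been loaded as a mix of strings and floats, and that every float with no fractional part should be shown as an integer.

```python
def restore_integer(value):
    """Turn an integral float such as 121.0 back into the int 121.

    Strings, non-integral floats, NaN and infinities are returned unchanged.
    """
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Hypothetical mixed column: mostly strings, with the occasional number
column = ["abc", 121.0, 1.5, "x-7"]
cleaned = [restore_integer(v) for v in column]
print(cleaned)  # ['abc', 121, 1.5, 'x-7']
```

On a pandas object column the same function could be applied with `Series.map`. One caveat: because the number arrives as a float either way, this cannot tell a cell that was meant to read 121 from one that was meant to read 121.0.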
