ENH: read_excel respect Excel text type for numbers

Problem description

I am trying to read an Excel file that has a column (called “raster”) of numbers entered with a leading apostrophe so that Excel treats them as text, which is a common way to preserve leading zeros. The values must always be 6 digits long. Some of the values in this column are also missing.

The file I am using for this example can be found here.

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.read_excel("test.xlsx",
                   names=["raster", "benennung"],
                   sheet_name="Tabelle1",
                   )
print(df)
print(df.dtypes)

This returns:

    raster benennung
0  20099.0      Test
1  20099.0    Test 2
2      NaN    Test 3

raster       float64
benennung     object
dtype: object

When I read it without any explicit datatype declaration, the column is read with dtype float64, as can be seen above, and as a result the leading zeros disappear. Next, when I use fillna to replace the NaN values with a string, the column becomes object dtype to accommodate it (as far as I understand).

df.raster = df.raster.fillna("999999")
print(df.raster)

This returns:

0     20099
1     20099
2    999999
Name: raster, dtype: object

Assuming that the column is now of type object (i.e. string), I go on to pad the values back to 6 digits:

print(df.raster.str.pad(6, side="left", fillchar="0"))

This returns:

0       NaN
1       NaN
2    999999
Name: raster, dtype: object

This is the unexpected result for me.

I have intentionally not made the changes permanent (hence the print in the same line as pad).
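A minimal sketch of why the padding yields NaN, assuming the column after fillna holds two Python floats and one string. The Series is built by hand here instead of being read from test.xlsx, so the values are stand-ins for the real file. The .str accessor only operates on elements that are actually strings and returns NaN for everything else:

```python
import pandas as pd

# Hand-built stand-in for df.raster after fillna("999999"):
# object dtype, but the first two elements are still floats.
raster = pd.Series([20099.0, 20099.0, "999999"], dtype=object, name="raster")

# Inspect the Python type of each element instead of trusting the dtype.
element_types = [type(value).__name__ for value in raster]
print(element_types)

# .str.pad only touches real strings; non-string elements come back as NaN.
padded = raster.str.pad(6, side="left", fillchar="0")
print(padded.isna().tolist())
```

So object dtype only says that the column can hold arbitrary Python objects. It does not guarantee that every element is a string.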

This makes me realize that the numbers were not actually converted to strings when I replaced the NaNs with “999999”, because when I try this:

print(df.raster.astype(str))

This returns another representation of the column when explicitly converted to string (and I have tested that this works reliably as a string later on too, i.e. with padding etc.):

0    20099.0
1    20099.0
2     999999
Name: raster, dtype: object
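One caveat worth checking, sketched on the same hand-built stand-in column (not the real file): after astype(str) every element is a string, so the .str methods no longer return NaN, but the former floats keep their trailing ".0". They are then 7 characters long, and padding to width 6 leaves them unchanged instead of restoring the leading zero:

```python
import pandas as pd

# Hand-built stand-in for df.raster after fillna("999999").
raster = pd.Series([20099.0, 20099.0, "999999"], dtype=object, name="raster")

# Every element becomes a string, but floats are rendered with ".0".
as_text = raster.astype(str)

# Padding now works element-wise, yet "20099.0" already exceeds width 6.
padded = as_text.str.pad(6, side="left", fillchar="0")
print(padded.tolist())
```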

Bottom line: I know I could have avoided this trouble by explicitly defining datatypes at the start, but since I forgot to do that and then ran into this strange behavior, I thought it was worth mentioning here. Whatever makes pandas better makes me happy, since I personally like working with pandas a lot.
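For anyone landing here with the column already read as float64, a possible repair is sketched below. It assumes the values are whole numbers and that “999999” is the desired placeholder for missing entries. The helper name to_code and the in-memory frame are made up for illustration. Declaring the type up front, presumably via the dtype argument of read_excel (for example dtype={"raster": str}), is the route the paragraph above alludes to and avoids the repair altogether.

```python
import pandas as pd

# In-memory stand-in for the frame returned by read_excel in the example.
df = pd.DataFrame({
    "raster": [20099.0, 20099.0, float("nan")],
    "benennung": ["Test", "Test 2", "Test 3"],
})


def to_code(value, width=6, missing="999999"):
    """Hypothetical helper: turn a float into a zero-padded string code."""
    if pd.isna(value):
        return missing
    # int() drops the ".0"; the format spec pads with zeros to `width`.
    return f"{int(value):0{width}d}"


df["raster"] = df["raster"].map(to_code)
print(df["raster"].tolist())
```

Converting through int first removes the ".0" that a plain astype(str) would keep, and every element ends up as a real string.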

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 2
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

jreback commented, Apr 6, 2021 (1 reaction)

@rwspielman pandas is completely a volunteer project - you are welcome to contribute

shabie commented, Apr 26, 2018 (1 reaction)

@chris-b1 I think that’d be a very helpful and reliably sensible addition when reading excel files.

I’d love to contribute (if I can manage to do that)…
