Empty cells make pandas use float, even if read_csv(dtype={'FOO': str}) is used

Code Sample, a copy-pastable example if possible

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import pandas as pd

csv_path = 'test.csv'
df = pd.read_csv(csv_path, delimiter=';', quotechar='"',
                 decimal=',', encoding="ISO-8859-1", dtype={'FOO': str})
df.FOO = df.FOO.map(lambda n: n.zfill(6))
print(df)

test.csv:

FOO;BAR
01,23;4,56
1,23;45,6
;987

Problem description

When I use dtype={'FOO': str}, I expect pandas to treat the column as a string. This seems to work, but when an empty cell is present Pandas seems to switch to float.

Expected Output

      FOO     BAR
0  001,23    4.56
1  001,23   45.60
2  000000  987.00

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.2.2
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: 1.1.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

jorisvandenbossche commented, Oct 9, 2017 (3 reactions)

I can't immediately find another related issue, apart from https://github.com/pandas-dev/pandas/issues/1450, whose approach you can use here as well: add na_values=[], keep_default_na=False to read_csv if you want to prevent empty strings from being parsed as NaN.
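
As a rough illustration of that suggestion, the sketch below applies na_values=[] and keep_default_na=False to the data from the report. It assumes the CSV is held in an in-memory string (csv_text is a name introduced here, not part of the original report) instead of test.csv, and it drops the encoding argument because no file is read.

```python
import io

import pandas as pd

# Same rows as test.csv in the report, kept in memory for this sketch.
csv_text = "FOO;BAR\n01,23;4,56\n1,23;45,6\n;987\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=";",
    quotechar='"',
    decimal=",",
    dtype={"FOO": str},
    na_values=[],           # no extra strings are treated as missing
    keep_default_na=False,  # the default NaN markers (including "") are disabled
)

# The empty cell is now an empty string, so str.zfill works on every row.
df["FOO"] = df["FOO"].map(lambda n: n.zfill(6))
print(df)
```

Note that these two options switch off the default missing-value detection for every column in the file, not only FOO, so this may be too broad if other columns rely on empty cells being read as NaN.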

jorisvandenbossche commented, Oct 9, 2017 (2 reactions)

@MartinThoma If you look at the values of the column, you will see pandas correctly preserved the data as strings (as you specified with dtype={'FOO': str}):

In [20]: df.FOO.values
Out[20]: array(['01,23', '1,23', nan], dtype=object)

The only ‘gotcha’ is that empty strings are still seen as missing values (and thus converted to NaN), and not kept as an empty string.

So your solution of filling the missing values with empty string (df.FOO.fillna(value="")) is actually fine.
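
A minimal sketch of the fillna approach, again assuming the CSV content sits in an in-memory string (csv_text is introduced here for illustration). It uses the vectorized Series.str.zfill in place of the map call from the report; that swap is a choice made for this sketch, not something the comment prescribes.

```python
import io

import pandas as pd

# Same rows as test.csv in the report, kept in memory for this sketch.
csv_text = "FOO;BAR\n01,23;4,56\n1,23;45,6\n;987\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=";",
    quotechar='"',
    decimal=",",
    dtype={"FOO": str},
)

# The empty cell is read as NaN; replace it with "" before padding.
df["FOO"] = df["FOO"].fillna(value="").str.zfill(6)
print(df)
```

Unlike keep_default_na=False, this only touches the FOO column and leaves missing-value handling for the rest of the file unchanged.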
