Empty cells make pandas use float, even if read_csv(dtype={'FOO': str}) is used

Code Sample, a copy-pastable example if possible

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import pandas as pd

csv_path = 'test.csv'
df = pd.read_csv(csv_path, delimiter=';', quotechar='"',
                 decimal=',', encoding="ISO-8859-1", dtype={'FOO': str})
df.FOO = df.FOO.map(lambda n: n.zfill(6))
print(df)

test.csv:

FOO;BAR
01,23;4,56
1,23;45,6
;987

Problem description

When I use dtype={'FOO': str}, I expect pandas to treat the column as a string. This seems to work, but when an empty cell is present, pandas seems to switch to float for that cell.

Expected Output

      FOO     BAR
0  001,23    4.56
1  001,23   45.60
2  000000  987.00

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.2.2
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: 1.1.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

Created 6 years ago
Reactions: 3
Comments: 10 (6 by maintainers)

Top GitHub Comments

3 reactions

jorisvandenbossche commented, Oct 9, 2017

I can’t find another directly related issue apart from https://github.com/pandas-dev/pandas/issues/1450, whose workaround also applies here: pass na_values=[], keep_default_na=False to read_csv if you want to prevent empty strings from being parsed as NaN.
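A minimal sketch of that suggestion, with a few assumptions: the contents of test.csv are supplied inline through io.StringIO so the example is self-contained, only keep_default_na=False is passed (na_values is left at its default), and .str.zfill stands in for the map call from the report.

```python
import io

import pandas as pd

# Inline copy of test.csv from the report (assumption: used instead of a file
# on disk so the example runs on its own).
CSV_TEXT = "FOO;BAR\n01,23;4,56\n1,23;45,6\n;987\n"

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    delimiter=";",
    quotechar='"',
    decimal=",",
    dtype={"FOO": str},
    keep_default_na=False,  # do not treat '' (or 'NA', 'NaN', ...) as missing
)

# The empty cell is now '' rather than NaN, so string methods are safe.
df["FOO"] = df["FOO"].str.zfill(6)
print(df)
```

Note that keep_default_na=False applies to every column, so strings such as NA in other columns would also be kept as text rather than read as missing values.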

2 reactions

jorisvandenbossche commented, Oct 9, 2017

@MartinThoma If you look at the values of the column, you will see pandas correctly preserved the data as strings (as you specified with dtype={'FOO': str}):

In [20]: df.FOO.values
Out[20]: array(['01,23', '1,23', nan], dtype=object)

The only ‘gotcha’ is that empty strings are still seen as missing values (and thus converted to NaN), and not kept as an empty string.

So your solution of filling the missing values with empty string (df.FOO.fillna(value="")) is actually fine.
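The fillna approach can be sketched as follows, again assuming the contents of test.csv are supplied inline through io.StringIO and using .str.zfill in place of the map call from the report:

```python
import io

import pandas as pd

# Inline copy of test.csv from the report (assumption: used instead of a file
# on disk so the example runs on its own).
CSV_TEXT = "FOO;BAR\n01,23;4,56\n1,23;45,6\n;987\n"

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    delimiter=";",
    decimal=",",
    dtype={"FOO": str},
)

# With default settings the empty cell is read as a missing value (NaN),
# which is why calling n.zfill(6) on it fails in the original sample.
empty_cell_is_missing = pd.isna(df["FOO"].iloc[2])

# Replace the missing value with '' before applying string operations.
df["FOO"] = df["FOO"].fillna("").str.zfill(6)
print(df)
```

This keeps the default missing-value handling for the other columns and only changes how FOO is treated after loading.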
