read_csv fails with `TypeError: object cannot be converted to an IntegerDtype` yet succeeds when reading chunks

Code Sample, a copy-pastable example if possible

Download this file upload.txt

import pandas as pd

# I attached the file in the github issue
filename = "upload.txt"
# this field is coded on 64 bits so 'UInt64' looks perfect.
column = "tcp.options.mptcp.sendkey"

with open(filename) as fd:

    print("READ CHUNK BY CHUNK")

    res = pd.read_csv(
            fd,
            comment='#',
            sep='|',
            dtype={column: 'UInt64' },
            usecols=[column],
            chunksize=1
    )
    for chunk in res:
        print(chunk)



    fd.seek(0) # rewind

    print("READ THE WHOLE FILE AT ONCE ")
    res = pd.read_csv(
            fd,
            comment='#',
            sep='|',
            usecols=[column],
            dtype={"tcp.options.mptcp.sendkey": 'UInt64' }
    )
    print(res)

If I read in chunks, read_csv succeeds. If I try to read the whole column at once, I get:

Traceback (most recent call last):
  File "test2.py", line 34, in <module>
    dtype={"tcp.options.mptcp.sendkey": 'UInt64' }
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/io/parsers.py", line 435, in _read
    data = parser.read(nrows)
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/io/parsers.py", line 1139, in read
    ret = self._engine.read(nrows)
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/io/parsers.py", line 1995, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 900, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 915, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 992, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1124, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1155, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1235, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/core/arrays/integer.py", line 308, in _from_sequence_of_strings
    return cls._from_sequence(scalars, dtype, copy)
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/core/arrays/integer.py", line 303, in _from_sequence
    return integer_array(scalars, dtype=dtype, copy=copy)
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/core/arrays/integer.py", line 111, in integer_array
    values, mask = coerce_to_array(values, dtype=dtype, copy=copy)
  File "/nix/store/mhiszrb8cpicjkzgraq796asj2sxpjch-python3.7-pandas-0.24.1/lib/python3.7/site-packages/pandas/core/arrays/integer.py", line 188, in coerce_to_array
    values.dtype))
TypeError: object cannot be converted to an IntegerDtype

Expected Output

I would like the call to read_csv to succeed without having to read in chunks (which seems to have other side effects as well).
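One possible way to read the whole file in a single call is sketched below. It is not taken from the issue thread and does not address the underlying bug: it assumes the column can be read as plain strings and converted afterwards. The sample data is made up to stand in for upload.txt, and `frame.number` is a hypothetical second column. Converting through Python integers rather than floats matters here because float64 cannot represent every 64-bit integer exactly.

```python
import io

import pandas as pd

column = "tcp.options.mptcp.sendkey"

# Made-up stand-in for upload.txt: one full-width 64-bit key, one empty
# field, one small key. "frame.number" is a hypothetical extra column.
sample = (
    "tcp.options.mptcp.sendkey|frame.number\n"
    "18446744073709551615|1\n"
    "|2\n"
    "42|3\n"
)

# Read the column as strings so the parser does no integer conversion.
raw = pd.read_csv(
    io.StringIO(sample),
    sep="|",
    comment="#",
    usecols=[column],
    dtype={column: str},
)

# Convert each present value with Python's arbitrary-precision int();
# missing fields become None, which the nullable dtype stores as <NA>.
keys = [None if pd.isna(value) else int(value) for value in raw[column]]
raw[column] = pd.array(keys, dtype="UInt64")

print(raw)
```

The cost is an extra pass over the column in Python, which may be slow for large captures. It does keep the full 64-bit range intact.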

Output of `pd.show_versions()`

I am using v0.23.4 with a patch from master to fix some other bug.

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.0
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0+unknown
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 4.2.6
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

State:
Created 5 years ago
Comments: 20 (10 by maintainers)

Top GitHub Comments

2 reactions

NumesSanguis commented, Jun 25, 2020

Sorry that I have no time to properly debug this, but I hope I can contribute a little bit of knowledge.

I’m running into the same problem as OP when I read one of the sheets of an .xlsx file (pandas 0.24.2). There are NaN values, but as of pandas 0.24 that should work when doing .astype(pd.Int16Dtype()), right?

This gave the same problem as OP:

df_sheet.age = df_sheet.age.astype(pd.Int16Dtype())

It's ugly, but this seems to have worked for me:

df_sheet.age = df_sheet.age.astype('float')  # first convert to float before int
df_sheet.age = df_sheet.age.astype(pd.Int16Dtype())
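The two-step conversion above can be tried in isolation with a small sketch. The `ages` Series is a made-up stand-in for the `df_sheet.age` column, assumed here to be an object-dtype column that holds integers and a NaN. Whether the direct conversion fails on such a column depends on the pandas version, so only the two-step path is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_sheet.age: object dtype with a missing value.
ages = pd.Series([31, np.nan, 47], dtype=object)

# Step 1: object -> float64, which represents the missing value as NaN.
as_float = ages.astype("float")

# Step 2: float64 -> nullable Int16, which turns NaN into <NA>.
as_nullable_int = as_float.astype(pd.Int16Dtype())

print(as_nullable_int)
```

Note that float64 represents integers exactly only up to 2**53, so this detour suits small values such as ages but may silently alter full 64-bit values like the ones in the original report.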

1 reaction

jreback commented, Oct 9, 2021

@alexreg you or anyone else is welcome to submit a PR with a patch, and the core team can review it.

Top Results From Across the Web

TypeError: object cannot be converted to an IntegerDtype ...

It's a known bug, as explained here. The workaround is to convert the column first to float and then to Int32. Make sure you strip...

IO tools (text, CSV, HDF5, …) — pandas 1.5.2 documentation

The pandas I/O API is a set of top level reader functions accessed like pandas.read_csv() that generally return a pandas object.

UnicodeDecodeError: 'utf-8' codec can't decode byte [...] in ...

Solving the UnicodeDecodeError when using Pandas' read_csv can be done in multiple ways. In this blog post, I list three.

polars.read_csv — Polars documentation - GitHub Pages

By file-like object, we refer to objects with a read() method, such as a file handler ... If this does not succeed, the...

object cannot be converted to an IntegerDtype-Pandas,Python

Coding example for the question TypeError: object cannot be converted to an IntegerDtype-Pandas,Python.
