dask.DataFrame.read_csv: ValueError: Cannot convert NA to integer

I use the following snippet to read a CSV file with dask.

import dask.dataframe as dd
df = dd.read_csv('/path/to/data.csv', sep = ';', header = 0, encoding = 'utf-8')
df.head(5)

The error I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-919f0aa6a2cb> in <module>()
      1 import dask.dataframe as dd
      2 df = dd.read_csv('/path/to/data.csv', sep = ';', header = 0, encoding = 'utf-8')
----> 3 df.head(5)

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/core.pyc in head(self, n, compute)
    369 
    370         if compute:
--> 371             result = result.compute()
    372         return result
    373 

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/base.pyc in compute(self, **kwargs)
     35 
     36     def compute(self, **kwargs):
---> 37         return compute(self, **kwargs)[0]
     38 
     39     @classmethod

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/base.pyc in compute(*args, **kwargs)
    108                 for opt, val in groups.items()])
    109     keys = [var._keys() for var in variables]
--> 110     results = get(dsk, keys, **kwargs)
    111 
    112     results_iter = iter(results)

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/threaded.pyc in get(dsk, result, cache, num_workers, **kwargs)
     55     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     56                         cache=cache, queue=queue, get_id=_thread_get_id,
---> 57                         **kwargs)
     58 
     59     return results

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.pyc in get_async(apply_async, num_workers, dsk, result, cache, queue, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, **kwargs)
    486                 _execute_task(task, data)  # Re-execute locally
    487             else:
--> 488                 raise(remote_exception(res, tb))
    489         state['cache'][key] = res
    490         finish_task(dsk, key, state, results, keyorder.get)

ValueError: Cannot convert NA to integer

Traceback
---------
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.py", line 267, in execute_task
    result = _execute_task(task, data)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.py", line 249, in _execute_task
    return func(*args2)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/csv.py", line 43, in bytes_read_csv
    coerce_dtypes(df, dtypes)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/csv.py", line 67, in coerce_dtypes
    df[c] = df[c].astype(dtypes[c])
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 2632, in astype
    dtype=dtype, copy=copy, raise_on_error=raise_on_error, **kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2864, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2823, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 430, in astype
    values=values, **kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 472, in _astype
    values = com._astype_nansafe(values.ravel(), dtype, copy=True)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/common.py", line 2460, in _astype_nansafe
    raise ValueError('Cannot convert NA to integer')
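The last frames of the traceback show `coerce_dtypes` calling `df[c].astype(dtypes[c])`. That suggests a column was given an integer dtype up front and a later block of the file then turned out to contain missing values. Below is a minimal pandas-only sketch of that failure mode. The two CSV blocks and the column names `id` and `count` are made up for illustration, and the code imitates the idea only; it is not dask's actual implementation.

```python
import io

import pandas as pd

# Hypothetical data: the first block of the file has no gaps,
# while a later block has an empty field in the same column.
first_block = "id;count\n1;10\n2;20\n"
later_block = "id;count\n3;\n4;40\n"

head = pd.read_csv(io.StringIO(first_block), sep=";")
tail = pd.read_csv(io.StringIO(later_block), sep=";")

# dtypes inferred from the first block only: both columns look like integers.
inferred = head.dtypes.to_dict()

# pandas parses the column with the gap as float64, because NaN
# cannot be stored in a plain int64 column.
tail_count_dtype = tail["count"].dtype

# Forcing the later block to the dtype inferred from the first block fails.
try:
    tail["count"].astype(inferred["count"])
    cast_error = None
except ValueError as exc:
    cast_error = exc

print(inferred["count"], tail_count_dtype, type(cast_error).__name__)
```

The exact exception class and message depend on the pandas version; recent releases raise a subclass of `ValueError` with different wording than the message in the traceback above.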

Issue Analytics

Created 7 years ago
Comments: 12 (5 by maintainers)

Top GitHub Comments

mglowacki100 commented, Dec 5, 2019

I’m facing exactly the same issue, but I use dd.read_sql_table instead of dd.read_csv. I can’t find an equivalent of assume_missing=True for it. How can I solve this?

ghost commented, Jun 9, 2016

dd.read_csv('/path/to/data.csv', sep=';', header=0, encoding='utf-8', dtype='object')

solves the problem. There is a column with mixed dtype.
