dask.DataFrame.read_csv: ValueError: Cannot convert NA to integer

I use the following snippet to read a CSV file with dask.

import dask.dataframe as dd
df = dd.read_csv('/path/to/data.csv', sep = ';', header = 0, encoding = 'utf-8')
df.head(5)

The error I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-919f0aa6a2cb> in <module>()
      1 import dask.dataframe as dd
      2 df = dd.read_csv('/path/to/data.csv', sep = ';', header = 0, encoding = 'utf-8')
----> 3 df.head(5)

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/core.pyc in head(self, n, compute)
    369 
    370         if compute:
--> 371             result = result.compute()
    372         return result
    373 

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/base.pyc in compute(self, **kwargs)
     35 
     36     def compute(self, **kwargs):
---> 37         return compute(self, **kwargs)[0]
     38 
     39     @classmethod

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/base.pyc in compute(*args, **kwargs)
    108                 for opt, val in groups.items()])
    109     keys = [var._keys() for var in variables]
--> 110     results = get(dsk, keys, **kwargs)
    111 
    112     results_iter = iter(results)

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/threaded.pyc in get(dsk, result, cache, num_workers, **kwargs)
     55     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     56                         cache=cache, queue=queue, get_id=_thread_get_id,
---> 57                         **kwargs)
     58 
     59     return results

/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.pyc in get_async(apply_async, num_workers, dsk, result, cache, queue, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, **kwargs)
    486                 _execute_task(task, data)  # Re-execute locally
    487             else:
--> 488                 raise(remote_exception(res, tb))
    489         state['cache'][key] = res
    490         finish_task(dsk, key, state, results, keyorder.get)

ValueError: Cannot convert NA to integer

Traceback
---------
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.py", line 267, in execute_task
    result = _execute_task(task, data)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.py", line 249, in _execute_task
    return func(*args2)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/csv.py", line 43, in bytes_read_csv
    coerce_dtypes(df, dtypes)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/csv.py", line 67, in coerce_dtypes
    df[c] = df[c].astype(dtypes[c])
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 2632, in astype
    dtype=dtype, copy=copy, raise_on_error=raise_on_error, **kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2864, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2823, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 430, in astype
    values=values, **kwargs)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 472, in _astype
    values = com._astype_nansafe(values.ravel(), dtype, copy=True)
  File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/common.py", line 2460, in _astype_nansafe
    raise ValueError('Cannot convert NA to integer')    
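
The last frames of the traceback show coerce_dtypes casting a column to a dtype that was chosen earlier (dask infers dtypes from a sample at the start of the file), and the cast failing once a block contains a missing value. The following is a simplified, pure-Python illustration of that failure mode, not dask's actual implementation. The helper names infer_types and coerce_block and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical data: column "b" looks like an integer column at the top of
# the file, but a later row has an empty (missing) value.
CSV_TEXT = "a;b\n1;10\n2;20\n3;\n"


def infer_types(rows):
    """Guess int or str per column from a sample of rows (hypothetical helper)."""
    types = {}
    for row in rows:
        for name, value in row.items():
            try:
                int(value)
                types.setdefault(name, int)
            except ValueError:
                types[name] = str
    return types


def coerce_block(rows, types):
    """Cast every value to the type chosen from the sample (hypothetical helper)."""
    return [{name: types[name](value) for name, value in row.items()}
            for row in rows]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT), delimiter=";"))
sample, rest = rows[:2], rows[2:]

types = infer_types(sample)      # both columns look like integers
coerce_block(sample, types)      # the sampled rows cast cleanly

try:
    coerce_block(rest, types)    # int("") fails on the missing value
    error = None
except ValueError as exc:
    error = exc
    print("coercion failed:", exc)
```

The point of the sketch is only that a type decided from the first rows can be wrong for later rows; the real error is raised by pandas when it casts a column containing NA to an integer dtype.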

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

1 reaction
mglowacki100 commented, Dec 5, 2019

I’m facing exactly the same issue, but I use dd.read_sql_table instead of dd.read_csv. I can’t find anything similar to assume_missing=True for it. How can I solve this?

1 reaction
ghost commented, Jun 9, 2016
dd.read_csv('/path/to/data.csv', sep=';', header=0, encoding='utf-8', dtype='object')

solves the problem. There is a column with mixed dtype.
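
Reading every column as object avoids the failing cast but gives up numeric types for the whole file. A narrower option may be to override only the affected columns. The sketch below uses only the standard library to find columns whose non-empty values all parse as integers but which also contain empty cells. The function name int_columns_with_missing and the sample data are hypothetical, and treating only empty strings as missing is an assumption.

```python
import csv
import io


def int_columns_with_missing(handle, delimiter=";"):
    """Return names of columns that are integer-like but have empty cells."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    all_int = {name: True for name in reader.fieldnames}
    has_missing = {name: False for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = (row[name] or "").strip()
            if value == "":
                has_missing[name] = True
                continue
            try:
                int(value)
            except ValueError:
                all_int[name] = False
    return [n for n in reader.fieldnames if all_int[n] and has_missing[n]]


# Hypothetical sample: "count" is numeric with a gap, "label" is text.
sample = io.StringIO("id;count;label\n1;10;x\n2;;y\n3;30;\n")
columns = int_columns_with_missing(sample)

# Floats can represent missing values as NaN, so map those columns to float64.
dtype_overrides = {name: "float64" for name in columns}
print(dtype_overrides)

# The mapping could then be passed to dask, for example:
# dd.read_csv('/path/to/data.csv', sep=';', dtype=dtype_overrides)
```

For CSV input, dd.read_csv also accepts assume_missing=True (the option mentioned in the comment above), which treats integer columns not listed in dtype as floats without a separate scan.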
