dask.dataframe.read_csv: ValueError: Cannot convert NA to integer
I have the following snippet to read a CSV with dask.
import dask.dataframe as dd
df = dd.read_csv('/path/to/data.csv', sep = ';', header = 0, encoding = 'utf-8')
df.head(5)
The error I get:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-919f0aa6a2cb> in <module>()
1 import dask.dataframe as dd
2 df = dd.read_csv('/path/to/data.csv', sep = ';', header = 0, encoding = 'utf-8')
----> 3 df.head(5)
/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/core.pyc in head(self, n, compute)
369
370 if compute:
--> 371 result = result.compute()
372 return result
373
/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/base.pyc in compute(self, **kwargs)
35
36 def compute(self, **kwargs):
---> 37 return compute(self, **kwargs)[0]
38
39 @classmethod
/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/base.pyc in compute(*args, **kwargs)
108 for opt, val in groups.items()])
109 keys = [var._keys() for var in variables]
--> 110 results = get(dsk, keys, **kwargs)
111
112 results_iter = iter(results)
/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/threaded.pyc in get(dsk, result, cache, num_workers, **kwargs)
55 results = get_async(pool.apply_async, len(pool._pool), dsk, result,
56 cache=cache, queue=queue, get_id=_thread_get_id,
---> 57 **kwargs)
58
59 return results
/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.pyc in get_async(apply_async, num_workers, dsk, result, cache, queue, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, **kwargs)
486 _execute_task(task, data) # Re-execute locally
487 else:
--> 488 raise(remote_exception(res, tb))
489 state['cache'][key] = res
490 finish_task(dsk, key, state, results, keyorder.get)
ValueError: Cannot convert NA to integer
Traceback
---------
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.py", line 267, in execute_task
result = _execute_task(task, data)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/async.py", line 249, in _execute_task
return func(*args2)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/csv.py", line 43, in bytes_read_csv
coerce_dtypes(df, dtypes)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/dask/dataframe/csv.py", line 67, in coerce_dtypes
df[c] = df[c].astype(dtypes[c])
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 2632, in astype
dtype=dtype, copy=copy, raise_on_error=raise_on_error, **kwargs)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2864, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 2823, in apply
applied = getattr(b, f)(**kwargs)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 430, in astype
values=values, **kwargs)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 472, in _astype
values = com._astype_nansafe(values.ravel(), dtype, copy=True)
File "/home/vkcem4m/software/anaconda/lib/python2.7/site-packages/pandas/core/common.py", line 2460, in _astype_nansafe
raise ValueError('Cannot convert NA to integer')
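The last frames of the traceback show dask's coerce_dtypes calling astype on a pandas column, which fails because the column holds a missing value while the target dtype is an integer. A likely cause, stated here as an assumption about this particular file, is that dask inferred an integer dtype from a sample at the start of the CSV and a later block contains an empty cell. The pandas side of the failure can be reproduced without dask; the helper name and the sample values below are made up for illustration:

```python
import pandas as pd


def int_cast_fails(values):
    """Return True if casting the values to int64 raises ValueError."""
    try:
        pd.Series(values, dtype="float64").astype("int64")
    except ValueError:
        # Older pandas raises ValueError('Cannot convert NA to integer');
        # newer versions raise a ValueError subclass with a different message.
        return True
    return False


# Hypothetical column contents: one block without gaps, one with a missing value.
clean_block = [1.0, 2.0, 3.0]
block_with_gap = [1.0, float("nan"), 3.0]

print(int_cast_fails(clean_block))
print(int_cast_fails(block_with_gap))
```

Only the block with the missing value fails the cast, which matches an error that appears when a block is computed rather than when read_csv is called.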
Issue Analytics
- State:
- Created 7 years ago
- Comments:12 (5 by maintainers)
Top GitHub Comments
I’m facing exactly the same issue, but I use dd.read_sql_table instead of dd.read_csv. I can’t find anything similar to assume_missing=True for this. How to solve it?

[...] solves the problem. There is a column with mixed dtype.