IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer on non integer column.

I have a Dask bag with 59 partitions and a chunksize of 100,000 (so roughly 6 million records). I want to convert the bag to a Dask DataFrame and then to a pandas DataFrame. This is my snippet:

%%time
bag = dask_mongo.read_mongo(
    database="XXXXX",
    collection="XXXX",
    connection_kwargs={"XXXXXXXX"},
    chunksize=100000,
)
df = bag.to_dataframe()
df = df.astype('object')
df2 = df.compute()

I tried several things: casting every column to dtype object, and removing NA values with dropna(). Every time I get this exception:

IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

I understand that I need to get rid of the NaN values, but I tried several approaches without success.

Computing the bag to a list

records = bag.compute()

works just fine. But I really want the data in a DataFrame so I can analyze it later.

I even tried taking one column and casting it to float64 (as I read, pandas does not support NaN in integer dtypes, so I tried both float64 and object):

Dask Series Structure:
npartitions=59
    float64
        ...
     ...   
        ...
        ...
Name: xxx, dtype: float64
Dask Name: astype, 236 tasks

but even there I get IntCastingNaNError.

After that, I tried casting all columns to dtype object:

df = df.astype('object')

with no luck either: same exception. Basically every column raises this exception. Thanks.
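For context, the exception itself comes from pandas: an integer dtype such as int64 has no representation for NaN or inf, so casting a column that contains them fails. The following is a minimal pandas-only sketch, not the poster's pipeline. The series values are made up for illustration, and float64 and the nullable Int64 dtype are shown only as two possible targets that accept missing values.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not taken from the issue).
# pandas stores this as float64 because of the NaN.
s = pd.Series([1, np.nan, 3])

message = None
try:
    # Plain int64 cannot hold NaN, so this cast is rejected.
    s.astype("int64")
except ValueError as exc:  # IntCastingNaNError is a ValueError subclass
    message = str(exc)

# Two dtypes that tolerate missing values:
as_float = s.astype("float64")   # NaN stays NaN
as_nullable = s.astype("Int64")  # NaN becomes pd.NA in the nullable integer dtype

print(message)
print(as_float.dtype, as_nullable.dtype)
```

If the cast to an integer dtype happens inside the bag-to-DataFrame conversion, a later astype('object') on the resulting DataFrame would come too late to prevent it, which would be consistent with the behavior described above.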

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

ncclementi commented, Dec 13, 2021

@Christiankoo not sure if this helps now, but another option for getting the meta dictionary from the DataFrame, using the ddf from scharlottej13's example, is:

>>> ddf._meta.dtypes.to_dict()
{'a': dtype('int64'), 'b': dtype('int64')}

Then you can replace only the entries you need in that dictionary.
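A minimal sketch of that replacement step, written in plain pandas so it runs without Dask. The empty DataFrame is a stand-in for ddf._meta, the float64 override for column b is an illustrative choice, and the final astype call only approximates what Dask does with the meta on each partition, which is an assumption here.

```python
import numpy as np
import pandas as pd

# Stand-in for ddf._meta: an empty DataFrame that carries the inferred dtypes.
meta = pd.DataFrame({"a": pd.Series(dtype="int64"), "b": pd.Series(dtype="int64")})

# Start from the inferred dtypes and override only the problematic column.
meta_dict = meta.dtypes.to_dict()
meta_dict["b"] = np.dtype("float64")  # column b may contain NaN

# Hypothetical records of a later partition, including a missing value.
records = [{"a": 1, "b": 1}, {"a": 2, "b": np.nan}]
part = pd.DataFrame(records, columns=list(meta_dict))

# Casting with the untouched inferred dtypes reproduces the error...
failed_with_inferred = False
try:
    part.astype(meta.dtypes.to_dict())
except ValueError:
    failed_with_inferred = True

# ...while the edited dictionary casts cleanly.
part_fixed = part.astype(meta_dict)
print(failed_with_inferred)
print(part_fixed.dtypes)
```

With a real bag, the edited dictionary would then be passed as b.to_dataframe(meta_dict), as in the snippet from the other comment.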

scharlottej13 commented, Dec 10, 2021

Glad it worked! Yeah that’s a great question. I don’t think Dask bag currently has any functionality to do this. One workaround might be to get the column names from your Dask DataFrame first to dynamically populate the meta argument, something like:

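import numpy as np
import pandas as pd
import dask.dataframe as dd
import dask.bag as db

# Small bag in which some records contain inf/NaN
b = db.from_sequence(
    [
        {'a': 1, 'b': 1},
        {'a': np.inf, 'b': np.nan},
        {'a': 1, 'b': np.nan},
    ],
    npartitions=2,
)
ddf = b.to_dataframe()  # used here only to discover the column names
my_cols = list(ddf.columns)
# Declare every column as object instead of relying on the inferred dtypes
meta_dict = dict(zip(my_cols, [object] * len(my_cols)))
ddf = b.to_dataframe(meta_dict)
ddf.compute()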

But maybe the others on this issue have a more elegant solution 😃. It might be nice to add a feature to Dask bag that’s analogous to returning columns from a Dask DataFrame, or even just being able to see a preview of the first partition.
