get_dummies not executing
Dask get_dummies “runs” but it doesn’t actually execute the get_dummies task.
import pandas as pd
import dask.dataframe as dd
pandasData = pd.DataFrame({'var1': ['a', 'b', 'a'], 'var2': ['b', 'a', 'c'], 'var3': ['c', 'a', 'b']})
pd.get_dummies(pandasData)
daskData = dd.from_pandas(pandasData, npartitions=1)
daskData.head()
daskDataDummies = dd.get_dummies(daskData).compute()
daskDataDummies.head()
daskDataDummies.to_csv('daskDataDummies_out.csv', header=True, index=False)
There’s no error message; it simply doesn’t transform the dataframe.
Issue Analytics
- State:
- Created 7 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
Adding to that parametrized test has been helpful so far. Next time I take a look at this I’ll probably dump in a bunch more dtypes and such.
Sampling probably isn’t sufficient. We need to know all of the values throughout the file to determine the columns.