Categorical data not surviving a repartition
When creating categorical columns with
df[column] = df[column].astype('category')
df = df.repartition(npartitions=n)
seems to result in partitions that keep the column's old dtype, plus one partition that has the correct category dtype.
This is with dask 15.2; I have not tested older versions.
Issue Analytics
- State:
- Created 6 years ago
- Comments:7 (7 by maintainers)
Top GitHub Comments
Should be fixed by #2676.
Yep, PR coming momentarily.