Categorical data not surviving a repartition

Creating a categorical column and then repartitioning with

df[column] = df[column].astype('category')
df = df.repartition(npartitions=n)

seems to result in partitions that still have the old dtype for the column, with only one partition having the correct category dtype.

This is with dask 0.15.2; I have not tested older versions.

Issue Analytics

State:
Created 6 years ago
Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction

jcrist commented, Sep 12, 2017

Should be fixed by #2676.

1 reaction

jcrist commented, Sep 12, 2017

Yep, PR coming momentarily.
