Categorical data not surviving a repartition

Creating a categorical column and then repartitioning, as in

df[column] = df[column].astype('category')
df = df.repartition(npartitions=n)

seems to leave some partitions with the column's old dtype and only one partition with the correct category dtype.

This is with dask 0.15.2; I have not tested older versions.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
jcrist commented, Sep 12, 2017

Should be fixed by #2676.

1 reaction
jcrist commented, Sep 12, 2017

Yep, PR coming momentarily.
