What to do if ref and test data have different categories in chisquare?

I have a question that I think should have been taken into account in this library but I can’t find the solution.

Currently, if a categorical feature in the reference data has a different set of categories from the same feature in the test data, calling the predict method of TabularDrift or ChiSquareDrift raises an error. I created categories_per_feature on the whole data, but because of the way I split the data, one of the features has categories 0 to 11 in my reference data and 0 to 12 in the test data. The error I get is: operands could not be broadcast together with shapes (13,) (12,). It comes from the chisquare function under the hood.

I think this is not a rare incident and it is probable that the reference data does not have all the categories of the test data for one or more features.
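
For illustration, here is a minimal NumPy-only sketch of the mismatch described above and one possible way around it. It does not use or reproduce the library's internals: the data, the helper name `aligned_counts`, and the hand-computed statistic are all assumptions made for this example. The idea is to count every split against the full category set, so that a category missing from the reference data shows up as a zero count instead of shortening the vector.

```python
import numpy as np

def aligned_counts(x, n_categories):
    """Count occurrences of categories 0..n_categories-1, including absent ones.

    Hypothetical helper for this sketch, not part of any library.
    """
    return np.bincount(x, minlength=n_categories)

# Hypothetical feature: the reference split sees categories 0..11,
# the test split sees categories 0..12.
x_ref = np.arange(500) % 12
x_test = np.arange(500) % 13

# Counting each split on its own yields vectors of different lengths ...
naive_ref = np.bincount(x_ref)    # shape (12,)
naive_test = np.bincount(x_test)  # shape (13,)

# ... so elementwise arithmetic on them fails with a broadcast error.
try:
    naive_test - naive_ref
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Counting against the full category set (0..12) keeps both vectors aligned.
n_categories = 13
ref_counts = aligned_counts(x_ref, n_categories)
test_counts = aligned_counts(x_test, n_categories)

# Pearson chi-square statistic on the resulting 2 x 13 contingency table.
observed = np.stack([ref_counts, test_counts]).astype(float)
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()
chi2_stat = ((observed - expected) ** 2 / expected).sum()
```

The statistic is only defined here because every category appears in at least one of the two splits; a category that is absent from both would give an expected count of zero and would need to be dropped first.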

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction
arnaudvl commented, Apr 16, 2021

@tjhallum and @AsiehH : addressed by #222

0 reactions
tjhallum commented, Apr 7, 2021

“We obviously agree with that but saw it more in the context of drift detection on inputs for machine learning models. A lot of models will simply break if they are presented with categories that were not seen during training. But we also want to facilitate your use case…”

I forgot to speak specifically to this in my previous reply. In line with your vision, I am in fact using TabularDrift on inputs for machine learning models. It’s just that in my case I’ve specifically set up my models so that they do not break when they encounter new categories that were not seen during training.
