What to do if ref and test data have different categories in chisquare?
I have a question that I think this library should already account for, but I can't find the solution.
Currently, if a categorical feature in the reference data has a different set of categories from the same feature in the test data, calling the predict method of TabularDrift or ChiSquareDrift raises an error.
I built categories_per_feature from the whole dataset, but because of how I split the data, one feature has categories 0 to 11 in the reference data and 0 to 12 in the test data.
The error I get is
operands could not be broadcast together with shapes (13,) (12,)
This error comes from the chisquare function under the hood.
I don't think this is a rare situation: the reference data can easily be missing some of the categories that appear in the test data, for one or more features.
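For illustration, here is a minimal plain-Python sketch of one way to avoid the shape mismatch: count both samples over the same category list, so that a category missing from the reference data gets a count of 0 instead of being dropped. This is not the library's implementation. The helper names aligned_counts and chi2_statistic are hypothetical, and the statistic is an ordinary Pearson chi-square on the 2 x K table of counts, assuming both samples are non-empty.

```python
from collections import Counter


def aligned_counts(x_ref, x_test, categories=None):
    """Count each category in both samples over one shared category list.

    Hypothetical helper: if `categories` is not given, the union of the
    values seen in either sample is used.
    """
    if categories is None:
        categories = sorted(set(x_ref) | set(x_test))
    ref_counter, test_counter = Counter(x_ref), Counter(x_test)
    # Counter returns 0 for unseen keys, so both lists have the same length.
    ref = [ref_counter[c] for c in categories]
    test = [test_counter[c] for c in categories]
    return list(categories), ref, test


def chi2_statistic(ref, test):
    """Pearson chi-square statistic for a 2 x K table of aligned counts."""
    n_ref, n_test = sum(ref), sum(test)
    total = n_ref + n_test
    stat = 0.0
    for r, t in zip(ref, test):
        col = r + t
        if col == 0:
            continue  # category absent from both samples: contributes nothing
        exp_r = n_ref * col / total
        exp_t = n_test * col / total
        stat += (r - exp_r) ** 2 / exp_r + (t - exp_t) ** 2 / exp_t
    return stat


# Same situation as in the question: reference has 0..11, test has 0..12.
x_ref = [i % 12 for i in range(120)]
x_test = [i % 13 for i in range(130)]

categories, ref, test = aligned_counts(x_ref, x_test)
stat = chi2_statistic(ref, test)
print(len(ref), len(test))  # both count vectors now have 13 entries
print(stat)
```

Because both count vectors are built from the same category list, they always have the same length, which is the property the broadcast error says was violated.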
@tjhallum and @AsiehH: addressed by #222
I forgot to speak specifically to this in my previous reply. In line with your vision, I am in fact using TabularDrift on inputs for machine learning models. It's just that in my case I've specifically set up my models so that they do not break when they encounter new categories that weren't seen during training.