Numerical columns treated as categorical
Hi guys,
I heard of PPS through your article and was curious to test it, so I tried applying it to some data I’ve been working on.
Unfortunately, I get numerous warning messages when calculating the PPS matrix:
Warning: The least populated class in y has only 1 members, which is too few. The minimum number of members in any class cannot be less than n_splits=4.
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
My guess is that ppscore treats my data as categorical and therefore tries to run a classification with a huge number of labels.
Looking at how ppscore determines whether the data is numerical or categorical, I cannot find a reason it would consider my data categorical:
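To make that guess concrete: if a continuous target is read as class labels, nearly every distinct value becomes its own class, while stratified cross-validation with n_splits=4 needs at least 4 members per class. The sketch below is plain Python with made-up values and hypothetical helpers (`min_class_count`, `too_few_members`); it only mirrors the condition the scikit-learn warning reports, not ppscore's own code.

```python
from collections import Counter

def min_class_count(labels):
    """Size of the smallest class when every distinct value is treated as a label."""
    return min(Counter(labels).values())

def too_few_members(labels, n_splits=4):
    """True when some class has fewer members than folds (the warning's condition)."""
    return min_class_count(labels) < n_splits

# Made-up continuous target: read as labels, most "classes" have one member.
y = [1042.5, 987.3, 1500.0, 987.3, 2210.8, 1765.1]

n_classes = len(set(y))
smallest = min_class_count(y)

print(n_classes)           # 5
print(smallest)            # 1
print(too_few_members(y))  # True
```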
- The dtypes are int or float
- The number of unique values is higher than 15 (except for one column, which has exactly 15, but changing the NUMERIC_AS_CATEGORIC_BREAKPOINT constant to 10 does not resolve the problem)
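The two checks listed above can be sketched as a small function. This is a hypothetical stand-in written for illustration, not ppscore's source: `looks_categorical` and the `NumberOfFloors` column are made up, the default of 15 follows the constant named above, and treating a column with exactly 15 unique values as categorical is an assumption.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

# Assumed default, mirroring NUMERIC_AS_CATEGORIC_BREAKPOINT from the issue.
BREAKPOINT = 15

def looks_categorical(series: pd.Series, max_unique: int = BREAKPOINT) -> bool:
    """Hypothetical rule: non-numeric (or bool) columns, or numeric ones with few values."""
    if is_bool_dtype(series) or not is_numeric_dtype(series):
        return True
    # Assumption: exactly `max_unique` distinct values still counts as categorical.
    return series.nunique() <= max_unique

df = pd.DataFrame({
    "YearBuilt": list(range(1900, 1920)),      # int, 20 unique values
    "NumberOfFloors": [1, 2, 3] * 6 + [4, 5],  # hypothetical int column, 5 unique values
})

result = {col: looks_categorical(df[col]) for col in df.columns}
print(result)  # {'YearBuilt': False, 'NumberOfFloors': True}
```

Under these rules an int or float column with more than 15 distinct values should come out numerical, which is why the behaviour reported here looked surprising.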
Also, if I try to force the PPS to be calculated with task='regression', I get the following error:
'DataFrame' object has no attribute 'dtype'
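One way this exact message can arise, consistent with the x = y exception identified later in the thread, is when x and y name the same column (as happens in the loops below once col reaches 'YearBuilt'). The sketch assumes, without confirmation from this thread, that the library selects both columns with something like `df[[x, y]]`. With duplicate labels, `subset[y]` returns a two-column DataFrame rather than a Series, and a DataFrame has `dtypes` but no `dtype`. The years are made-up values.

```python
import pandas as pd

df = pd.DataFrame({"YearBuilt": [1927, 1996, 1969]})
x = y = "YearBuilt"  # x and y name the same column

# Selecting [x, y] duplicates the column ...
subset = df[[x, y]]
print(subset.shape)           # (3, 2)

# ... so subset[y] is a two-column DataFrame, not a Series.
target = subset[y]
print(type(target).__name__)  # DataFrame

try:
    target.dtype
except AttributeError as err:
    message = str(err)
print(message)                # 'DataFrame' object has no attribute 'dtype'
```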
Here is my code:
import pandas as pd
import ppscore as pps
df = pd.read_csv('seattle_building_energy_benchmark.csv', sep=';')
df.dtypes
df.nunique()
pps.NUMERIC_AS_CATEGORIC_BREAKPOINT = 10
for col in df.columns:
    print(col)
    pps.score(df, x='YearBuilt', y=col, task=None)
for col in df.columns:
    print(col)
    pps.score(df, x='YearBuilt', y=col, task='regression')
pps.matrix(df)
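Assigning `pps.NUMERIC_AS_CATEGORIC_BREAKPOINT = 10` may not change the value the scoring code actually reads. If (an assumption, not checked against ppscore's source here) the constant is defined in an internal submodule and the package namespace only re-exports names, or does not export it at all, the assignment rebinds a name the scoring function never looks up, because a Python function resolves globals in the module where it was defined. The sketch simulates this with two hypothetical modules built via `types.ModuleType`; their names and contents are illustrative only.

```python
import types

# Hypothetical internal module: owns the constant and reads it at call time.
calculation = types.ModuleType("calculation")
exec(
    "NUMERIC_AS_CATEGORIC_BREAKPOINT = 15\n"
    "def breakpoint_in_use():\n"
    "    return NUMERIC_AS_CATEGORIC_BREAKPOINT\n",
    calculation.__dict__,
)

# Hypothetical package namespace, equivalent to
# `from .calculation import NUMERIC_AS_CATEGORIC_BREAKPOINT, breakpoint_in_use`.
package = types.ModuleType("package")
package.NUMERIC_AS_CATEGORIC_BREAKPOINT = calculation.NUMERIC_AS_CATEGORIC_BREAKPOINT
package.breakpoint_in_use = calculation.breakpoint_in_use

# Rebinding the name on the package leaves calculation's global untouched.
package.NUMERIC_AS_CATEGORIC_BREAKPOINT = 10
before = package.breakpoint_in_use()
print(before)  # 15

# Patching the module whose function reads the constant does take effect.
calculation.NUMERIC_AS_CATEGORIC_BREAKPOINT = 10
after = package.breakpoint_in_use()
print(after)   # 10
```

Patching the attribute on the module that defines and reads the constant is the general Python workaround; which ppscore module that would be is not established in this thread.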
Is there something I am missing? If not, would you like me to share the data with you? (I don’t know which sharing method is most convenient for you.)
Issue created 3 years ago; 6 comments (3 by maintainers).
Yes, the data was very helpful. Thank you for that!
Hi Florian,
I’m happy to learn the data helped you identify the problems 😃
I had a hunch the categorical breakpoint might not work, but couldn’t be sure because the for loop was acting weird. I didn’t anticipate the x = y exception!
Thanks again for providing this package and taking the time to update and support it.
Cheers,
Alexander