CalibratedClassifierCV doesn't work well with CatBoostClassifier. Gives wrong output.
Description
Problem:
When trying to calibrate the class probability estimates with scikit-learn's CalibratedClassifierCV, all I get are 1's for the negative class and 0's for the positive class in a binary classification problem. If I use CatBoostClassifier independently, I get normal-looking probabilities. This leads me to believe that this classifier is not compatible with the calibration technique. Is there a way to fix this issue?
Steps/Code to Reproduce
from catboost import CatBoostClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification

# Small synthetic binary classification problem
X, y = make_classification(n_samples=100, n_features=10)
cat = CatBoostClassifier(verbose=0)
calib = CalibratedClassifierCV(base_estimator=cat, method='sigmoid', cv=2)
cat.fit(X, y)
calib.fit(X, y)
print(cat.predict_proba(X))    # normal-looking probabilities
print(calib.predict_proba(X))  # all 1's in column 0, all 0's in column 1
Expected Results
Calibrated probabilities for both classes.
Actual Results
All 1's in the first column and all 0's in the second one. This is clearly the wrong output.
Versions
System:
python: 3.7.3 (default, Jul 8 2019, 11:40:34) [GCC 6.5.0 20181026]
executable: ~/.pyenv/versions/3.7.3/envs/absa-py37/bin/python
machine: Linux-5.0.0-27-generic-x86_64-with-debian-buster-sid
Python deps:
pip: 19.0.3
setuptools: 40.8.0
sklearn: 0.21.3
numpy: 1.17.0
scipy: 1.3.1
Cython: None
pandas: 0.25.0
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (6 by maintainers)
I think this issue is caused by CatBoostClassifier having an empty classes_ attribute after fit (see https://github.com/catboost/catboost/issues/984). This causes idx_pos_class to be empty, so no classifier is actually being fit: https://github.com/scikit-learn/scikit-learn/blob/d6d1d63fa6b098c72953a6827aae475f611936ed/sklearn/calibration.py#L313-L316
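To illustrate why an empty classes_ would produce exactly the reported output, here is a minimal NumPy-only sketch. It is a simplified imitation of the mechanism described above, not the scikit-learn source: the function name simulate_binary_calibration is hypothetical, the identity mapping stands in for a fitted calibrator, and the class labels are assumed to be 0 and 1.

```python
import numpy as np


def simulate_binary_calibration(base_classes, positive_scores):
    """Hypothetical, simplified stand-in for a binary calibration loop."""
    known_classes = np.array([0, 1])  # assumed label set
    # Map the base estimator's reported classes to column indices.
    idx_pos_class = np.searchsorted(
        known_classes, np.asarray(base_classes, dtype=int)
    )
    proba = np.zeros((len(positive_scores), 2))
    # zip stops at the shorter input, so an empty idx_pos_class
    # means the loop body never runs and nothing is written.
    for k, scores in zip(idx_pos_class, [positive_scores]):
        proba[:, k + 1] = scores  # identity used in place of a calibrator
    # The negative column is derived from the positive one.
    proba[:, 0] = 1.0 - proba[:, 1]
    return proba


scores = np.array([0.2, 0.7, 0.9])
print(simulate_binary_calibration([0, 1], scores))  # scores pass through
print(simulate_binary_calibration([], scores))      # every row is [1., 0.]
```

With a populated class list the positive-class scores reach the second column. With an empty one, the second column stays at zero and the first becomes 1, which matches the output in the report.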
Is there an update on this topic? Does sklearn's isotonic regression support CatBoostClassifier? If not, is there an easy fix for this?