
Add sklearn.metrics.cumulative_gain_curve and sklearn.metrics.lift_curve


Description

I recently added plot_cumulative_gain and plot_lift_curve methods to https://github.com/reiinakano/scikit-plot. To do this, I built an ad hoc version of cumulative_gain_curve that closely follows the sklearn.metrics.roc_curve interface; see https://github.com/reiinakano/scikit-plot/blob/master/scikitplot/helpers.py#L157. Let me know if sklearn.metrics.cumulative_gain_curve is something you’d be interested in adding to scikit-learn. I could also add example docs for plotting gain and lift curves.

Reference I followed for lift and gain: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/tutorials/mlp_bankloan_outputtype_02.html
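For readers who have not seen a cumulative gain curve, here is a minimal sketch of how one could be computed. It is not the scikit-plot helper linked above and not an existing scikit-learn function; the name, signature, and return values are assumptions that only loosely mirror the (y_true, y_score, pos_label) style of sklearn.metrics.roc_curve.

```python
import numpy as np


def cumulative_gain_curve(y_true, y_score, pos_label=1):
    """Hypothetical sketch of a cumulative gain curve (binary targets only).

    Returns (percentages, gains): the fraction of samples considered and the
    fraction of all positives captured within that fraction.
    """
    is_pos = np.asarray(y_true) == pos_label
    y_score = np.asarray(y_score, dtype=float)
    n_samples = is_pos.shape[0]
    n_pos = is_pos.sum()
    if n_pos == 0:
        raise ValueError("y_true contains no samples with the positive label")

    # Rank samples from highest to lowest score (stable sort keeps tie order).
    order = np.argsort(-y_score, kind="mergesort")
    gains = np.cumsum(is_pos[order]) / n_pos
    percentages = np.arange(1, n_samples + 1) / n_samples

    # Prepend the origin so the curve starts at (0, 0).
    return np.insert(percentages, 0, 0.0), np.insert(gains, 0, 0.0)


percentages, gains = cumulative_gain_curve([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

Plotting gains against percentages gives the gain chart; the diagonal from (0, 0) to (1, 1) is the baseline of a model that ranks samples at random.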

[Images: example plots from plot_cumulative_gain and plot_lift_curve]

Issue Analytics

State:
Created 6 years ago
Reactions: 5
Comments: 7 (6 by maintainers)

Top GitHub Comments

2 reactions

GuillemGSubies commented, Jul 18, 2019

Any progress here? An intuitive explanation of the lift curve can be found here:

http://www2.cs.uregina.ca/~dbd/cs831/notes/lift_chart/lift_chart.html

It answers the question “how much better than a random model am I doing at each percentile?”
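As a rough illustration of that intuition, the lift at a given percentile can be taken as the cumulative gain divided by the fraction of samples considered, since a random ranking captures positives in proportion to that fraction. The function below is a hypothetical sketch under that assumption, not an API from scikit-learn or scikit-plot.

```python
import numpy as np


def lift_curve(y_true, y_score, pos_label=1):
    """Hypothetical sketch: lift = cumulative gain / fraction of samples."""
    is_pos = np.asarray(y_true) == pos_label
    y_score = np.asarray(y_score, dtype=float)
    n_samples = is_pos.shape[0]
    n_pos = is_pos.sum()
    if n_pos == 0:
        raise ValueError("y_true contains no samples with the positive label")

    # Rank samples from highest to lowest score.
    order = np.argsort(-y_score, kind="mergesort")
    percentages = np.arange(1, n_samples + 1) / n_samples
    gains = np.cumsum(is_pos[order]) / n_pos

    # A random ranking has gain == percentage, i.e. a lift of 1 everywhere.
    return percentages, gains / percentages


pct, lift = lift_curve([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
```

In this toy example the top half of the ranking contains every positive, so the lift there is 2: twice as many positives as a random selection of the same size would be expected to contain.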

1 reaction

lorentzenchr commented, Oct 12, 2022

TLDR

+1 for inclusion of the gain curve/CAP. Naming should reflect different strands of literature: cumulative accuracy profile (CAP) [2][4], concentration curve [3], cumulative lift curve [5]. It should work for binary classification as well as regression (models for the expectation).
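One possible reading of the regression case is sketched below: samples are ranked by the predicted value, and the curve tracks the cumulative share of the observed target. This is an assumption for illustration only. The function name is hypothetical, targets are assumed non-negative, and the ranking is taken from highest to lowest prediction to match the gain curve convention, whereas Lorenz and concentration curves are often drawn with ascending order.

```python
import numpy as np


def gain_curve_regression(y_true, y_pred):
    """Hypothetical sketch: cumulative share of y_true when ranked by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true < 0) or y_true.sum() == 0:
        raise ValueError("this sketch assumes non-negative targets with a positive sum")

    # Rank samples from highest to lowest prediction (assumed convention).
    order = np.argsort(-y_pred, kind="mergesort")
    share = np.cumsum(y_true[order]) / y_true.sum()
    fraction = np.arange(1, y_true.shape[0] + 1) / y_true.shape[0]

    # Prepend the origin so the curve starts at (0, 0).
    return np.insert(fraction, 0, 0.0), np.insert(share, 0, 0.0)


fraction, share = gain_curve_regression(
    [10.0, 0.0, 30.0, 60.0],  # observed values
    [12.0, 3.0, 25.0, 40.0],  # model predictions
)
```

With 0/1 targets this reduces to the binary cumulative gain curve, which is one reason a single function could plausibly cover both cases.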

Some more background

The cumulative gains curve is the same as the Cumulative Accuracy Profile (CAP), see [1] and [4]. From [2]:

“Moody’s uses Cumulative Accuracy Profiles (CAP), to make visual, qualitative assessments of model performance. While similar tools exist under a variety of different names (lift-curves, dubbed-curves, receiver-operator curves, power curves, etc.).”

References:

[1] Tasche (2006). “Validation of internal rating systems and PD estimates.” https://arxiv.org/pdf/physics/0606071.pdf

[2] Soběhart, J.R., Keenan, S.C., & Stein, R.M. (2000). “Benchmarking Quantitative Default Risk Models: A Validation Methodology.”

[3] Denuit, M., Trufin, J. (2021). “Lorenz curve, Gini coefficient, and Tweedie dominance for autocalibrated predictors.” https://dial.uclouvain.be/pr/boreal/object/boreal%3A254535/datastream/PDF_01/view

[4] https://www.listendata.com/2019/09/gini-cumulative-accuracy-profile-auc.html

[5] Ling, C., Li, C. (1998). “Data Mining for Direct Marketing: Problems and solutions.” In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 73–79.
