Feature Request: Include p-values attribute for logistic regression
scikit-learn is the de facto home for all kinds of modeling algorithms. It offers a plethora of
algorithms, but one thing that still seems to be missing is a LogisticRegression implementation that reports p-values.
It would be great to have something like a model.p_values_ attribute for logistic regression models.
I know that there is another statistical library, statsmodels, which provides p-values, but a lot of programmers use sklearn and build their models with it. It is somewhat
inconvenient to switch to statsmodels just to get p-values while running other models, such as Random Forest, in sklearn.
After all, the APIs of statsmodels and sklearn are quite different. sklearn is a trendsetter and
most people feel comfortable with its API, whereas statsmodels follows an R-style API.
In conclusion, it would be great if sklearn provided p-values for linear models.
I am eagerly waiting for the implementation in future versions of sklearn.
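For reference, the attribute requested above would most plausibly hold Wald-test p-values for the coefficients of an unpenalized fit. The following is only a minimal NumPy sketch under that assumption; it is not scikit-learn or statsmodels code, and the function name `logistic_wald_pvalues` and the synthetic data are hypothetical.

```python
import math
import numpy as np


def logistic_wald_pvalues(X, y, n_iter=25, tol=1e-10):
    """Fit an unpenalized logistic regression by Newton-Raphson.

    Returns (coefficients, standard errors, two-sided Wald p-values);
    index 0 of each array corresponds to the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        w = p * (1.0 - p)
        info = A.T @ (A * w[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, A.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Re-evaluate the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(A.T @ (A * w[:, None]))
    se = np.sqrt(np.diag(cov))
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, se, pvals


# Synthetic data (an assumption for illustration): feature 0 drives the
# outcome, feature 1 is pure noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
prob = 1.0 / (1.0 + np.exp(-2.0 * X_demo[:, 0]))
y_demo = (rng.random(500) < prob).astype(float)

coef, std_err, p_values = logistic_wald_pvalues(X_demo, y_demo)
```

These p-values rely on the usual asymptotic assumptions and on the absence of regularization. scikit-learn's LogisticRegression applies an L2 penalty by default, so the same formula would not carry over to a default fit without further thought.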
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 3
- Comments: 16 (13 by maintainers)

Considering how this request gets repeated pretty frequently, could someone chime in with some clearer justification that could be added to the documentation? Or this feature could find a home in scikit-learn contrib and then could be linked to? I’m unclear on the reasoning for not including p-values.
Reasoning in #6773:
First, scikit-learn exposes statistical tests via the feature_selection module, and they’re very useful. It’s not like scikit-learn does no stats.
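As a small illustration of that point, `f_classif` from `sklearn.feature_selection` returns an F-statistic and a p-value per feature. The synthetic data below is an assumption made for the example. Note that these are univariate ANOVA tests, not p-values for the coefficients of a fitted logistic regression.

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Synthetic data (hypothetical): feature 0 shifts with the class label,
# feature 1 is unrelated noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = np.column_stack([
    y + rng.normal(scale=0.5, size=200),
    rng.normal(size=200),
])

# One F-statistic and one p-value per column of X.
f_stat, p_val = f_classif(X, y)
```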
Second, I would argue that p-values are part of model interpretation, and scikit-learn has an inspection module that has model interpretation capabilities. Scikit-learn supports model interpretation for tree-based models with feature_importances_. Granted, linear models expose coef_. But why not add more interpretation capabilities if that’s what people want?
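To make the comparison concrete, here is a rough sketch of the interpretation tools that exist today for a linear model: the fitted `coef_` array and `permutation_importance` from `sklearn.inspection`. The data and variable names are made up for illustration, and neither output is a p-value.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Synthetic data (hypothetical): only feature 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Coefficients of the fitted model, shape (1, n_features) for a binary target.
coefficients = model.coef_

# Mean drop in accuracy when each feature is shuffled, over n_repeats shuffles.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
importances = result.importances_mean
```

Both quantities rank features by influence on the fitted model, but neither quantifies statistical uncertainty about a coefficient, which is the gap the request is about.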
This comment implies that p-values should not be added because there’s uncertainty about when p-values are reliable. The same comment could be made about other parts of the library, such as trusting the feature importances of tree-based models on a multicollinear dataset. From #16860:
I think this philosophy applies to p-values. They are generally accepted as a useful statistical technique when used in the proper way. It’s up to the user to use them appropriately. But people definitely want this feature (myself included 😃 ).
Assumptions differ depending on whether they concern control on the coefficients or on the prediction. The control on the prediction is much more lax than the control on the coefficients.
No. No need to duplicate functionality across packages.