RFC Definition of public API
In several issues and PRs (e.g. https://github.com/scikit-learn/scikit-learn/issues/11182, https://github.com/scikit-learn/scikit-learn/pull/12916) the question arises whether we need to deprecate some object before it can be removed or changed. This goes back to defining what is public API in scikit-learn.
The least controversial definition is that,
- import paths that include a module with a leading _ are private
- other modules are public.
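As a minimal sketch of the rule above (the helper name is hypothetical, and applying the check to every component of the path, including the final object name, is an assumption on my side):

```python
def is_private_path(dotted_path: str) -> bool:
    """Return True if any component of the dotted path starts with an underscore."""
    return any(part.startswith("_") for part in dotted_path.split("."))


# "sklearn.cluster._dbscan" is a hypothetical private module name used for illustration.
print(is_private_path("sklearn.cluster._dbscan"))
# A trailing underscore does not make a module private under this rule.
print(is_private_path("sklearn.cluster.dbscan_.NearestNeighbors"))
```

The second call returns False, which is exactly the case discussed next.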
However, you could do, for instance,
from sklearn.cluster.dbscan_ import NearestNeighbors
Does it mean that we are supposed to preserve backward compatibility on sklearn.cluster.dbscan_.NearestNeighbors in terms of import path? How about sklearn.preprocessing.data.sparse (scipy.sparse)?
I guess not, meaning that just because we have an import path without an underscore does not mean that it is part of the public API. At the very least it also needs to be documented or used in examples.
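Side note: incidental imports like the two above can be spotted mechanically, since classes and functions record where they were defined. This is only a heuristic sketch, not the "documented or used in examples" criterion itself. The helper name and the demo_pkg stand-in module are hypothetical.

```python
import types


def is_incidental_import(module: types.ModuleType, name: str) -> bool:
    """Heuristic: True if module.<name> was defined outside `module` and its submodules."""
    obj = getattr(module, name)
    if isinstance(obj, types.ModuleType):
        origin = obj.__name__
    else:
        origin = getattr(obj, "__module__", None)
    if origin is None:
        return False  # plain constants carry no origin information
    own = module.__name__
    return not (origin == own or origin.startswith(own + "."))


# Stand-in for a module that defines one class and imports two foreign names.
demo = types.ModuleType("demo_pkg.dbscan_")
exec(
    "from collections import OrderedDict\n"
    "import math\n"
    "class DBSCAN:\n"
    "    pass\n",
    demo.__dict__,
)

for attr in ("DBSCAN", "OrderedDict", "math"):
    print(attr, is_incidental_import(demo, attr))
```

Here DBSCAN is reported as defined in the module, while OrderedDict and math are flagged as merely imported into it.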
If we take this definition,
- most of sklearn.utils is public API (as documented in https://scikit-learn.org/stable/modules/classes.html#module-sklearn.utils), but certainly not all 151 objects listed in https://github.com/scikit-learn/scikit-learn/issues/6616#issuecomment-245085954
- sklearn.utils.fixes is not
- sklearn.externals is not (except for sklearn.externals.joblib, which we previously used in examples).
This would mean that we can e.g. remove sklearn.externals.six in https://github.com/scikit-learn/scikit-learn/pull/12916 without a deprecation warning (but possibly with a what’s new entry). I have a hard time seeing a user reasonably complaining that we didn’t go through a deprecation cycle there.
This would also help resolve the “public vs private utils” discussion in https://github.com/scikit-learn/scikit-learn/issues/6616
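For comparison, here is roughly what going through a deprecation cycle for an import path would involve. This is a sketch only: the helper, the module name mypkg.externals.json and the version string are hypothetical, and the choice of FutureWarning is an assumption. It relies on module-level `__getattr__` (PEP 562, Python 3.7+).

```python
import json
import types
import warnings


def make_deprecated_alias(old_name: str, target: types.ModuleType, removed_in: str) -> types.ModuleType:
    """Build a stand-in module that forwards to `target` and warns on attribute access."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr):
        # Called only for attributes not found on the shim itself.
        warnings.warn(
            f"{old_name} is deprecated and will be removed in {removed_in}; "
            f"use {target.__name__} instead.",
            FutureWarning,
            stacklevel=2,
        )
        return getattr(target, attr)

    shim.__getattr__ = __getattr__  # stored in the module's __dict__
    return shim


# Hypothetical old path that forwards to the standard-library json module.
old_json = make_deprecated_alias("mypkg.externals.json", json, "X.Y")
```

Accessing old_json.dumps then returns json.dumps and emits the warning. Keeping such a shim around for a couple of releases is the cost that the definition above would let us skip for undocumented paths.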
WDYT, do you have other ideas of how we should define what is public API in scikit-learn?
Top GitHub Comments
I agree that the conflicts aren’t as trivial as I thought.
The solution I could come up with so far is that the PR authors must create a commit where they rename file.py into _file.py and create a new empty file.py (or better, check it out from master). Then the merge with master is easy.
But the process isn’t necessarily obvious, yeah.
Or maybe what we need is a tool to help solve such merge conflicts. It’s still the same patch, it just needs to be applied with respect to a different path.
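A rough sketch of what such a tool could do, assuming the contributor's change is available as a git-style unified diff: rewrite the paths in the diff headers and leave the hunks alone. The function name is hypothetical, and the pkg/file.py paths just follow the file.py / _file.py example above.

```python
def retarget_patch(patch_text: str, old_path: str, new_path: str) -> str:
    """Rewrite old_path to new_path in the header lines of a git-style unified diff."""
    out = []
    in_header = False
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            in_header = True   # start of a new per-file header
        elif line.startswith("@@"):
            in_header = False  # hunks begin; content lines must stay untouched
        if in_header:
            # Naive substring replacement: good enough for a sketch, but it does
            # not handle quoted paths or one path being a suffix of another.
            line = line.replace(f"a/{old_path}", f"a/{new_path}")
            line = line.replace(f"b/{old_path}", f"b/{new_path}")
        out.append(line)
    return "".join(out)


patch = (
    "diff --git a/pkg/file.py b/pkg/file.py\n"
    "--- a/pkg/file.py\n"
    "+++ b/pkg/file.py\n"
    "@@ -1 +1 @@\n"
    "-x = 1\n"
    "+x = 2\n"
)
print(retarget_patch(patch, "pkg/file.py", "pkg/_file.py"))
```

The rewritten patch could then be applied on top of master with git apply, instead of resolving the rename conflict by hand.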