Restructure "Section 1.12. Multiclass and multilabel algorithms" to make distinctions between concepts clearer
Describe the issue linked to the documentation
Note: this is a continuation of discussion in issue #9602. This has also been raised before in issue #1781.
Understanding the subtle distinctions between multi-learning problems can be difficult. “How is multilabel different than multioutput? What format should target y be in for each problem type?” The answers to these questions are not always clear; see the inconsistent answers provided in this discussion for an example.
Thankfully, scikit-learn’s Glossary does a good job of clearing up ambiguity in its Target Types section. However, it’s not the most visible source of information in search engines because of how it’s indexed.
Instead, Section 1.12. Multiclass and multilabel algorithms is the sklearn page with the most visibility in search engines. It’s also the section linked to in both the sklearn.multiclass and sklearn.multioutput modules. Because of its visibility, I feel this section could be doing more with its presentation to clear up ambiguity.
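As a rough illustration of how the two modules address different problem types, the sketch below fits one wrapper from each on made-up data. The random features, the toy targets, and the choice of LogisticRegression as the base estimator are all assumptions for this example, not something taken from the user guide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Made-up features: 30 samples, 4 features (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(30, 4))

# Multiclass: a single target column with three possible classes.
y_multiclass = np.arange(30) % 3
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y_multiclass)
pred_multiclass = ovr.predict(X)

# Multioutput: two target columns, each with its own set of classes.
y_multioutput = np.column_stack([np.arange(30) % 3, np.arange(30) % 2])
moc = MultiOutputClassifier(LogisticRegression()).fit(X, y_multioutput)
pred_multioutput = moc.predict(X)

print(pred_multiclass.shape)   # one prediction per sample
print(pred_multioutput.shape)  # one prediction per sample and per target column
```

Here the first wrapper trains one binary classifier per class for a single target column, while the second trains one classifier per target column.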
Suggest a potential alternative/fix
- Possibly rename the section to “Multiclass and multioutput algorithms”. (This brings it more in line with sklearn’s current module names. However, this could impact visibility in searches for ‘multilabel’, which seems to be more common than ‘multioutput’.)
- On that note, it could be better to name the page “Multiclass and multioutput/multilabel algorithms”, to at least indicate that the two concepts are directly related. Then, clarify the distinctions in the text itself.
- Add references to sklearn.multioutput to the introduction. (Currently, the introduction only references sklearn.multiclass.)
- Give a brief run-down of the distinctions between terms at the very start.
- Link to the glossary explicitly at the very start. (Currently, the glossary is only referenced in links scattered throughout the paragraphs.)
- Possibly bring the table in the Summary to the very start. It is a succinct breakdown, and easier to parse than the paragraphs.
- Remove subsection “1.12.1. Multilabel classification format”. (It doesn’t fit with the other subsections, which are devoted to different multiclass/multioutput strategies. Target formats/types are better expressed elsewhere.)
- Be clearer about the nuances in type_of_target, possibly by linking directly to the utility API reference or Glossary.
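To make the target-type nuances concrete, here is a small sketch that calls type_of_target from sklearn.utils.multiclass on a few hand-made targets. The arrays and variable names are illustrative assumptions; the returned strings correspond to the target types described in the Glossary.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hand-made example targets (illustrative only).
examples = {
    # 1d, two distinct values
    "binary": np.array([1, -1, -1, 1]),
    # 1d, more than two distinct values
    "multiclass": np.array([1, 0, 2]),
    # 2d indicator matrix: each column is a yes/no label
    "multilabel": np.array([[0, 1], [1, 1]]),
    # 2d, each column is its own multiclass target
    "multiclass_multioutput": np.array([[1, 2], [3, 1]]),
    # 2d, each column is a continuous target
    "continuous_multioutput": np.array([[1.5, 2.0], [3.0, 1.6]]),
}

# Map each example name to the type that scikit-learn infers for it.
target_types = {name: type_of_target(y) for name, y in examples.items()}

for name, inferred in target_types.items():
    print(f"{name}: {inferred}")
```

Note that a 2d array of 0s and 1s is reported as multilabel-indicator, whereas a 2d array with more than two distinct integer values is reported as multiclass-multioutput, which is one of the distinctions the section could spell out.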
I’m interested in opening a PR for these changes, as discussed in the previous issue/PR. I’m happy to hear other suggestions/thoughts about these changes before I go ahead, though. 😃
Top GitHub Comments
.. currentmodule is only there to allow for shorthand references to classes/functions in that module, e.g. :class:`OneVsRestClassifier`. It makes little material difference.
I’m happy to keep them in one chapter of the user guide together, even if it references two modules, thus allowing the preface to distinguish them.
This was also echoed by @NicolasHug in the WIP PR I made, so I’m happy to keep the 1-page structure. 😃