Restructure "Section 1.12. Multiclass and multilabel algorithms" to make distinctions between concepts clearer

Describe the issue linked to the documentation

Note: this is a continuation of discussion in issue #9602. This has also been raised before in issue #1781.

Understanding the subtle distinctions between multi-learning problems can be difficult. “How is multilabel different from multioutput? What format should target y be in for each problem type?” The answers to these questions are not always clear; see the inconsistent answers provided in this discussion for an example.

Thankfully, scikit-learn’s Glossary does a good job of clearing up ambiguity in its Target Types section. However, it’s not the most visible source of information in search engines because of how it’s indexed.

Instead, Section 1.12. Multiclass and multilabel algorithms is the sklearn page with the most visibility in search engines. It’s also the section linked to in both the sklearn.multiclass and sklearn.multioutput modules. Because of its visibility, I feel this section could be doing more with its presentation to clear up ambiguity.

Suggest a potential alternative/fix

  • Possibly rename the section to “Multiclass and multioutput algorithms”. (This brings it more in line with sklearn’s current module names. However, this could impact visibility in searches for ‘multilabel’, which seems to be more common than ‘multioutput’.)
    • On that note, it could be better to name the page “Multiclass and multioutput/multilabel algorithms”, to at least indicate that the two concepts are directly related. Then, clarify the distinctions in the text itself.
  • Add references to sklearn.multioutput to the introduction. (Currently, the introduction only references sklearn.multiclass.)
  • Give a brief run-down of the distinctions between terms at the very start.
  • Link to the glossary explicitly at the very start. (Currently, the glossary is only referenced in links scattered throughout the paragraphs.)
  • Possibly bring the table in the Summary to the very start. It is a succinct breakdown, and easier to parse than the paragraphs.
  • Remove subsection “1.12.1. Multilabel classification format”. (It doesn’t fit with the other subsections, which are devoted to different multiclass/multioutput strategies. Target formats/types are better expressed elsewhere.)
  • Be clearer about the nuances in type_of_target, possibly by linking directly to the utility API reference or Glossary.
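To illustrate the kind of nuance I mean, here is a minimal sketch of what `type_of_target` (from `sklearn.utils.multiclass`) reports for a few target formats. The arrays are made up for this example, and the dictionary keys are the labels I would expect based on the Glossary's Target Types section; this is not taken from the user guide itself.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative targets (invented for this sketch), keyed by the target
# type each one is expected to be detected as.
examples = {
    "binary": np.array([0, 1, 1, 0]),                           # 1D, two classes
    "multiclass": np.array([0, 2, 1, 2]),                       # 1D, more than two classes
    "multilabel-indicator": np.array([[1, 0, 1], [0, 1, 0]]),   # 2D, binary indicator matrix
    "multiclass-multioutput": np.array([[1, 2], [3, 1]]),       # 2D, more than two classes
    "continuous": np.array([0.1, 0.6]),                         # 1D, non-integer floats
    "continuous-multioutput": np.array([[1.5, 2.0], [3.0, 1.6]]),  # 2D, non-integer floats
}

# Ask scikit-learn how it interprets each target.
detected = {name: type_of_target(y) for name, y in examples.items()}

for name, y in examples.items():
    print(f"shape={y.shape!s:<8} -> {detected[name]}")
```

Note that the multilabel and multiclass-multioutput targets are both 2D; the distinction comes from the values in the array, which is one reason a run-down near the top of the page would help.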

I’m interested in opening a PR for these changes, as discussed in the previous issue/PR. I’m happy to hear other suggestions/thoughts about these changes before I go ahead, though. 😃

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
jnothman commented, Jun 27, 2020

.. currentmodule is only there to allow for shorthand references to classes/functions in that module, e.g. :class:`OneVsRestClassifier`. It makes little material difference.

I’m happy to keep them in one chapter of the user guide together, even if it references two modules, thus allowing the preface to distinguish them.

0 reactions
joshuacwnewton commented, Jun 29, 2020

“I’m happy to keep them in one chapter of the user guide together, even if it references two modules, thus allowing the preface to distinguish them.”

This was also echoed by @NicolasHug in the WIP PR I made, so I’m happy to keep the 1-page structure. 😃
