MLPClassifier supports fitting on multilabel output but cannot be used with partial_fit

Description

Performance is much worse when using the partial_fit method on multilabel y than when using fit on the same data. I suspect the issue is that partial_fit supports multi-class but not multi-label targets. Why is this the case when fit supports multi-label?

Steps/Code to Reproduce

from sklearn.metrics import precision_score
from sklearn.neural_network import MLPClassifier

X_train.shape, y_train.shape # --> ((3963, 4572), (3963, 39))
# where y is binary [0,1] for each of the 39 columns

mlp = MLPClassifier(hidden_layer_sizes=(500, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=500, shuffle=True, random_state=123, tol=0.0001, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10)
# a single partial_fit call, with one class index per label column
mlp.partial_fit(X_train, y_train, classes=list(range(y_train.shape[1])))
y_pred = mlp.predict(X_test)
precision_score(y_test, y_pred, average='weighted')

Expected Results

Precision score is 0.635 when using just fit.

Actual Results

Precision score is 0.216 when using the partial_fit method.

Versions

System

python: 3.6.5 (default, Apr 25 2018, 14:23:58)  [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]
executable: /Users/clhq/.local/share/virtualenvs/LTR-looks-tag-recommender-QOgMS24J/bin/python
machine: Darwin-17.7.0-x86_64-i386-64bit

BLAS

macros: NO_ATLAS_INFO=3, HAVE_CBLAS=None
lib_dirs:
cblas_libs: cblas

Python deps

pip: 18.1
setuptools: 40.4.3
sklearn: 0.20.0
numpy: 1.15.3
scipy: 1.1.0
Cython: None
pandas: 0.23.4

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 22 (15 by maintainers)

Top GitHub Comments

christinebuckler commented, Nov 6, 2018 (1 reaction)

@jnothman Yes, the above code reproduces the problem, with one exception: y is multi-label in my case, meaning that more than one class can be positive at a time. The following would generate similar multi-label data:

from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(n_samples=1000, n_features=100, n_classes=39, n_labels=3, allow_unlabeled=False, random_state=1)
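
For anyone wanting to try this quickly, here is a minimal sketch (not the reporter's actual setup) that builds a much smaller synthetic multi-label dataset, confirms the target type, and compares how many training iterations fit and a single partial_fit call perform. The dataset size, hidden layer size and max_iter below are assumptions chosen only to keep the run short.

```python
import warnings

import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.utils.multiclass import type_of_target

# Small synthetic multi-label problem (sizes are illustrative assumptions).
X, y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, n_labels=2,
    allow_unlabeled=False, random_state=1,
)
print(type_of_target(y))  # y is a binary indicator matrix, one column per label

# fit() keeps iterating until convergence or max_iter is reached.
mlp_fit = MLPClassifier(hidden_layer_sizes=(32,), max_iter=50, random_state=123)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp_fit.fit(X, y)

# A single partial_fit() call performs one pass over the data it is given.
mlp_partial = MLPClassifier(hidden_layer_sizes=(32,), max_iter=50, random_state=123)
mlp_partial.partial_fit(X, y, classes=np.arange(y.shape[1]))

print("iterations with fit:", mlp_fit.n_iter_)
print("iterations with one partial_fit call:", mlp_partial.n_iter_)
```

If the iteration counts differ a lot, part of the score gap may simply come from the partial_fit model having been trained far less, rather than from multi-label targets being unsupported.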

jnothman commented, Nov 15, 2018 (0 reactions)

warm_start='full' would apply only to fit. For consistency with SGDClassifier (unless I am much mistaken), partial_fit should always run only one iteration.
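
If partial_fit runs a single iteration per call, then getting training comparable to fit means calling it repeatedly. The following is a rough sketch of that idea under stated assumptions: the synthetic data, the hidden layer size and the n_epochs value are hypothetical and not taken from the issue.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic multi-label problem (sizes are illustrative assumptions).
X, y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, n_labels=2,
    allow_unlabeled=False, random_state=1,
)

# One class index per label column, as in the original report.
classes = np.arange(y.shape[1])

mlp = MLPClassifier(hidden_layer_sizes=(32,), random_state=123)

n_epochs = 20  # hypothetical number of passes; tune for your own data
for _ in range(n_epochs):
    # Each call continues training from the current weights.
    mlp.partial_fit(X, y, classes=classes)

y_pred = mlp.predict(X)
print("iterations run:", mlp.n_iter_)
print("prediction shape:", y_pred.shape)
```

Note that this loop has no stopping criterion of its own; unlike fit, nothing here checks tol or n_iter_no_change, so the caller decides how many passes to make.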
