MLPClassifier supports fitting on multilabel output but cannot be used with partial_fit

Description

Performance is much worse when using the partial_fit method on multilabel y than when using fit on the same data. I suspect that partial_fit supports multi-class targets but not multi-label targets. Why is this the case when fit supports multi-label?

Steps/Code to Reproduce

from sklearn.metrics import precision_score
from sklearn.neural_network import MLPClassifier

X_train.shape, y_train.shape  # --> ((3963, 4572), (3963, 39))
# where y is binary [0, 1] for each of the 39 columns

mlp = MLPClassifier(
    hidden_layer_sizes=(500,), activation='relu', solver='adam', alpha=0.0001, batch_size='auto',
    learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=500, shuffle=True,
    random_state=123, tol=0.0001, early_stopping=False, validation_fraction=0.1,
    beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10)
mlp.partial_fit(X_train, y_train, classes=list(range(y_train.shape[1])))
y_pred = mlp.predict(X_test)
precision_score(y_test, y_pred, average='weighted')

Expected Results

The precision score is 0.635 when using fit alone.

Actual Results

The precision score is 0.216 when using the partial_fit method.

Versions

System:
    python: 3.6.5 (default, Apr 25 2018, 14:23:58)  [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]
executable: /Users/clhq/.local/share/virtualenvs/LTR-looks-tag-recommender-QOgMS24J/bin/python
   machine: Darwin-17.7.0-x86_64-i386-64bit

BLAS:
    macros: NO_ATLAS_INFO=3, HAVE_CBLAS=None
  lib_dirs:
cblas_libs: cblas

Python deps:
       pip: 18.1
setuptools: 40.4.3
   sklearn: 0.20.0
     numpy: 1.15.3
     scipy: 1.1.0
    Cython: None
    pandas: 0.23.4

Issue Analytics

Created 5 years ago
Comments: 22 (15 by maintainers)

Top GitHub Comments

christinebuckler commented, Nov 6, 2018

@jnothman Yes, the above code reproduces the problem, with one exception: y is multi-label in my case, meaning that more than one class can be positive at a time. The following would generate similar multi-label data.

from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(n_samples=1000, n_features=100, n_classes=39, n_labels=3, allow_unlabeled=False, random_state=1)
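
For anyone who wants to try this without the original data, here is a minimal sketch that uses the synthetic data above to compare fit against a single partial_fit call. The train/test split, the smaller hidden layer, and max_iter=200 are assumptions chosen to keep it quick; they are not the settings from the report, so the scores will differ from the numbers quoted in the issue.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic multi-label data, same call as in the comment above.
X, y = make_multilabel_classification(
    n_samples=1000, n_features=100, n_classes=39, n_labels=3,
    allow_unlabeled=False, random_state=1)

# Assumed split; the issue does not say how X_test / y_test were made.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model trained with fit: runs up to max_iter passes over the data.
full = MLPClassifier(hidden_layer_sizes=(50,), max_iter=200, random_state=123)
full.fit(X_train, y_train)

# Model trained with one partial_fit call: a single pass over the data.
single = MLPClassifier(hidden_layer_sizes=(50,), random_state=123)
single.partial_fit(X_train, y_train, classes=np.arange(y_train.shape[1]))

print("passes with fit:        ", full.n_iter_)
print("passes with partial_fit:", single.n_iter_)
print("precision with fit:        ",
      precision_score(y_test, full.predict(X_test), average='weighted'))
print("precision with partial_fit:",
      precision_score(y_test, single.predict(X_test), average='weighted'))
```

Comparing n_iter_ on the two models shows how many passes each one made over the training data, which is worth checking before concluding that multi-label targets are handled differently.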

jnothman commented, Nov 15, 2018

warm_start='full' would apply only to fit. For consistency with SGDClassifier (unless I am much mistaken), partial_fit should always run only one iteration.
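
If partial_fit runs a single iteration per call, as described above, then one way to get training comparable to fit is to call it repeatedly. The following is only a sketch under that assumption: n_epochs, the data sizes, and the hidden layer size are hypothetical values picked for illustration, and the loop does none of the convergence checks that fit performs.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic multi-label problem (sizes are illustrative assumptions).
X, y = make_multilabel_classification(
    n_samples=500, n_features=50, n_classes=10, n_labels=3,
    allow_unlabeled=False, random_state=1)

n_epochs = 20                      # hypothetical number of passes
label_ids = np.arange(y.shape[1])  # one id per label column

mlp = MLPClassifier(hidden_layer_sizes=(50,), random_state=123)
losses = []
for epoch in range(n_epochs):
    # Each call makes one pass over the data it is given.
    mlp.partial_fit(X, y, classes=label_ids)
    losses.append(mlp.loss_)

print("passes completed:", mlp.n_iter_)
print("first / last training loss:", losses[0], losses[-1])
```

In an out-of-core setting the inner call would receive one batch at a time instead of the full X and y, with the outer loop still deciding how many passes to make.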
