MLPClassifier supports fitting on multilabel output but cannot be used with partial_fit
Description
Performance is much worse when using the partial_fit method on multilabel y than when using fit on the same data. I suspect the issue is that partial_fit supports multi-class but not multi-label targets. Why is this the case when fit supports multi-label?
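To make the multi-class vs. multi-label distinction concrete, here is a minimal sketch using scikit-learn's `type_of_target` helper. The two small arrays are made up for illustration and are not the data from this report.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Multi-class: one integer label per sample.
y_multiclass = np.array([0, 2, 1, 2])

# Multi-label: one binary column per label, several may be 1 in the same row.
y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 1],
                         [0, 0, 0]])

kind_multiclass = type_of_target(y_multiclass)
kind_multilabel = type_of_target(y_multilabel)
print(kind_multiclass)  # multiclass
print(kind_multilabel)  # multilabel-indicator
```

A y_train of shape (3963, 39) with binary columns, as described here, would fall into the second category.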
Steps/Code to Reproduce
from sklearn.metrics import precision_score
from sklearn.neural_network import MLPClassifier

X_train.shape, y_train.shape # --> ((3963, 4572), (3963, 39))
# where y is binary [0,1] for each of the 39 columns
mlp = MLPClassifier(hidden_layer_sizes=(500, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=500, shuffle=True, random_state=123, tol=0.0001, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10)
mlp.partial_fit(X_train, y_train, classes=list(range(y_train.shape[1])))
y_pred = mlp.predict(X_test)
precision_score(y_test, y_pred, average='weighted')
Expected Results
Precision score is 0.635 when using fit.
Actual Results
Precision score is 0.216 when using partial_fit.
Versions
System
python: 3.6.5 (default, Apr 25 2018, 14:23:58) [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]
executable: /Users/clhq/.local/share/virtualenvs/LTR-looks-tag-recommender-QOgMS24J/bin/python
machine: Darwin-17.7.0-x86_64-i386-64bit
BLAS
macros: NO_ATLAS_INFO=3, HAVE_CBLAS=None
lib_dirs:
cblas_libs: cblas
Python deps
pip: 18.1
setuptools: 40.4.3
sklearn: 0.20.0
numpy: 1.15.3
scipy: 1.1.0
Cython: None
pandas: 0.23.4
Issue Analytics
- State:
- Created 5 years ago
- Comments: 22 (15 by maintainers)
Top GitHub Comments
@jnothman Yes, the above code reproduces the problem, with one exception: y is multi-label in my case, meaning that more than one class can be positive at a time. The following would generate similar multi-label data.
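As a rough illustration of such data (this is a sketch, not the commenter's original snippet), scikit-learn's `make_multilabel_classification` can produce a binary indicator matrix with 39 label columns. The sample and feature counts below are assumptions, shrunk from the reported (3963, 4572) so the example runs quickly.

```python
from sklearn.datasets import make_multilabel_classification

# Assumed sizes: much smaller than the report's data, same number of labels (39).
X, y = make_multilabel_classification(
    n_samples=400,
    n_features=50,
    n_classes=39,
    n_labels=3,        # average number of positive labels per sample
    random_state=123,
)

print(X.shape, y.shape)
# Number of positive labels in the first few rows; often more than one.
print(y.sum(axis=1)[:10])
```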
warm_start='full' would apply only to fit. For consistency with SGDClassifier (unless I am much mistaken), partial_fit should always run only one iteration.
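If partial_fit does run a single iteration per call, then one call is not comparable to fit with max_iter=500, and the caller has to loop over epochs themselves. Here is a minimal sketch of that pattern on synthetic multi-label data. The dataset sizes, the hidden layer size, and the `n_epochs` budget are assumptions chosen for illustration; the behavior shown is what I would expect from recent scikit-learn releases and may differ in 0.20.0.

```python
import warnings

from sklearn.datasets import make_multilabel_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small synthetic multi-label problem (assumed sizes, not the report's data).
X, y = make_multilabel_classification(
    n_samples=300, n_features=30, n_classes=5, random_state=0
)

n_epochs = 20  # hypothetical training budget shared by both models

# fit: runs up to max_iter passes over the data in a single call.
full = MLPClassifier(hidden_layer_sizes=(50,), max_iter=n_epochs, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    full.fit(X, y)

# partial_fit: one pass per call, so repeat the call to match the budget.
inc = MLPClassifier(hidden_layer_sizes=(50,), random_state=0)
labels = list(range(y.shape[1]))  # one entry per label column
for _ in range(n_epochs):
    inc.partial_fit(X, y, classes=labels)

y_pred = inc.predict(X)
print("fit iterations:", full.n_iter_)
print("partial_fit iterations:", inc.n_iter_)
print("prediction shape:", y_pred.shape)
```

Comparing a single partial_fit call against this loop would be one way to check whether the precision gap reported above comes from the number of iterations rather than from the multi-label targets.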