Use of sparse matrices

Hi, nice to see someone continuing development on a wrapper like this after the people at TensorFlow decided to discontinue theirs.

I have run into an issue with the use of sparse matrices.

In the API documentation it is mentioned that the fit and predict functions from the KerasClassifier wrapper should work with array-like, sparse matrix and dataframe. However, when I use a sparse matrix, I get the following exception:

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

I used the quickstart guide to get a simple, reproducible example. I simply converted the ndarrays in the example into scipy.sparse coo_matrix objects:

import numpy as np
from sklearn.datasets import make_classification
from tensorflow import keras
from scipy.sparse import coo_matrix

from scikeras.wrappers import KerasClassifier


X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)
y = y.astype(np.int64)

X = coo_matrix(X)
y = coo_matrix(y)

def get_model(hidden_layer_dim, meta):
    # note that meta is a special argument that will be
    # handed a dict containing input metadata
    n_features_in_ = meta["n_features_in_"]
    X_shape_ = meta["X_shape_"]
    n_classes_ = meta["n_classes_"]

    model = keras.models.Sequential()
    model.add(keras.layers.Dense(n_features_in_, input_shape=X_shape_[1:]))
    model.add(keras.layers.Activation("relu"))
    model.add(keras.layers.Dense(hidden_layer_dim))
    model.add(keras.layers.Activation("relu"))
    model.add(keras.layers.Dense(n_classes_))
    model.add(keras.layers.Activation("softmax"))
    return model

clf = KerasClassifier(
    get_model,
    loss="sparse_categorical_crossentropy",
    hidden_layer_dim=100,
)

clf.fit(X, y)
y_proba = clf.predict_proba(X)

A potential reason for the issue could be that, when the inputs are validated via sklearn.utils.check_X_y, the accept_sparse parameter is left at its default value of False.

Setting this parameter to True might solve the issue (I will test that soon). I am running this on python=3.7.10, scikit-learn=0.24.2 and tensorflow=2.5.0.
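To make the accept_sparse hypothesis concrete, here is a minimal sketch that calls sklearn.utils.check_X_y directly, outside of SciKeras. The array sizes, the `validate` helper and the choice of CSR format are assumptions made for illustration, and y is kept dense here. It only shows how scikit-learn's validator behaves; it does not show how SciKeras calls it internally.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.utils import check_X_y

# Small illustrative dataset (hypothetical sizes, not the data from the report).
rng = np.random.default_rng(0)
X_dense = rng.random((6, 4)).astype(np.float32)
y = np.array([0, 1, 0, 1, 0, 1])
X_sparse = csr_matrix(X_dense)


def validate(X, y, accept_sparse):
    """Return "ok" if check_X_y accepts the inputs, else the exception type name."""
    try:
        check_X_y(X, y, accept_sparse=accept_sparse)
        return "ok"
    except TypeError as exc:
        return type(exc).__name__


# accept_sparse=False is the default and rejects sparse X.
default_result = validate(X_sparse, y, accept_sparse=False)
# accept_sparse=True lets the same sparse X through validation.
sparse_result = validate(X_sparse, y, accept_sparse=True)

print(default_result, sparse_result)
```

If the wrapper's validation step behaves like the first call, that would be consistent with the TypeError shown above. Passing validation is only the first step, though: the Keras model still has to be able to consume sparse input.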

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

adriangb commented, Jul 22, 2022 (1 reaction)

Awesome, that’s about as real-world useful as it gets. I think I’ll move forward with #240 tomorrow.

mattalhonte-srm commented, Jul 22, 2022 (1 reaction)

The main way I know is that casting .todense() made my container crash, while passing the Sparse matrix didn’t.
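As a rough illustration of why densifying can exhaust memory while the sparse form fits, the sketch below compares the storage a CSR matrix uses with what the equivalent dense array would need. The shape, density and the `csr_nbytes` helper are hypothetical choices for this example, not values from the comment above.

```python
import numpy as np
from scipy import sparse

# Hypothetical problem size: 10,000 rows x 1,000 columns with 1% non-zero entries.
X_sparse = sparse.random(
    10_000, 1_000, density=0.01, format="csr", dtype=np.float32, random_state=0
)


def csr_nbytes(m):
    """Bytes held by the three arrays that back a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


sparse_bytes = csr_nbytes(X_sparse)
# Size of the dense equivalent, computed without actually allocating it.
dense_bytes = X_sparse.shape[0] * X_sparse.shape[1] * X_sparse.dtype.itemsize

print(f"sparse: {sparse_bytes / 1e6:.2f} MB, dense: {dense_bytes / 1e6:.2f} MB")
```

The gap grows as the matrix gets larger and sparser, which is why a .todense() call can crash a memory-limited container that handles the sparse matrix fine.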
