
[FEATURE] Bayesian Kernel Density Classifier

See original GitHub issue
I’ve been using this Bayesian kernel density classifier for a few years, and I thought I should move it out of my poorly organized project and into this one.

The prior for each class is its empirical frequency, $P(y=c)$. I primarily use it for spatial problems.
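For context, the posterior the classifier computes is just Bayes’ rule with kernel density estimates plugged in as the class-conditional likelihoods:

```latex
P(y = c \mid x) = \frac{\hat{p}(x \mid y = c)\, P(y = c)}{\sum_{c'} \hat{p}(x \mid y = c')\, P(y = c')}
```

where $\hat{p}(x \mid y = c)$ is the per-class KDE fitted on the training points of class $c$.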

It is similar to the GMM classifier, with only two caveats I can think of:

  • Hyperparameters are easier to choose.
  • Scaling is worse, I believe because the KDE part scales linearly with the training sample size.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neighbors import KernelDensity


# noinspection PyPep8Naming
class BayesianKernelDensityClassifier(BaseEstimator, ClassifierMixin):
    """
    Bayesian classifier that uses kernel density estimates to model the
    class-conditional distributions.

    Parameters:
        - bandwidth: float, bandwidth of the kernel
        - kernel: str, kernel name passed to scikit-learn's KernelDensity
    """
    def __init__(self, bandwidth=0.2, kernel='gaussian'):
        self.classes_, self.models_, self.priors_logp_ = [None] * 3
        self.bandwidth = bandwidth
        self.kernel = kernel

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        training_sets = [X[y == yi] for yi in self.classes_]
        # One KDE per class models the likelihood p(x | y=c)
        self.models_ = [KernelDensity(bandwidth=self.bandwidth, kernel=self.kernel).fit(x_subset)
                        for x_subset in training_sets]
        # Empirical log-priors log P(y=c)
        self.priors_logp_ = [np.log(x_subset.shape[0] / X.shape[0]) for x_subset in training_sets]
        return self

    def predict_proba(self, X):
        # Unnormalised log-posterior: log p(x | y=c) + log P(y=c)
        logp = np.array([model.score_samples(X) for model in self.models_]).T
        logp += self.priors_logp_
        # Subtract the row-wise max before exponentiating for numerical stability
        result = np.exp(logp - logp.max(axis=1, keepdims=True))
        return result / result.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
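As a sanity check, here is the same posterior computation written inline with scikit-learn’s KernelDensity on synthetic data (the make_blobs dataset and the 0.2 bandwidth are illustrative assumptions, not from the original issue):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KernelDensity

# Two well-separated synthetic clusters (illustrative; any 2D data works)
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

classes = np.sort(np.unique(y))
# One KDE per class estimates the class-conditional likelihood p(x | y=c)
models = [KernelDensity(bandwidth=0.2, kernel='gaussian').fit(X[y == c]) for c in classes]
log_priors = np.log([np.mean(y == c) for c in classes])  # empirical log P(y=c)

# Posterior is proportional to p(x | y=c) * P(y=c), normalised across classes
logp = np.array([m.score_samples(X) for m in models]).T + log_priors
proba = np.exp(logp - logp.max(axis=1, keepdims=True))
proba /= proba.sum(axis=1, keepdims=True)
pred = classes[proba.argmax(axis=1)]

print(f"training accuracy: {np.mean(pred == y):.2f}")
```

Each row of `proba` sums to one, and `pred` matches what the class’s `predict` would return on the same data.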

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 26 (24 by maintainers)

Top GitHub Comments

1 reaction
arose13 commented, Feb 2, 2020

I don’t know precisely for sklearn either, but since the most expensive part of computing the expectation step is a matrix product, I figured it would be p^3 (and apparently ~p^2.3 with fast matrix-multiplication algorithms). It should also scale linearly with the number of clusters.

(Not an academic level citation I know but https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra)

1 reaction
arose13 commented, Feb 2, 2020

Whoops sorry.

https://colab.research.google.com/drive/12z28LCt2Y76smB01w2QizK3_cCo-6y2G

I never noticed how bad the scaling is on datasets larger than the academic ones, and that has actually been a big issue. I’ve used it on 2D species distribution data (I’m trying to see if I am allowed to use that dataset for the notebook).
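To make the linear scaling in training-set size concrete, here is a sketch (synthetic data, sizes, and bandwidth are assumptions) showing that a Gaussian `score_samples` call is equivalent to a logsumexp over all n_train pairwise kernel values, i.e. every query point touches every training point:

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))   # n_train = 500 (illustrative sizes)
X_query = rng.normal(size=(10, 2))

h = 0.2
kde = KernelDensity(bandwidth=h, kernel='gaussian').fit(X_train)

# Naive equivalent of score_samples: O(n_train * n_query) kernel evaluations
sq_dists = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
d = X_train.shape[1]
# Normaliser of the Gaussian kernel mixture: n * (2*pi*h^2)^(d/2)
log_norm = np.log(len(X_train)) + d * np.log(h) + d / 2 * np.log(2 * np.pi)
manual = logsumexp(-0.5 * sq_dists / h ** 2, axis=1) - log_norm

print(np.allclose(manual, kde.score_samples(X_query)))
```

The tree structure sklearn builds can prune some of that work when tolerances are loosened, but in the worst case evaluation stays linear in n_train per query point.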


