
[FEATURE] Bayesian Kernel Density Classifier

See original GitHub issue
I’ve been using this Bayesian kernel density classifier for a few years, and I thought I should move it out of my poorly organized project and into this one.

The prior for each class is its empirical frequency, $P(y=c)$. I primarily use it for spatial problems.
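For context, the posterior the classifier computes is just Bayes’ rule with kernel density estimates plugged in as the class-conditional likelihoods:

```latex
P(y = c \mid x) = \frac{\hat{p}(x \mid y = c)\, P(y = c)}{\sum_{c'} \hat{p}(x \mid y = c')\, P(y = c')}
```

where $\hat{p}(x \mid y = c)$ is the per-class KDE fitted on the training points of class $c$.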

It is similar to the GMM classifier, with only two caveats I can think of:

  • Hyperparameters are easier to choose.
  • Scaling is worse, I believe because the KDE part scales linearly with the training sample size.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neighbors import KernelDensity


# noinspection PyPep8Naming
class BayesianKernelDensityClassifier(BaseEstimator, ClassifierMixin):
    """
    Bayesian classifier that uses kernel density estimates to model the
    class-conditional distributions.

    Parameters:
        - bandwidth: float, bandwidth of the kernel
        - kernel: str, kernel name passed to scikit-learn's KernelDensity
    """
    def __init__(self, bandwidth=0.2, kernel='gaussian'):
        self.classes_, self.models_, self.priors_logp_ = [None] * 3
        self.bandwidth = bandwidth
        self.kernel = kernel

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        training_sets = [X[y == yi] for yi in self.classes_]
        # One KDE per class models the likelihood p(x | y=c)
        self.models_ = [KernelDensity(bandwidth=self.bandwidth, kernel=self.kernel).fit(x_subset)
                        for x_subset in training_sets]
        # Empirical log-priors log P(y=c)
        self.priors_logp_ = [np.log(x_subset.shape[0] / X.shape[0]) for x_subset in training_sets]
        return self

    def predict_proba(self, X):
        # Unnormalised log-posterior: log p(x | y=c) + log P(y=c)
        logp = np.array([model.score_samples(X) for model in self.models_]).T
        logp += self.priors_logp_
        # Subtract the row-wise max before exponentiating for numerical stability
        result = np.exp(logp - logp.max(axis=1, keepdims=True))
        return result / result.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
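As a sanity check, here is the same posterior computation written inline with scikit-learn’s KernelDensity on synthetic data (the make_blobs dataset and the 0.2 bandwidth are illustrative assumptions, not from the original issue):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KernelDensity

# Two well-separated synthetic clusters (illustrative; any 2D data works)
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

classes = np.sort(np.unique(y))
# One KDE per class estimates the class-conditional likelihood p(x | y=c)
models = [KernelDensity(bandwidth=0.2, kernel='gaussian').fit(X[y == c]) for c in classes]
log_priors = np.log([np.mean(y == c) for c in classes])  # empirical log P(y=c)

# Posterior is proportional to p(x | y=c) * P(y=c), normalised across classes
logp = np.array([m.score_samples(X) for m in models]).T + log_priors
proba = np.exp(logp - logp.max(axis=1, keepdims=True))
proba /= proba.sum(axis=1, keepdims=True)
pred = classes[proba.argmax(axis=1)]

print(f"training accuracy: {np.mean(pred == y):.2f}")
```

Each row of `proba` sums to one, and `pred` matches what the class’s `predict` would return on the same data.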

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 26 (24 by maintainers)

Top GitHub Comments

1 reaction
arose13 commented, Feb 2, 2020

I don’t know precisely for sklearn either, but since the most expensive part of computing the expectation step is a matrix product, I figured it would be p^3 (and apparently ~p^2.3 with fast matrix-multiplication algorithms). It should also scale linearly with the number of clusters.

(Not an academic level citation I know but https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra)

1 reaction
arose13 commented, Feb 2, 2020

Whoops sorry.

https://colab.research.google.com/drive/12z28LCt2Y76smB01w2QizK3_cCo-6y2G

I never noticed how bad the scaling is on datasets larger than the academic ones, and that has actually been a big issue. I’ve used it on 2D species distribution data (I’m trying to see if I am allowed to use that dataset for the notebook).
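To make the linear scaling in training-set size concrete, here is a sketch (synthetic data, sizes, and bandwidth are assumptions) showing that a Gaussian `score_samples` call is equivalent to a logsumexp over all n_train pairwise kernel values, i.e. every query point touches every training point:

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))   # n_train = 500 (illustrative sizes)
X_query = rng.normal(size=(10, 2))

h = 0.2
kde = KernelDensity(bandwidth=h, kernel='gaussian').fit(X_train)

# Naive equivalent of score_samples: O(n_train * n_query) kernel evaluations
sq_dists = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
d = X_train.shape[1]
# Normaliser of the Gaussian kernel mixture: n * (2*pi*h^2)^(d/2)
log_norm = np.log(len(X_train)) + d * np.log(h) + d / 2 * np.log(2 * np.pi)
manual = logsumexp(-0.5 * sq_dists / h ** 2, axis=1) - log_norm

print(np.allclose(manual, kde.score_samples(X_query)))
```

The tree structure sklearn builds can prune some of that work when tolerances are loosened, but in the worst case evaluation stays linear in n_train per query point.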


