[FEATURE] Bayesian Kernel Density Classifier
I’ve been using this Bayesian kernel density classifier for a few years, and I thought I should move it out of my poorly organized project and into this one.
The prior for each class is its empirical frequency, $P(y = c)$. I primarily use it for spatial problems.
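For reference, this is the rule the class below implements: each class-conditional density is a KDE fit on that class's samples, and the posterior follows from Bayes' theorem:

$$
P(y = c \mid x) = \frac{\hat{p}_{\mathrm{KDE}}(x \mid y = c)\, P(y = c)}{\sum_{c'} \hat{p}_{\mathrm{KDE}}(x \mid y = c')\, P(y = c')}
$$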
It is similar to the GMM Classifier, with only two caveats I can think of:
- Hyperparameters are easier to decide on (see the bandwidth-selection sketch after the class below).
- Scaling is worse, which I believe is because the KDE part scales linearly with the sample size.
```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neighbors import KernelDensity


# noinspection PyPep8Naming
class BayesianKernelDensityClassifier(BaseEstimator, ClassifierMixin):
    """
    Bayesian classifier that uses kernel density estimates for the class-conditional densities.

    Parameters:
    - bandwidth: float, bandwidth passed to scikit-learn's KernelDensity
    - kernel: kernel name passed to scikit-learn's KernelDensity
    """

    def __init__(self, bandwidth=0.2, kernel='gaussian'):
        self.classes_, self.models_, self.priors_logp_ = [None] * 3
        self.bandwidth = bandwidth
        self.kernel = kernel

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        training_sets = [X[y == yi] for yi in self.classes_]
        # One KDE per class models the class-conditional density p(x | y = c).
        self.models_ = [
            KernelDensity(bandwidth=self.bandwidth, kernel=self.kernel).fit(x_subset)
            for x_subset in training_sets
        ]
        # Log-priors are the empirical class frequencies, log P(y = c).
        self.priors_logp_ = [np.log(x_subset.shape[0] / X.shape[0]) for x_subset in training_sets]
        return self

    def predict_proba(self, X):
        # score_samples returns log p(x | y = c) for each class's KDE.
        logp = np.array([model.score_samples(X) for model in self.models_]).T
        joint_logp = logp + self.priors_logp_
        # Shift by the row-wise max before exponentiating so points far from
        # all training data don't underflow to an all-zero row.
        joint_logp -= joint_logp.max(axis=1, keepdims=True)
        result = np.exp(joint_logp)
        return result / result.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```
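Here is a minimal usage sketch (mine, not part of the original snippet): it fits the class above on a toy 2-D dataset and grid-searches the bandwidth, which is essentially the only knob. The dataset and parameter grid are placeholders.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy 2-D data; any (n_samples, n_features) float array works.
X, y = make_blobs(n_samples=500, centers=3, cluster_std=2.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bandwidth is the main hyperparameter; cross-validate it directly.
search = GridSearchCV(
    BayesianKernelDensityClassifier(),
    param_grid={'bandwidth': [0.1, 0.2, 0.5, 1.0, 2.0]},
    cv=5,
)
search.fit(X_train, y_train)

print('best bandwidth:', search.best_params_['bandwidth'])
print('test accuracy:', search.best_estimator_.score(X_test, y_test))
proba = search.best_estimator_.predict_proba(X_test)
print('rows sum to 1:', np.allclose(proba.sum(axis=1), 1.0))
```

Because the class inherits from BaseEstimator and only stores its constructor arguments as-is, it works with GridSearchCV's cloning without any extra plumbing.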
Top GitHub Comments
I don’t know precisely for sklearn either, but since the most expensive step in computing the expectation is a dot product, I figured the E-step would be $p^3$ (and apparently ~$p^{2.3}$). It should also scale linearly with the number of clusters.
(Not an academic-level citation, I know, but https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra)
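Rather than trusting the back-of-the-envelope exponent, a quick empirical check (my own sketch, not from the thread; the sizes are arbitrary) is to time GaussianMixture.fit while growing the dimensionality:

```python
import time
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_samples, n_components = 5_000, 5

# Time EM on random data as the dimensionality p grows; the growth trend,
# not the absolute numbers, is what hints at the effective exponent.
for p in (2, 4, 8, 16, 32, 64):
    X = rng.normal(size=(n_samples, p))
    start = time.perf_counter()
    GaussianMixture(n_components=n_components, max_iter=20, random_state=0).fit(X)
    print(f'p={p:3d}  fit time: {time.perf_counter() - start:.3f}s')
```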
Whoops sorry.
https://colab.research.google.com/drive/12z28LCt2Y76smB01w2QizK3_cCo-6y2G
I never noticed how bad the scaling is on datasets larger than the usual academic ones, and that has actually been a big thing. I’ve used it on 2-D species distribution data (I’m trying to see if I’m allowed to use that dataset for the notebook).
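To make the scaling caveat concrete, here is a small timing sketch (mine, with arbitrary sizes, assuming the class from the issue is in scope). KernelDensity stores all training points, and score_samples has to consult them at prediction time (via a tree, so growth can be sub-linear in practice, but it still grows with the training-set size):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X_test = rng.normal(size=(1_000, 2))

# Prediction cost grows with n_train because every KDE keeps its training set.
for n_train in (1_000, 5_000, 25_000, 125_000):
    X = rng.normal(size=(n_train, 2))
    y = (X[:, 0] > 0).astype(int)  # arbitrary two-class labels
    clf = BayesianKernelDensityClassifier().fit(X, y)
    start = time.perf_counter()
    clf.predict(X_test)
    print(f'n_train={n_train:6d}  predict time: {time.perf_counter() - start:.3f}s')
```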