Supervised dimension reduction overfitting
Apologies if this is a dumb question, but what's the best way to understand the level of overfitting when using the supervised approach?
In the example below, the separation of points is very clear despite the input being random. I understand the reason for it, but I don't know how to assess the extent to which the separation is driven by overfitting versus "real" differences in the data-generating process. Any suggestions? Masking some samples works somewhat, but only if enough samples are left for fitting after the mask, which isn't the case in my data.
import umap
import numpy as np
from matplotlib import pyplot as plt

testSamples = 400
randomrows = np.random.randint(0, 2, size=(testSamples, 50))   # random binary features
testMetadata = np.random.randint(0, 2, size=testSamples)       # random binary labels

fitter = umap.UMAP(n_neighbors=25, min_dist=0.1, metric='hamming').fit(randomrows, y=testMetadata)

# uncomment these to run semi-supervised (note the .copy(), so the labels used for plotting stay intact)
#testMetadata_masked = testMetadata.copy()
#testMetadata_masked[np.random.choice(len(testMetadata_masked), size=50, replace=False)] = -1
#fitter = umap.UMAP(n_neighbors=25, min_dist=0.1, metric='hamming').fit(randomrows, y=testMetadata_masked)

embedding = fitter.embedding_
plt.scatter(embedding[:, 0], embedding[:, 1], s=5, c=testMetadata)
plt.show()
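One way to probe the masking idea a little further is sketched below. This is a minimal sketch, not part of the original issue, and it assumes you can spare a block of rows and that UMAP's transform() behaves well with the hamming metric: fit supervised UMAP on a subset, then project the held-out rows (whose labels were never seen) and colour them by their labels. With purely random labels the held-out points should not separate; with a real signal in the data-generating process they should.

import umap
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 50))
y = rng.integers(0, 2, size=400)

# Hold out 100 rows whose labels UMAP never sees during fitting.
holdout = rng.choice(len(y), size=100, replace=False)
train_mask = np.ones(len(y), dtype=bool)
train_mask[holdout] = False

fitter = umap.UMAP(n_neighbors=25, min_dist=0.1, metric='hamming').fit(X[train_mask], y=y[train_mask])

# Project the held-out rows into the fitted space and colour them by their
# (never used) labels to see whether the separation generalises.
test_embedding = fitter.transform(X[~train_mask])
plt.scatter(test_embedding[:, 0], test_embedding[:, 1], s=5, c=y[~train_mask])
plt.show()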
Top GitHub Comments
That is one way to think about it, but I think it's a bit misleading. A better way to think about it is that as you drop n_neighbors you weaken all of your signal and start fitting the fiddly bits of your manifold. This definitely weakens your supervised signal and prevents it from dominating your space, but it also weakens the signal from your original space. Remember that you are experimenting in the presence of no structure here.
I think the best way to think about fully supervised UMAP is that you've got two embeddings and you are folding them together. One has perfect clustering and the other is random noise. I'm going to represent these embeddings via nearest-neighbour graphs (which I'll fold together). As I reduce n_neighbors I'm inducing fewer edges in both graphs, which leaves fewer of those very consistent supervised edges relative to the random edges. As such, it becomes more likely that a few of the random edges agree with our supervised edges and you get that mixing of structure.
I'd recommend exploring this trade-off by examining the effect these things have in the presence of structured data and labels. I probably should turn this into a Read the Docs page, but finding time for such things can be challenging. In the meantime, here is a slight modification of your code that might help provide some intuition.
And here are the same images with n_neighbors turned down to 5. You'll see that while we are indeed weakening the supervised structure, we are also weakening and tearing apart our unsupervised structure.
Hopefully this is helpful.
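As a rough stand-in for the kind of comparison described above (a hedged sketch only, not the maintainer's original modification; the make_blobs dataset, the choice of metrics, and the figure layout are assumptions), one could embed structured and random data side by side at n_neighbors=25 and 5:

import umap
import numpy as np
from matplotlib import pyplot as plt
from sklearn.datasets import make_blobs

n_samples = 400
rng = np.random.default_rng(42)

# Structured case: two well-separated Gaussian blobs whose labels match the structure.
X_struct, y_struct = make_blobs(n_samples=n_samples, centers=2, n_features=50, random_state=42)

# Random case: binary noise with random labels, as in the original example.
X_rand = rng.integers(0, 2, size=(n_samples, 50))
y_rand = rng.integers(0, 2, size=n_samples)

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
cases = [(X_struct, y_struct, 'euclidean', 'structured'),
         (X_rand, y_rand, 'hamming', 'random')]
for row, (X, y, metric, name) in enumerate(cases):
    for col, nn in enumerate([25, 5]):
        # Supervised fit at each n_neighbors setting; lower n_neighbors weakens
        # both the supervised and the unsupervised signal.
        emb = umap.UMAP(n_neighbors=nn, min_dist=0.1, metric=metric).fit(X, y=y).embedding_
        axes[row, col].scatter(emb[:, 0], emb[:, 1], s=5, c=y)
        axes[row, col].set_title(f'{name}, n_neighbors={nn}')
plt.show()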
Another way you could do this, with less extreme data, would be to use the target_weight parameter and drive it down to 0 to de-emphasize your supervised distance. Unfortunately, there is so little structure contained within your data that the clouds still separate into two distinct blobs even when setting target_weight to its minimum value of 0. That said, it's an easy way to go when you've got less extreme data.
target_weight: float (optional, default 0.5) — weighting factor between data topology and target topology. A value of 0.0 weights predominantly on data, a value of 1.0 places a strong emphasis on target. The default of 0.5 balances the weighting equally between data and target.
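For reference, a minimal sketch of driving target_weight down to 0 on random data like that in the question (variable names here are illustrative, not taken from the thread):

import umap
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 50))
y = rng.integers(0, 2, size=400)

# target_weight=0.0 weights the embedding almost entirely toward the data
# topology; as noted above, with data this unstructured the labels still
# end up splitting the points into two blobs.
fitter = umap.UMAP(n_neighbors=25, min_dist=0.1, metric='hamming', target_weight=0.0).fit(X, y=y)
plt.scatter(fitter.embedding_[:, 0], fitter.embedding_[:, 1], s=5, c=y)
plt.show()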