silhouette_samples gives incorrect result from precomputed distance matrix with diagonal entries
Description
silhouette_samples gives incorrect result from precomputed distance matrix with diagonal entries.
When silhouette_samples is called with metric='precomputed' and the input distance matrix has non-zero values along the diagonal, the returned silhouette scores are incorrect.
Suggested Solution
Before calculating the scores, the diagonal entries of a precomputed distance matrix should be set to zero.
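Until something like that happens inside the library, the same idea can be applied on the caller's side. The following is a minimal sketch of that workaround, not a patch to scikit-learn: it uses the same data as the reproduction script in this issue, copies the matrix, and zeroes the diagonal with np.fill_diagonal before scoring. The variable names cleaned and scores are only illustrative.

```python
import numpy as np
from sklearn.metrics import silhouette_samples
from sklearn.metrics.pairwise import pairwise_distances

# Same data as the reproduction script in this issue.
points = np.array([[0.2, 0.1, 0.12, 1.34, 1.11, 1.6]]).transpose()
dists = pairwise_distances(points)   # zero diagonal
diag_dists = dists + np.eye(6)       # ones added on the diagonal
labels = [0, 0, 0, 1, 1, 1]

# Caller-side workaround: zero the diagonal on a copy so the
# original matrix is left untouched (fill_diagonal works in place).
cleaned = diag_dists.copy()
np.fill_diagonal(cleaned, 0)

scores = silhouette_samples(cleaned, labels, metric='precomputed')
print(scores)
```

Working on a copy is a deliberate choice here, since np.fill_diagonal modifies its argument in place and the caller may still need the original matrix.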
Steps/Code to Reproduce
Example:
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.metrics import silhouette_samples
dists = pairwise_distances(np.array([[0.2, 0.1, 0.12, 1.34, 1.11, 1.6]]).transpose())
diag_dists = np.diag(np.ones(6)) + dists
labels = [0,0,0,1,1,1]
print(silhouette_samples(diag_dists, labels, metric = 'precomputed'))
Expected Results
[0.92173913, 0.952, 0.95934959, 0.79583333, 0.62886598, 0.74315068]
Actual Results
[0.48695652, 0.552, 0.55284553, 0.37916667, 0.11340206, 0.40068493]
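Both sets of numbers can be reproduced by hand, which helps show where the difference comes from. The sketch below is a hand-rolled NumPy calculation, not scikit-learn's code; the helper name silhouette_from_matrix is hypothetical. It assumes the intra-cluster term is computed as the sum over the whole cluster row (self-distance included) divided by n - 1. Under that assumption a diagonal entry of 1 is added to every intra-cluster sum, and the output matches the actual results reported above.

```python
import numpy as np

def silhouette_from_matrix(D, labels):
    """Hand-rolled silhouette values from a square distance matrix.

    Hypothetical helper for illustration. The self-distance D[i, i] is
    deliberately left in the intra-cluster sum, so a non-zero diagonal
    leaks into the result.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters keep a score of 0
        # Sum over the whole cluster (including i itself), divide by n - 1.
        a = D[i, same].sum() / (n_same - 1)
        # Smallest mean distance from i to any other cluster.
        b = min(D[i, labels == other].mean()
                for other in np.unique(labels) if other != own)
        scores[i] = (b - a) / max(a, b)
    return scores

points = np.array([0.2, 0.1, 0.12, 1.34, 1.11, 1.6])
dists = np.abs(points[:, None] - points[None, :])  # zero diagonal
diag_dists = dists + np.eye(6)                     # ones on the diagonal
labels = [0, 0, 0, 1, 1, 1]

clean = silhouette_from_matrix(dists, labels)
polluted = silhouette_from_matrix(diag_dists, labels)
print(clean)
print(polluted)
```

For the first sample, for example, the intra-cluster term goes from (0.1 + 0.08) / 2 = 0.09 to (0.1 + 0.08 + 1) / 2 = 0.59 while the nearest-cluster term stays at 1.15, which turns 0.9217 into 0.4870.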
Versions
Darwin-17.7.0-x86_64-i386-64bit
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.15.1
SciPy 1.1.0
Scikit-Learn 0.20.0
Issue Analytics
- State:
- Created 5 years ago
- Comments: 10 (9 by maintainers)
Top GitHub Comments
(Also, hi Stephen!)
If this is still in need of fixing, I’d like to take this on.