pairwise_distances(X) should always have 0 diagonal
In the case of euclidean distances, we explicitly set the diagonal of unary pairwise distances to 0: https://github.com/scikit-learn/scikit-learn/blob/e170d479acff31aadd9d2ab404952b23f7994aab/sklearn/metrics/pairwise.py#L257 as well as in pairwise_distances_chunked: https://github.com/scikit-learn/scikit-learn/blob/e170d479acff31aadd9d2ab404952b23f7994aab/sklearn/metrics/pairwise.py#L1279. We also do so for cosine distances: https://github.com/scikit-learn/scikit-learn/blob/e170d479acff31aadd9d2ab404952b23f7994aab/sklearn/metrics/pairwise.py#L550.
We should zero the diagonal of the output for all metrics through pairwise_distances and pairwise_distances_chunked (where Y is None or Y is X) to reduce the effect of imprecision during distance calculation.
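To illustrate the kind of imprecision in question, here is a minimal NumPy-only sketch. It assumes distances are computed through the dot-product expansion of the squared norm, which is one common fast formulation; the helper name expanded_euclidean is hypothetical and is not meant to reproduce scikit-learn's code.

```python
import numpy as np


def expanded_euclidean(X):
    """Hypothetical helper: Euclidean distances via the dot-product expansion.

    Uses ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, which is fast but can
    leave tiny nonzero values on the diagonal because of rounding.
    """
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] - 2.0 * (X @ X.T) + sq_norms[None, :]
    # Rounding can make squared distances slightly negative; clip before sqrt.
    np.maximum(sq_dists, 0, out=sq_dists)
    return np.sqrt(sq_dists)


rng = np.random.RandomState(0)
X = rng.rand(50, 10) * 1e3

D = expanded_euclidean(X)
# Depending on rounding, the diagonal may or may not be exactly zero here.
max_diag_before = np.abs(np.diag(D)).max()

# Zeroing the diagonal removes whatever rounding error was left there.
np.fill_diagonal(D, 0)
```

Whether max_diag_before is nonzero depends on the data and the platform, which is the reason for zeroing the diagonal explicitly instead of relying on the arithmetic.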
That is, for every metric, we eventually want:
assert not np.any(pairwise_distances(X, metric=metric)[np.diag_indices(X.shape[0])])
and a similar assertion for pairwise_distances_chunked
I say eventually because I propose that:
- if the current output satisfies np.allclose(pairwise_distances(X, metric=metric)[np.diag_indices(X.shape[0])], 0), we just set the diagonal to 0
- if the current output does not satisfy that condition, we only raise a FutureWarning saying that “The specified metric is not a valid dissimilarity metric; it does not return metric(x, x) == 0 for some values. In version 0.23, the diagonal of pairwise_distances(X) will be set to 0.” We do this because at the moment we do not strictly require that the metric be a valid dissimilarity measure.
- we do the same for pairwise_distances_chunked
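The transition behaviour proposed in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name zero_diagonal_if_close is hypothetical, it operates on an already computed square distance matrix, and it is not how scikit-learn implements (or would necessarily implement) the change.

```python
import warnings

import numpy as np


def zero_diagonal_if_close(D):
    """Hypothetical helper sketching the proposed transition behaviour.

    D is a square matrix of distances computed from X against itself.
    """
    diag = D[np.diag_indices(D.shape[0])]
    if np.allclose(diag, 0):
        # Only rounding noise on the diagonal: overwrite it with exact zeros.
        np.fill_diagonal(D, 0)
    else:
        # metric(x, x) != 0 for some x: keep the values for now, but warn.
        warnings.warn(
            "The specified metric is not a valid dissimilarity metric; "
            "it does not return metric(x, x) == 0 for some values. "
            "In version 0.23, the diagonal of pairwise_distances(X) "
            "will be set to 0.",
            FutureWarning,
        )
    return D


# Diagonal is only rounding noise, so it gets replaced by exact zeros.
noisy = np.array([[1e-12, 2.0], [2.0, -1e-12]])
cleaned = zero_diagonal_if_close(noisy)

# Diagonal is clearly nonzero, so the matrix is left alone and a warning is raised.
invalid = np.array([[1.0, 2.0], [2.0, 1.0]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unchanged = zero_diagonal_if_close(invalid)
```

A version for pairwise_distances_chunked would also need to account for each chunk's row offset when locating the self-distance entries, which this sketch leaves out.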
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
I don’t think so because you can use euclidean_distances instead of pairwise_distances and you still want 0s on the diagonal in that case.

Btw, np.fill_diagonal(array, 0) does the job and is more readable 😃

I’m proposing that we stop supporting them, but it would be worth tracing the history of that test to work out if it was well motivated.
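For reference, the np.fill_diagonal suggestion from the comments can be compared with index-based assignment in a small sketch. The array values are made up for illustration.

```python
import numpy as np

D_indexed = np.array([[1e-9, 3.0], [3.0, 2e-9]])
D_filled = D_indexed.copy()

# Index-based assignment, in the style of the assertion in the issue body.
D_indexed[np.diag_indices(D_indexed.shape[0])] = 0

# The suggested alternative; it modifies the array in place and returns None.
np.fill_diagonal(D_filled, 0)
```

Both forms leave the off-diagonal entries untouched and produce the same matrix.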