pairwise_distances(X) should always have 0 diagonal

For euclidean distances, we explicitly set the diagonal to 0 when only X is passed (the unary case): https://github.com/scikit-learn/scikit-learn/blob/e170d479acff31aadd9d2ab404952b23f7994aab/sklearn/metrics/pairwise.py#L257. We do the same in pairwise_distances_chunked: https://github.com/scikit-learn/scikit-learn/blob/e170d479acff31aadd9d2ab404952b23f7994aab/sklearn/metrics/pairwise.py#L1279, and for cosine distances: https://github.com/scikit-learn/scikit-learn/blob/e170d479acff31aadd9d2ab404952b23f7994aab/sklearn/metrics/pairwise.py#L550.

We should zero the diagonal of the output for all metrics through pairwise_distances and pairwise_distances_chunked (where Y is None or Y is X) to reduce the effect of imprecision during distance calculation.

That is, for every metric we eventually want:

assert not np.any(pairwise_distances(X, metric=metric)[np.diag_indices(X.shape[0])])

and a similar assertion for pairwise_distances_chunked.
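For the chunked case, the diagonal has to be gathered across chunks. The following is only a sketch: it assumes the chunks are consecutive row blocks of the full square matrix, which is how pairwise_distances_chunked yields them, and `chunked_diagonal` is a hypothetical helper name, not part of scikit-learn. A plain NumPy matrix stands in for the generator output.

```python
import numpy as np


def chunked_diagonal(chunks):
    """Collect main-diagonal entries from consecutive row blocks.

    Hypothetical helper: `chunks` is any iterable of 2-D arrays that,
    stacked vertically, form a square matrix (for example the output of
    pairwise_distances_chunked(X, metric=metric)).
    """
    start = 0  # index of the first row of the current block in the full matrix
    parts = []
    for chunk in chunks:
        n_rows = chunk.shape[0]
        rows = np.arange(n_rows)
        # Row i of this block is row (start + i) of the full matrix,
        # so its diagonal entry sits in column (start + i).
        parts.append(chunk[rows, start + rows])
        start += n_rows
    return np.concatenate(parts)


# Stand-in for chunked output: a 4x4 matrix with a zeroed diagonal,
# split into two row blocks.
D = np.arange(16, dtype=float).reshape(4, 4)
np.fill_diagonal(D, 0)
blocks = [D[:3], D[3:]]

# The chunked analogue of the assertion above.
assert not np.any(chunked_diagonal(blocks))
```

With scikit-learn installed, `blocks` could be replaced by `pairwise_distances_chunked(X, metric=metric)` to check a real metric.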

I say eventually because I propose that:

  • if the current output satisfies np.allclose(pairwise_distances(X, metric=metric)[np.diag_indices(X.shape[0])], 0), we just set the diagonal to 0
  • if the current output does not satisfy that condition, we leave it unchanged for now and only raise a FutureWarning saying that “The specified metric is not a valid dissimilarity metric; it does not return metric(x, x) == 0 for some values. In version 0.23, the diagonal of pairwise_distances(X) will be set to 0.” We do this because at the moment we do not strictly require that the metric be a valid dissimilarity measure.
  • we do the same for pairwise_distances_chunked
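The two branches of the proposal can be sketched as a post-processing step on an already computed square distance matrix. This is an illustration under assumptions, not scikit-learn's implementation: `zero_diagonal_or_warn` and its `atol` parameter are hypothetical names, and the matrix `D` stands in for the result of pairwise_distances(X, metric=metric).

```python
import warnings

import numpy as np


def zero_diagonal_or_warn(D, atol=1e-8):
    """Hypothetical helper sketching the proposed transition behaviour.

    D is assumed to be a square matrix of distances from X to itself.
    """
    D = np.array(D, dtype=float)  # work on a copy; the input is left untouched
    if np.allclose(np.diagonal(D), 0, atol=atol):
        # Diagonal is zero up to floating-point noise: clean it up.
        np.fill_diagonal(D, 0)
    else:
        # Diagonal is clearly non-zero: keep the values, warn about the change.
        warnings.warn(
            "The specified metric is not a valid dissimilarity metric; "
            "it does not return metric(x, x) == 0 for some values. "
            "In version 0.23, the diagonal of pairwise_distances(X) "
            "will be set to 0.",
            FutureWarning,
        )
    return D


# Diagonal entries that are only numerically non-zero get zeroed exactly.
noisy = np.array([[1e-12, 1.0],
                  [1.0, 3e-13]])
cleaned = zero_diagonal_or_warn(noisy)
assert not np.any(np.diagonal(cleaned))
```

Copying the input is a choice made for this sketch; an in-library version could just as well modify the freshly computed matrix in place.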

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

jeremiedbb commented, Nov 21, 2018 (1 reaction)

I don’t think so because you can use euclidean_distances instead of pairwise_distances and you still want 0s on the diagonal in that case.

Btw, np.fill_diagonal(array, 0) does the job and is more readable 😃
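For reference, a minimal sketch of that call on a made-up 3x3 matrix. np.fill_diagonal modifies the array in place and returns None, so its result should not be assigned back to the array.

```python
import numpy as np

# Made-up symmetric matrix whose diagonal carries small rounding noise.
D = np.array([[1e-9, 2.0, 3.0],
              [2.0, -1e-9, 4.0],
              [3.0, 4.0, 1e-9]])

result = np.fill_diagonal(D, 0)  # in place; the return value is None

# The diagonal is now exactly zero, off-diagonal entries are unchanged.
assert not np.any(D[np.diag_indices(D.shape[0])])
```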

jnothman commented, Nov 21, 2018 (0 reactions)

I’m proposing that we stop supporting them, but it would be worth tracing the history of that test to work out whether it was well motivated.
