Unstable pairwise_distances
Description
On some machines, the code below can output a nonzero value.
Steps/Code to Reproduce
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances
np.random.seed(42)
# l = np.array(np.random.rand(9, 10000))
l = np.array(np.random.rand(9, 10000), dtype=np.float32)
m1 = pairwise_distances(l)
m2 = pairwise_distances(l[:4])
# print(m1[0, 1])
# print(m2[0, 1])
print(m2[0, 1] - m1[0, 1])
The bug also occurs with the cosine metric, but not with cityblock.
Expected Results
0.0
Actual Results
1.1444092e-05
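To put the size of this discrepancy in context, the sketch below measures it in units of float32 spacing (ULPs) at the magnitude of the distance. It assumes the same seed and data as the reproduction code, and it computes the distance in float64 with plain NumPy rather than with `pairwise_distances`. The variable names `d`, `ulp` and `reported` are illustrative.

```python
import numpy as np

np.random.seed(42)
l = np.array(np.random.rand(9, 10000), dtype=np.float32)

# Distance between rows 0 and 1, computed in float64 as a reference.
d = np.linalg.norm(l[0].astype(np.float64) - l[1].astype(np.float64))

# Gap between adjacent float32 values at that magnitude.
ulp = float(np.spacing(np.float32(d)))

# Value copied from the "Actual Results" of this report.
reported = 1.1444092e-05

print("distance:", d)
print("float32 spacing:", ulp)
print("reported difference in ULPs:", reported / ulp)
```

For any distance between 32 and 64, the float32 spacing is 2^-18 (about 3.8e-06), so the reported difference corresponds to roughly three float32 steps. A comparison with a tolerance, for example `np.isclose`, would treat the two results as equal.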
Versions
Darwin-16.7.0-x86_64-i386-64bit
Python 3.6.4 |Anaconda, Inc.| (default, Mar 12 2018, 20:05:31)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.15.0
SciPy 1.1.0
Scikit-Learn 0.19.1
On this configuration, the bug does not occur:
Linux-4.4.0-130-generic-x86_64-with-debian-stretch-sid
Python 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0]
NumPy 1.15.0
SciPy 1.1.0
Scikit-Learn 0.19.1
The problem might come from another component.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:3
- Comments:10 (6 by maintainers)
I’m having the same problem
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)] on win32
Same NumPy, SciPy, sklearn, Windows 10 build 17134.
Thanks for the detailed report in any case, @louisabraham!