AttributeError thrown when calling metrics.pairwise_distances with binary metrics and Y is None
Description
AttributeError thrown when calling metrics.pairwise_distances with binary metrics if Y is None.
Steps/Code to Reproduce
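import numpy as np
import sklearn.metrics  # import the submodule explicitly so sklearn.metrics resolves

# Two boolean rows of six features each
binary_data = np.array((0, 0, 0, 0, 0, 1,
                        1, 0, 0, 1, 1, 0),
                       dtype="bool").reshape((2, 6))
sklearn.metrics.pairwise_distances(binary_data, metric="jaccard")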
Expected Results
No error. Should return a numpy.ndarray of shape (2, 2) containing the pairwise distances.
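For reference, the expected values can be worked out by hand. The following is a minimal sketch in plain NumPy, not scikit-learn's implementation; the helper name jaccard_distance is hypothetical, and returning 0.0 for two all-False vectors is an assumed convention that this data never triggers.

```python
import numpy as np

binary_data = np.array([[0, 0, 0, 0, 0, 1],
                        [1, 0, 0, 1, 1, 0]], dtype=bool)


def jaccard_distance(u, v):
    """Jaccard dissimilarity between two boolean vectors (hypothetical helper)."""
    union = np.logical_or(u, v).sum()
    if union == 0:
        # Assumed convention for two all-False vectors; not reached with this data.
        return 0.0
    intersection = np.logical_and(u, v).sum()
    return 1.0 - intersection / union


# Pairwise distances between every row and every other row
expected = np.array([[jaccard_distance(u, v) for v in binary_data]
                     for u in binary_data])
print(expected)
```

The two rows share no True positions, so the off-diagonal distance is 1.0 and the diagonal is 0.0.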
Actual Results
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-fa618e0f7808> in <module>
----> 1 sklearn.metrics.pairwise_distances(binary_data, metric="jaccard")
e:\dev\python\anaconda\envs\umap\lib\site-packages\sklearn\metrics\pairwise.py in pairwise_distances(X, Y, metric, n_jobs, **kwds)
1562 dtype = bool if metric in PAIRWISE_BOOLEAN_FUNCTIONS else None
1563
-> 1564 if dtype == bool and (X.dtype != bool or Y.dtype != bool):
1565 msg = "Data was converted to boolean for metric %s" % metric
1566 warnings.warn(msg, DataConversionWarning)
AttributeError: 'NoneType' object has no attribute 'dtype'
Versions
machine: Windows-10-10.0.17134-SP0
python: 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
sklearn: 0.21.0
numpy: 1.16.3
scipy: 1.2.1
This worked correctly in sklearn version 0.20.3. I think the problem was introduced in https://github.com/scikit-learn/scikit-learn/commit/4b9e12e73b52382937029d29759976c3ef4aee3c#diff-dd76b3805500714227411a6460b149a8: there is now a code path where Y has its dtype checked without any prior check as to whether Y is None.
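One way the missing check could be added is sketched below. This is an illustration of the idea only, not the patch that was merged: the function name needs_bool_conversion and the abbreviated BOOLEAN_METRICS set are hypothetical stand-ins for the logic around PAIRWISE_BOOLEAN_FUNCTIONS shown in the traceback.

```python
import numpy as np

# Hypothetical, abbreviated stand-in for sklearn's PAIRWISE_BOOLEAN_FUNCTIONS
BOOLEAN_METRICS = {"dice", "jaccard", "rogerstanimoto", "yule"}


def needs_bool_conversion(X, Y, metric):
    """Return True if X (or Y, when given) would be converted to bool.

    Unlike the failing line in the traceback, Y.dtype is only read
    after confirming that Y is not None.
    """
    if metric not in BOOLEAN_METRICS:
        return False
    if X.dtype != bool:
        return True
    return Y is not None and Y.dtype != bool


bool_data = np.array([[0, 1], [1, 0]], dtype=bool)
int_data = np.array([[0, 1], [1, 0]], dtype=int)

print(needs_bool_conversion(bool_data, None, "jaccard"))      # Y omitted: no AttributeError
print(needs_bool_conversion(bool_data, int_data, "jaccard"))  # Y would need conversion
```

Until a fix is available, passing the same array explicitly as Y (pairwise_distances(binary_data, binary_data, metric="jaccard")) should sidestep the failing check, since Y then has a dtype attribute; this is inferred from the traceback rather than tested against 0.21.0.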
I’m sorry, it seems to be a slightly different issue, because it only occurs when the array is wrapped in a pandas DataFrame. The pandas version is 1.0.3.
Can you please open a new issue for this since it is a distinct issue?