AttributeError thrown when calling metrics.pairwise_distances with binary metrics and Y is None

Description

AttributeError thrown when calling metrics.pairwise_distances with binary metrics if Y is None.

Steps/Code to Reproduce

import numpy as np
import sklearn.metrics  # importing the submodule explicitly makes sklearn.metrics available
binary_data = np.array((0, 0, 0, 0, 0, 1, 
                        1, 0, 0, 1, 1, 0),
                       dtype = "bool").reshape((2, 6))
sklearn.metrics.pairwise_distances(binary_data, metric="jaccard")

Expected Results

No error. Should return a numpy.ndarray of shape (2, 2) containing the pairwise distances.

Actual Results

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-21-fa618e0f7808> in <module>
----> 1 sklearn.metrics.pairwise_distances(binary_data, metric="jaccard")

e:\dev\python\anaconda\envs\umap\lib\site-packages\sklearn\metrics\pairwise.py in pairwise_distances(X, Y, metric, n_jobs, **kwds)
   1562         dtype = bool if metric in PAIRWISE_BOOLEAN_FUNCTIONS else None
   1563 
-> 1564         if dtype == bool and (X.dtype != bool or Y.dtype != bool):
   1565             msg = "Data was converted to boolean for metric %s" % metric
   1566             warnings.warn(msg, DataConversionWarning)

AttributeError: 'NoneType' object has no attribute 'dtype'

Versions

machine: Windows-10-10.0.17134-SP0
python: 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
sklearn: 0.21.0
numpy: 1.16.3
scipy: 1.2.1

This worked correctly in sklearn version 0.20.3. I think the problem was introduced in https://github.com/scikit-learn/scikit-learn/commit/4b9e12e73b52382937029d29759976c3ef4aee3c#diff-dd76b3805500714227411a6460b149a8: there is now a code path where Y has its dtype checked without any prior check as to whether Y is None.
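
To make the missing check concrete, here is a minimal standalone sketch of the guarded condition. It mirrors the `Y is not None` test that appears in the later traceback in this thread, but `needs_bool_conversion` and `BOOLEAN_METRICS` are hypothetical names used only for illustration, not part of scikit-learn.

```python
import numpy as np

# Illustrative subset only; scikit-learn keeps its own list in
# sklearn.metrics.pairwise.PAIRWISE_BOOLEAN_FUNCTIONS.
BOOLEAN_METRICS = {"jaccard", "dice", "rogerstanimoto"}


def needs_bool_conversion(X, Y, metric):
    """Return True when a boolean metric is given non-boolean input.

    Hypothetical helper: it reproduces the shape of the dtype check,
    with the guard that skips Y when Y is None.
    """
    wants_bool = metric in BOOLEAN_METRICS
    # Y is None means "compare X with itself", so only X's dtype matters.
    return wants_bool and (
        X.dtype != bool or (Y is not None and Y.dtype != bool)
    )


binary_data = np.array((0, 0, 0, 0, 0, 1,
                        1, 0, 0, 1, 1, 0),
                       dtype="bool").reshape((2, 6))

# No AttributeError, because Y.dtype is never touched when Y is None.
print(needs_bool_conversion(binary_data, None, "jaccard"))
print(needs_bool_conversion(binary_data.astype(int), None, "jaccard"))
```

With the guard in place, the first call reports that no conversion is needed and the second reports that the integer input would be converted.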

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

annikagable commented, Apr 28, 2020 (1 reaction)

Sorry, this turns out to be a slightly different issue: it only occurs when the array is wrapped in a pandas DataFrame. The pandas version is 1.0.3.

import numpy as np
import sklearn.metrics
import pandas as pd
binary_data = np.array((0, 0, 0, 0, 0, 1, 
                        1, 0, 0, 1, 1, 0),
                       dtype = "bool").reshape((2, 6))
binary_data = pd.DataFrame(binary_data)
sklearn.metrics.pairwise_distances(binary_data, metric="jaccard")

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-7203f3b805a8> in <module>
      6                        dtype = "bool").reshape((2, 6))
      7 binary_data = pd.DataFrame(binary_data)
----> 8 sklearn.metrics.pairwise_distances(binary_data, metric="jaccard")

/mnt/mnemo4/annika/anaconda3/envs/py38_sklearn/lib/python3.8/site-packages/sklearn/metrics/pairwise.py in pairwise_distances(X, Y, metric, n_jobs, **kwds)
   1569 
   1570         if (dtype == bool and
-> 1571                 (X.dtype != bool or (Y is not None and Y.dtype != bool))):
   1572             msg = "Data was converted to boolean for metric %s" % metric
   1573             warnings.warn(msg, DataConversionWarning)

/mnt/mnemo4/annika/anaconda3/envs/py38_sklearn/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275 
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'dtype'

jnothman commented, Apr 28, 2020 (0 reactions)

Can you please open a new issue for this since it is a distinct issue?
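
For the DataFrame case reported above, one possible workaround is to hand `pairwise_distances` a plain boolean NumPy array instead of the DataFrame, so the `X.dtype` lookup never reaches pandas. This is only a sketch under that assumption, not the fix adopted by the maintainers, and it has not been checked against every scikit-learn version.

```python
import numpy as np
import pandas as pd
import sklearn.metrics

binary_data = np.array((0, 0, 0, 0, 0, 1,
                        1, 0, 0, 1, 1, 0),
                       dtype="bool").reshape((2, 6))
frame = pd.DataFrame(binary_data)

# A DataFrame exposes per-column `dtypes`, not a single `dtype`.
# Converting to an ndarray first gives the dtype check something it can read.
as_array = np.asarray(frame, dtype=bool)

distances = sklearn.metrics.pairwise_distances(as_array, metric="jaccard")
print(distances.shape)
```

The conversion drops the DataFrame's index and column labels, so keep the original frame around if the labels are needed to interpret the distance matrix.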
