Wrong documentation for Jaccard

In /spatial/distance.py, the documentation for the function jaccard specifies that the parameters u and v are both "(N,) array_like, bool: Input array". However, in /spatial/tests/test_distance.py there are calls to jaccard where u and v are floating-point arrays. Am I missing something, or should these tests not be there?

Issue Analytics

Created 3 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

rkern commented, Jul 23, 2021

The _weight_checked() wrapper will implicitly change some of the values to be neither 0 nor 1.

In general, I think that many/most of these boolean-intended distances don’t deal with non-boolean values in a particularly consistent or useful manner. For instance, when we look at the example given for jaccard:

>>> from scipy.spatial import distance
>>> distance.jaccard([1, 0, 0], [1, 1, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 2, 0])
0.5

we might assume that the values are first converted to booleans by their truthiness (so 2 becomes True), but they are not. If they were, the following call would compare two identical boolean vectors and return 0:

>>> distance.jaccard([1, 1, 0], [1, 2, 0])
0.5
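One computation that would reproduce all three outputs above is sketched here. It assumes the function compares raw values for inequality and only uses "non-zero" to decide which positions count. The helper name jaccard_raw_values is hypothetical, and this is an illustration in plain Python, not SciPy's source code.

```python
def jaccard_raw_values(u, v):
    """Jaccard-style ratio that compares raw values instead of truthiness.

    Hypothetical helper for illustration only; not a SciPy function.
    """
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    # Positions where at least one of the two entries is non-zero.
    either_nonzero = [(a != 0) or (b != 0) for a, b in zip(u, v)]
    # Positions that count as a mismatch: the raw values differ
    # (so 1 vs 2 is a mismatch even though both are truthy).
    differ = [(a != b) and nz for a, b, nz in zip(u, v, either_nonzero)]
    denom = sum(either_nonzero)
    # Assumption: two all-zero vectors give a distance of 0.
    return sum(differ) / denom if denom else 0.0


print(jaccard_raw_values([1, 0, 0], [1, 1, 0]))  # 0.5
print(jaccard_raw_values([1, 0, 0], [1, 2, 0]))  # 0.5
print(jaccard_raw_values([1, 1, 0], [1, 2, 0]))  # 0.5, because 1 != 2
```

Under this assumption the last call returns 0.5 only because 1 and 2 are unequal as numbers, which is the inconsistency the comment describes.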

The functions based on _nbool_correspond_all() also do weird things with non-bool values that I don’t think correspond with any well-known definition of these distances (but happy to be proved wrong, with accompanying tests for such defined behavior).

I think it would be a good idea to define (and test) the behavior of all of these functions so that they convert the input arrays to booleans by the truthiness of the inputs.
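A minimal sketch of what "convert by truthiness first" could look like, in plain Python. The name jaccard_truthy and the handling of two all-False inputs are assumptions made for illustration. This is not the change that was eventually made in SciPy.

```python
def jaccard_truthy(u, v):
    """Jaccard dissimilarity after coercing every entry to bool.

    Hypothetical helper for illustration only; not a SciPy function.
    """
    ub = [bool(a) for a in u]
    vb = [bool(b) for b in v]
    if len(ub) != len(vb):
        raise ValueError("u and v must have the same length")
    # Number of positions where at least one input is True.
    either = sum(a or b for a, b in zip(ub, vb))
    # Number of positions where exactly one input is True.
    differ = sum(a != b for a, b in zip(ub, vb))
    # Assumption: two all-False vectors are treated as identical.
    return differ / either if either else 0.0


print(jaccard_truthy([1, 0, 0], [1, 1, 0]))  # 0.5
print(jaccard_truthy([1, 1, 0], [1, 2, 0]))  # 0.0, since 2 is treated as True
```

With this definition, floating-point inputs such as [0.5, 0.0] and [2.0, 0.0] also give 0.0, so the documented "bool" contract and the tests on non-boolean arrays would agree.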

aricooperdavis commented, Jul 23, 2021

Ah, I hadn't seen that discussion. I agree that the functions should convert inputs to bool and be explicit about this in the documentation. Should I close this PR so that the documentation is all updated in #14357, at the same time as the behavior change?
