Wrong documentation for Jaccard


In /spatial/distance.py, the documentation for the function jaccard specifies that the parameters u and v are both "(N,) array_like, bool. Input array." However, in /spatial/tests/test_distance.py there are calls to jaccard where u and v are floating-point arrays. Am I missing something, or should these tests not be there?

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
rkern commented, Jul 23, 2021

The _weight_checked() wrapper will implicitly change some of the values to be neither 0 nor 1.

In general, I think that many/most of these boolean-intended distances don’t deal with non-boolean values in a particularly consistent or useful manner. For instance, when we look at the example given for jaccard:

>>> distance.jaccard([1, 0, 0], [1, 1, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 2, 0])
0.5

we might think: okay, the values are first converted to booleans by their truthiness (so 2 becomes True). But they are not. If they were, the following call would compare two identical boolean arrays and return 0.0:

>>> distance.jaccard([1, 1, 0], [1, 2, 0])
0.5

The functions based on _nbool_correspond_all() also do weird things with non-bool values that I don’t think correspond with any well-known definition of these distances (but happy to be proved wrong, with accompanying tests for such defined behavior).

I think it would be a good idea to define (and test) the behavior of all of these functions so that they convert the input arrays to booleans by the truthiness of the inputs.
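A minimal sketch of what that truthiness-based definition could look like, in pure Python. `jaccard_truthy` is a hypothetical name used only for illustration, and returning 0.0 for all-false inputs is an assumption. It is not a proposal for the actual SciPy implementation.

```python
def jaccard_truthy(u, v):
    """Hypothetical helper: Jaccard dissimilarity after converting inputs to bool."""
    # Convert every element by truthiness, so 2, 0.5 and True all become True.
    ub = [bool(a) for a in u]
    vb = [bool(b) for b in v]
    # Size of the union: positions where at least one input is True.
    union = sum(a or b for a, b in zip(ub, vb))
    # Positions where exactly one input is True.
    differ = sum(a != b for a, b in zip(ub, vb))
    # Assumption: two all-false inputs are treated as identical (distance 0.0).
    return differ / union if union else 0.0


print(jaccard_truthy([1, 0, 0], [1, 2, 0]))  # 0.5
print(jaccard_truthy([1, 1, 0], [1, 2, 0]))  # 0.0
```

Under this definition, [1, 1, 0] and [1, 2, 0] become the same boolean array, so the distance is 0.0. Floating-point inputs such as those in the tests mentioned in the issue would be handled the same way.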

0 reactions
aricooperdavis commented, Jul 23, 2021

Ah I hadn’t seen that discussion - I agree that the functions should convert inputs to bool and be explicit about this in the documentation. I’ll remove this PR as the documentation can all be updated in #14357 so it’s updated at the same time as the behavior change?
