Wrong documentation for Jaccard
In /spatial/distance.py, the documentation for the function jaccard specifies that the parameters u and v are both:
(N,) array_like, bool
    Input array.
However, in /spatial/tests/test_distance.py, there are calls to jaccard where u and v are floating-point arrays. Am I missing something, or should these tests not be there?
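Floating-point inputs make the result depend on how non-boolean values are interpreted. The sketch below uses two hypothetical helpers written in plain NumPy; neither is SciPy's code. jaccard_truthy converts the inputs to booleans by truthiness first (nonzero becomes True), while jaccard_raw compares the raw values, counting positions where u and v differ among positions where at least one of them is nonzero. The raw-value variant is only one plausible way an implementation could treat non-boolean input. The two readings agree on some inputs but disagree as soon as two nonzero entries differ.

```python
import numpy as np


def jaccard_truthy(u, v):
    """Hypothetical helper: Jaccard distance after converting both inputs
    to bool by truthiness (any nonzero value becomes True)."""
    u = np.asarray(u).astype(bool)
    v = np.asarray(v).astype(bool)
    union = np.sum(u | v)
    if union == 0:
        return 0.0  # convention assumed here for two all-False vectors
    # Distance = |symmetric difference| / |union|
    return float(np.sum(u ^ v) / union)


def jaccard_raw(u, v):
    """Hypothetical helper: compares raw values instead, counting positions
    where u != v among positions where either entry is nonzero."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    nonzero = (u != 0) | (v != 0)
    unequal = (u != v) & nonzero
    if nonzero.sum() == 0:
        return 0.0  # same all-zero convention as above (an assumption)
    return float(unequal.sum() / nonzero.sum())


u, v = [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]
print(jaccard_truthy(u, v))  # 0.0: both become [True, False, False]
print(jaccard_raw(u, v))     # 1.0: 1.0 != 2.0 at the only nonzero position
```

For u = [1, 0, 0] and v = [1, 2, 0] both helpers return 0.5, so a test built only on inputs like that would not reveal which interpretation a function actually uses.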
The _weight_checked() wrapper will implicitly change some of the values to be neither 0 nor 1.

In general, I think that many/most of these boolean-intended distances don't deal with non-boolean values in a particularly consistent or useful manner. For instance, looking at the examples given for jaccard, we might think, okay, we're converting the values to booleans first by their truthiness (so 2 becomes True), but we're not.

The functions based on _nbool_correspond_all() also do weird things with non-bool values that I don't think correspond to any well-known definition of these distances (but I'm happy to be proved wrong, with accompanying tests for such defined behavior). I think it would be a good idea to define (and test) the behavior of all of these functions so that they convert the input arrays to booleans by the truthiness of the inputs.
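The concern about _nbool_correspond_all() can be illustrated with a minimal sketch. It assumes that the 2x2 contingency counts (false/false, false/true, true/false, true/true) are computed arithmetically as sums of products of u, v and their complements 1 - u and 1 - v; the helpers correspond_counts_arithmetic and correspond_counts_bool are hypothetical and not SciPy's code. That arithmetic only yields valid counts for 0/1 inputs: an entry of 2 has a "complement" of -1, so a count can go negative. Converting by truthiness first, as proposed above, keeps every count a genuine count.

```python
import numpy as np


def correspond_counts_arithmetic(u, v):
    """Hypothetical helper: contingency counts via arithmetic on 0/1 values.

    Only valid for 0/1 inputs; other values (e.g. 2) push 1 - u and 1 - v
    outside [0, 1], so the resulting "counts" can be negative.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    not_u, not_v = 1.0 - u, 1.0 - v
    nff = (not_u * not_v).sum()
    nft = (not_u * v).sum()
    ntf = (u * not_v).sum()
    ntt = (u * v).sum()
    return nff, nft, ntf, ntt


def correspond_counts_bool(u, v):
    """Hypothetical helper: the same counts after converting inputs to bool
    by truthiness, so each count is a true count of positions."""
    u = np.asarray(u).astype(bool)
    v = np.asarray(v).astype(bool)
    nff = int(np.sum(~u & ~v))
    nft = int(np.sum(~u & v))
    ntf = int(np.sum(u & ~v))
    ntt = int(np.sum(u & v))
    return nff, nft, ntf, ntt


u, v = [2, 0, 1], [1, 0, 0]
print(correspond_counts_arithmetic(u, v))  # nft comes out as -1.0
print(correspond_counts_bool(u, v))        # (1, 0, 1, 1)
```

Any distance built from the arithmetic counts inherits the negative entry, which is one way "weird" results for non-bool input can arise.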
Ah, I hadn't seen that discussion. I agree that the functions should convert inputs to bool and be explicit about this in the documentation. I'll close this PR, since the documentation can all be updated in #14357 so that it changes at the same time as the behavior. Does that work?