Terrible Performance of jaccard_score
Description
Recently I've changed from v0.19 to v0.21.3, and I soon found out that the performance of `jaccard_score`, the successor of `jaccard_similarity_score`, is really bad.
Let's take a simple binary task (which is very common in the field of Computer Vision, i.e., calculating the IoU of two masks) as an example:
```python
from sklearn.metrics import jaccard_score
from time import time
from tqdm import tqdm
import numpy as np


def cal_iou(mask1, mask2):
    mask1 = mask1.astype(bool)
    mask2 = mask2.astype(bool)
    intersection = np.sum(mask1 * mask2)
    union = np.sum((mask1 + mask2).astype(bool))
    return intersection / union


def cal_iou2(mask1, mask2):
    # Does not check the data type. Be careful.
    intersection = np.sum(mask1 * mask2)
    union = np.sum(mask1 + mask2)
    return intersection / union


scale = 416

print('Calculating Random Masks...')
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    jaccard_score(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    t1 = cal_iou(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    t2 = cal_iou2(a.flatten(), b.flatten())

print('Calculating Fixed Masks...')
for _ in tqdm(range(100)):
    jaccard_score(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    t1 = cal_iou(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    t2 = cal_iou2(a.flatten(), b.flatten())

print(t1 == t2)
```
Here is the output of my simple test program:
```
Calculating Random Masks...
100%|██████████| 100/100 [00:04<00:00, 20.62it/s]
100%|██████████| 100/100 [00:00<00:00, 222.92it/s]
100%|██████████| 100/100 [00:00<00:00, 233.89it/s]
Calculating Fixed Masks...
100%|██████████| 100/100 [00:04<00:00, 22.22it/s]
100%|██████████| 100/100 [00:00<00:00, 1219.02it/s]
100%|██████████| 100/100 [00:00<00:00, 2630.83it/s]
True
```
On my notebook with an i7-9750H, the sklearn implementation is roughly 10X (random masks) to 100X (fixed masks) slower than my own implementations!
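As a rough way to see where the time goes, a single call can be profiled. The following is a minimal sketch (not part of the original report), assuming boolean masks of the same shape as above; it should show how much time lands in scikit-learn's validation helpers (e.g. `type_of_target`, mentioned by the maintainers below) versus the arithmetic itself:

```python
# Minimal profiling sketch (an editorial addition, not from the report):
# profile one jaccard_score call on masks shaped like those above.
import cProfile
import pstats

import numpy as np
from sklearn.metrics import jaccard_score

a = np.random.uniform(size=(416, 416)).flatten() > 0.5
b = np.random.uniform(size=(416, 416)).flatten() > 0.5

# Dump profile stats to a file, then print the 10 most expensive calls.
cProfile.runctx('jaccard_score(a, b)', globals(), locals(), 'jaccard.prof')
pstats.Stats('jaccard.prof').sort_stats('cumulative').print_stats(10)
```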
Versions
I've tested on many platforms, and I always get similar results.
To be specific, here are the versions of the packages on my notebook:
```
>>> sklearn.show_versions()

System:
    python: 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\14892\AppData\Local\Programs\Python\Python36\python.exe
   machine: Windows-10-10.0.19013-SP0

Python deps:
       pip: 19.2.3
setuptools: 41.0.1
   sklearn: 0.21.3
     numpy: 1.16.1
     scipy: 1.2.1
    Cython: 0.29.7
    pandas: 0.24.2
```
@EletronicElephant, are you using our implementation of `jaccard_score` to evaluate machine learning algorithms? If not, it might not be the right implementation for your use case. There are certainly better ways to evaluate Jaccard similarity, and better ways to represent sets in order to evaluate Jaccard similarity in other applications. `scipy.spatial.distance.jaccard` is also an optimised option.

For our application space, I would have thought it rare for a 100x cost (on a small baseline in absolute terms) in your scoring to be the efficiency bottleneck in your ML pipeline. But maybe I'm mistaken.
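For reference, here is a minimal sketch of the scipy route suggested above (an editorial addition, not from the thread). Note that `scipy.spatial.distance.jaccard` returns a *dissimilarity*, so the similarity (IoU) is one minus the result:

```python
# Sketch of the scipy option (not from the thread): for boolean 1-D arrays,
# 1 - jaccard(u, v) equals intersection / union.
import numpy as np
from scipy.spatial.distance import jaccard

a = np.random.uniform(size=416 * 416) > 0.5
b = np.random.uniform(size=416 * 416) > 0.5

iou = 1.0 - jaccard(a, b)  # Jaccard similarity of the two masks
print(iou)
```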
We certainly could investigate whether we can consistently improve efficiency of evaluation metrics for boolean dtype. But I'd be keen to do this for `multilabel_confusion_matrix` (and `type_of_target`, used there and elsewhere), not in `jaccard_score` alone.

I agree with @jnothman on this one.
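To illustrate the connection (a sketch added here, not from the thread): for a binary problem, `jaccard_score` for the positive class equals tp / (tp + fp + fn), which can be read directly off `multilabel_confusion_matrix`, so speeding up the latter would speed up the former as well:

```python
# Sketch (editorial addition): derive the binary Jaccard score from
# multilabel_confusion_matrix. Each mcm[i] is [[tn, fp], [fn, tp]] for label i.
import numpy as np
from sklearn.metrics import jaccard_score, multilabel_confusion_matrix

y_true = np.random.uniform(size=1000) > 0.5
y_pred = np.random.uniform(size=1000) > 0.5

mcm = multilabel_confusion_matrix(y_true, y_pred)  # shape (2, 2, 2)
tn, fp, fn, tp = mcm[1].ravel()                    # counts for the positive class
manual = tp / (tp + fp + fn)

assert np.isclose(manual, jaccard_score(y_true, y_pred))
```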