
Terrible Performance of jaccard_score

See original GitHub issue

Description

I recently upgraded from v0.19 to v0.21.3, and soon found that the performance of jaccard_score, the successor of jaccard_similarity_score, is really bad.

Let’s take a simple binary task (very common in computer vision, e.g., computing the IoU of two masks) as an example:

from sklearn.metrics import jaccard_score
from time import time
from tqdm import tqdm
import numpy as np


def cal_iou(mask1, mask2):
    # Cast to bool so that * acts as logical AND and + as logical OR.
    # (np.bool is deprecated in recent numpy; the builtin bool is equivalent.)
    mask1 = mask1.astype(bool)
    mask2 = mask2.astype(bool)
    intersection = np.sum(mask1 * mask2)
    union = np.sum((mask1 + mask2).astype(bool))
    return intersection / union


def cal_iou2(mask1, mask2):
    # Does not check the data type. Be careful: only correct for boolean inputs.
    intersection = np.sum(mask1 * mask2)
    union = np.sum(mask1 + mask2)
    return intersection / union


scale = 416

print('Calculating Random Masks...')
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    jaccard_score(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    t1 = cal_iou(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    t2 = cal_iou2(a.flatten(), b.flatten())

print('Calculating Fixed Masks...')
for _ in tqdm(range(100)):
    jaccard_score(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    t1 = cal_iou(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    t2 = cal_iou2(a.flatten(), b.flatten())
print(t1 == t2)

Here is the output of my simple test program:

Calculating Random Masks...
100%|███████████| 100/100 [00:04<00:00, 20.62it/s]
100%|██████████| 100/100 [00:00<00:00, 222.92it/s]
100%|██████████| 100/100 [00:00<00:00, 233.89it/s]
Calculating Fixed Masks...
100%|███████████| 100/100 [00:04<00:00, 22.22it/s]
100%|██████████| 100/100 [00:00<00:00, 1219.02it/s]
100%|██████████| 100/100 [00:00<00:00, 2630.83it/s]
True

On my notebook with an i7-9750H, the sklearn implementation is roughly 10X (random masks) to 100X (fixed masks) slower than my own implementation!
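
For reference, the same boolean IoU can be written even more directly with numpy’s logical ops, skipping the dtype casts altogether. A minimal sketch (iou_bool is a name introduced here for illustration, not a library function), assuming both inputs are already boolean arrays:

import numpy as np


def iou_bool(mask1, mask2):
    # Logical AND/OR never allocate a non-boolean intermediate,
    # and count_nonzero is a fast count over the result.
    intersection = np.count_nonzero(np.logical_and(mask1, mask2))
    union = np.count_nonzero(np.logical_or(mask1, mask2))
    return intersection / union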

Versions

I’ve tested on many platforms and I always get similar results.

To be specific, here are the versions of packages on my notebook:

>>> sklearn.show_versions()

System:
    python: 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\14892\AppData\Local\Programs\Python\Python36\python.exe
   machine: Windows-10-10.0.19013-SP0

Python deps:
       pip: 19.2.3
setuptools: 41.0.1
   sklearn: 0.21.3
     numpy: 1.16.1
     scipy: 1.2.1
    Cython: 0.29.7
    pandas: 0.24.2

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
jnothman commented, Nov 6, 2019

@EletronicElephant, are you using our implementation of jaccard_score to evaluate machine learning algorithms? If not, it might not be the right implementation for your use case. There are certainly better ways to evaluate jaccard similarity, and better ways to represent sets in order to evaluate jaccard similarity in other applications. scipy.spatial.distance.jaccard is also an optimised option.
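
As a minimal sketch of that scipy route (note that scipy.spatial.distance.jaccard returns the Jaccard-Needham dissimilarity of two boolean 1-D arrays, so the similarity is its complement):

import numpy as np
from scipy.spatial.distance import jaccard

a = np.random.uniform(size=(416, 416)) > 0.5
b = np.random.uniform(size=(416, 416)) > 0.5

# jaccard() expects flat boolean vectors and returns a distance in [0, 1].
similarity = 1.0 - jaccard(a.ravel(), b.ravel())
print(similarity)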

For our application space, I would have thought it rare for a 100x cost (on a small baseline in absolute terms) in your scoring to be the efficiency bottleneck in your ML pipeline. But maybe I’m mistaken.

We certainly could investigate whether we can consistently improve efficiency of evaluation metrics for boolean dtype. But I’d be keen to do this for multilabel_confusion_matrix (and type_of_target used there and elsewhere), not in jaccard_score alone.
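
To make that relationship concrete, here is a small sketch (an illustration, not scikit-learn’s internal code) of recovering the binary Jaccard score from multilabel_confusion_matrix counts:

import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

a = np.random.uniform(size=(416, 416)) > 0.5
b = np.random.uniform(size=(416, 416)) > 0.5

# One 2x2 matrix per class, flattened as tn, fp, fn, tp;
# index 1 is the positive class (True).
mcm = multilabel_confusion_matrix(a.ravel(), b.ravel())
tn, fp, fn, tp = mcm[1].ravel()
# For the positive class, |A & B| = tp and |A | B| = tp + fp + fn.
print(tp / (tp + fp + fn))  # matches jaccard_score(a.ravel(), b.ravel())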

0 reactions
glemaitre commented, Jul 29, 2022

I agree with @jnothman on this one.
