Terrible Performance of jaccard_score
Description
Recently I've changed from v0.19 to v0.21.3, and I soon found out that the performance of `jaccard_score`, the successor of `jaccard_similarity_score`, is really bad.
Let's take a simple binary task (which is very common in the field of Computer Vision, i.e., calculating the IoU of two masks) as an example:
```python
from sklearn.metrics import jaccard_score
from time import time
from tqdm import tqdm
import numpy as np


def cal_iou(mask1, mask2):
    mask1 = mask1.astype(bool)
    mask2 = mask2.astype(bool)
    intersection = np.sum(mask1 * mask2)
    union = np.sum((mask1 + mask2).astype(bool))
    return intersection / union


def cal_iou2(mask1, mask2):
    # Does not check the data type. Be careful.
    intersection = np.sum(mask1 * mask2)
    union = np.sum(mask1 + mask2)
    return intersection / union


scale = 416

print('Calculating Random Masks...')
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    jaccard_score(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    t1 = cal_iou(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    a = np.random.uniform(size=(scale, scale))
    b = np.random.uniform(size=(scale, scale))
    a = (a > 0.5)
    b = (b > 0.5)
    t2 = cal_iou2(a.flatten(), b.flatten())

print('Calculating Fixed Masks...')
for _ in tqdm(range(100)):
    jaccard_score(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    t1 = cal_iou(a.flatten(), b.flatten())
for _ in tqdm(range(100)):
    t2 = cal_iou2(a.flatten(), b.flatten())

print(t1 == t2)
```
Here is the output of my simple test program:
```
Calculating Random Masks...
100%|██████████| 100/100 [00:04<00:00, 20.62it/s]
100%|██████████| 100/100 [00:00<00:00, 222.92it/s]
100%|██████████| 100/100 [00:00<00:00, 233.89it/s]
Calculating Fixed Masks...
100%|██████████| 100/100 [00:04<00:00, 22.22it/s]
100%|██████████| 100/100 [00:00<00:00, 1219.02it/s]
100%|██████████| 100/100 [00:00<00:00, 2630.83it/s]
True
```
On my notebook with an i7-9750H, the sklearn implementation is roughly 10X (random masks) to 100X (fixed masks) slower than my own implementations!
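As a rough way to see where the time goes, a single call can be profiled. The following is a minimal sketch (not part of the original report), assuming boolean masks of the same shape as above; it should show how much time lands in scikit-learn's validation helpers (e.g. `type_of_target`, mentioned by the maintainers below) versus the arithmetic itself:

```python
# Minimal profiling sketch (an editorial addition, not from the report):
# profile one jaccard_score call on masks shaped like those above.
import cProfile
import pstats

import numpy as np
from sklearn.metrics import jaccard_score

a = np.random.uniform(size=(416, 416)).flatten() > 0.5
b = np.random.uniform(size=(416, 416)).flatten() > 0.5

# Dump profile stats to a file, then print the 10 most expensive calls.
cProfile.runctx('jaccard_score(a, b)', globals(), locals(), 'jaccard.prof')
pstats.Stats('jaccard.prof').sort_stats('cumulative').print_stats(10)
```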
Versions
I've tested on many platforms, and I always get similar results.
To be specific, here are the versions of the packages on my notebook:
```
>>> sklearn.show_versions()

System:
    python: 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\14892\AppData\Local\Programs\Python\Python36\python.exe
   machine: Windows-10-10.0.19013-SP0

Python deps:
       pip: 19.2.3
setuptools: 41.0.1
   sklearn: 0.21.3
     numpy: 1.16.1
     scipy: 1.2.1
    Cython: 0.29.7
    pandas: 0.24.2
```
@EletronicElephant, are you using our implementation of `jaccard_score` to evaluate machine learning algorithms? If not, it might not be the right implementation for your use case. There are certainly better ways to evaluate Jaccard similarity, and better ways to represent sets in order to evaluate Jaccard similarity in other applications. `scipy.spatial.distance.jaccard` is also an optimised option.

For our application space, I would have thought it rare for a 100x cost (on a small baseline in absolute terms) in your scoring to be the efficiency bottleneck in your ML pipeline. But maybe I'm mistaken.
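For reference, here is a minimal sketch of the scipy route suggested above (an editorial addition, not from the thread). Note that `scipy.spatial.distance.jaccard` returns a *dissimilarity*, so the similarity (IoU) is one minus the result:

```python
# Sketch of the scipy option (not from the thread): for boolean 1-D arrays,
# 1 - jaccard(u, v) equals intersection / union.
import numpy as np
from scipy.spatial.distance import jaccard

a = np.random.uniform(size=416 * 416) > 0.5
b = np.random.uniform(size=416 * 416) > 0.5

iou = 1.0 - jaccard(a, b)  # Jaccard similarity of the two masks
print(iou)
```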
We certainly could investigate whether we can consistently improve efficiency of evaluation metrics for boolean dtype. But I'd be keen to do this for `multilabel_confusion_matrix` (and `type_of_target`, used there and elsewhere), not in `jaccard_score` alone.

I agree with @jnothman on this one.
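To illustrate the connection (a sketch added here, not from the thread): for a binary problem, `jaccard_score` for the positive class equals tp / (tp + fp + fn), which can be read directly off `multilabel_confusion_matrix`, so speeding up the latter would speed up the former as well:

```python
# Sketch (editorial addition): derive the binary Jaccard score from
# multilabel_confusion_matrix. Each mcm[i] is [[tn, fp], [fn, tp]] for label i.
import numpy as np
from sklearn.metrics import jaccard_score, multilabel_confusion_matrix

y_true = np.random.uniform(size=1000) > 0.5
y_pred = np.random.uniform(size=1000) > 0.5

mcm = multilabel_confusion_matrix(y_true, y_pred)  # shape (2, 2, 2)
tn, fp, fn, tp = mcm[1].ravel()                    # counts for the positive class
manual = tp / (tp + fp + fn)

assert np.isclose(manual, jaccard_score(y_true, y_pred))
```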