Incorrect calculations of homogeneity, completeness and v-measure
Description
The calculations of homogeneity, completeness and v-measure are currently based on the original paper of Rosenberg & Hirschberg (2007). However, while doing research on fuzzy clustering evaluation techniques, I came across Utt et al. (2014) (http://www.lrec-conf.org/proceedings/lrec2014/pdf/829_Paper.pdf), which explains in a footnote that the original definitions of homogeneity and completeness contain typos. The authors claim this was confirmed by Rosenberg himself via personal communication.
Definitions used:
- homogeneity = 1 - H(C|K) / H(C)
- completeness = 1 - H(K|C) / H(K)
Corrected definitions:
- homogeneity = 1 - H(C|K) / H(C,K)
- completeness = 1 - H(K|C) / H(K,C)
Furthermore, since the implementation now computes these scores via the mutual information, that derivation would no longer be correct under the corrected definitions. The statement in the documentation that v-measure is identical to normalized mutual information with the averaging method set to ‘arithmetic’ would also be false.
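For context, here is a short sketch (not from the issue itself) of why the documented equivalence follows from the current definitions, using I(C;K) = H(C) - H(C|K) = H(K) - H(K|C):

```latex
h = 1 - \frac{H(C \mid K)}{H(C)} = \frac{I(C;K)}{H(C)},
\qquad
c = 1 - \frac{H(K \mid C)}{H(K)} = \frac{I(C;K)}{H(K)}

v = \frac{2hc}{h + c}
  = \frac{2\, I(C;K)}{H(C) + H(K)}
  = \mathrm{NMI}_{\mathrm{arithmetic}}(C, K)
```

With H(C,K) in the denominators instead, h and c are no longer I(C;K) divided by a single entropy, so the last identity breaks.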
Steps/Code to Reproduce
from sklearn.metrics import homogeneity_completeness_v_measure
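For a concrete, runnable call (the labellings below are hypothetical placeholders, chosen only for illustration):

```python
from sklearn.metrics import homogeneity_completeness_v_measure

# Hypothetical toy labellings, for illustration only.
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]

# The current implementation divides the conditional entropies by the
# single entropies H(C) and H(K), following the 2007 paper as published.
h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
print(h, c, v)
```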
Expected Results
Homogeneity and completeness computed with the joint entropy H(C,K) in the denominators, per the corrected definitions.
Actual Results
Homogeneity and completeness computed with the single entropies H(C) and H(K) in the denominators, matching the published definitions.
Versions
System:
- python: 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 13:35:33) [MSC v.1900 64 bit (AMD64)]
- executable: C:\Users\dtuser\AppData\Local\Programs\Python\Python36\python.exe
- machine: Windows-7-6.1.7601-SP1

BLAS:
- macros:
- lib_dirs:
- cblas_libs: cblas

Python deps:
- pip: 18.1
- setuptools: 40.6.3
- sklearn: 0.20.1
- numpy: 1.15.4
- scipy: 1.1.0
- Cython: None
- pandas: 0.23.4
Top GitHub Comments
Well, I did some tests yesterday, and it seems that the joint entropy does not differ much from the ‘single’ entropy, because the conditional entropy is relatively small. This led to a difference in the scores only after the second decimal place. However, this was tested on a set where the score was already high (> 0.98).
I just reproduced the examples from the paper, which resulted in much larger differences.
So far, it looks like they used the single entropy for their examples; at least those calculations give back the same scores. If I use the joint entropy, the results differ by quite a bit.
I used the following code, which is essentially the same as the one in sklearn except for the joint entropy addition:
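A minimal sketch of such a modification (a reconstruction, not the exact snippet from the comment; entropies are computed directly with numpy/scipy rather than via sklearn's internal helpers):

```python
import numpy as np
from scipy.stats import entropy


def label_entropy(labels):
    """H(A): entropy of a single labelling, in nats."""
    _, counts = np.unique(labels, return_counts=True)
    return entropy(counts)  # scipy normalises counts to probabilities


def joint_entropy(labels_a, labels_b):
    """H(A, B): entropy of the joint label distribution, in nats."""
    pairs = np.stack([labels_a, labels_b], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    return entropy(counts)


def homogeneity_completeness_v_measure_joint(labels_true, labels_pred):
    """Variant with H(C, K) in the denominators, per Utt et al. (2014).

    sklearn's version divides the conditional entropies by the single
    entropies H(C) and H(K) instead; everything else is unchanged.
    """
    h_joint = joint_entropy(labels_true, labels_pred)    # H(C, K) == H(K, C)
    h_c_given_k = h_joint - label_entropy(labels_pred)   # H(C | K)
    h_k_given_c = h_joint - label_entropy(labels_true)   # H(K | C)

    homogeneity = 1.0 - h_c_given_k / h_joint if h_joint else 1.0
    completeness = 1.0 - h_k_given_c / h_joint if h_joint else 1.0
    if homogeneity + completeness == 0.0:
        v_measure = 0.0
    else:
        v_measure = (2.0 * homogeneity * completeness
                     / (homogeneity + completeness))
    return homogeneity, completeness, v_measure
```

Calling this next to sklearn's homogeneity_completeness_v_measure on the same labels makes the size of the discrepancy directly visible.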