NMI and AMI use inconsistent definitions of mutual information
There exist many definitions of NMI and AMI.
Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.
That paper describes five different definitions of NMI and, based on those, four different definitions of AMI.
The NMI implemented in sklearn normalizes MI by sqrt(H(U) * H(V)), the geometric mean of the two label entropies.
The AMI implemented in sklearn normalizes by max(H(U), H(V)).
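For reference, in the notation of the cited paper (I for mutual information, H for entropy, E[I] for the expected mutual information under the permutation model), the two defaults described above correspond to:

$$\mathrm{NMI}(U, V) = \frac{I(U, V)}{\sqrt{H(U)\,H(V)}}, \qquad \mathrm{AMI}(U, V) = \frac{I(U, V) - \mathbb{E}[I(U, V)]}{\max\bigl(H(U), H(V)\bigr) - \mathbb{E}[I(U, V)]}$$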
Vinh et al. also define an NMI with the max normalization and an AMI with the sqrt normalization, so sklearn's two defaults are inconsistent with each other. Ideally, both metrics would use the same normalization by default and allow selecting any of the other variants via an option.
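A minimal sketch of how the different normalizations compare, computed directly from mutual_info_score and the label entropies (the toy labelings and variable names are illustrative, not from the issue):

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score, normalized_mutual_info_score

# Toy labelings (illustrative only).
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]

# Mutual information and the two label entropies, all in nats.
mi = mutual_info_score(labels_true, labels_pred)
h_true = entropy(np.unique(labels_true, return_counts=True)[1])
h_pred = entropy(np.unique(labels_pred, return_counts=True)[1])

# The normalizations discussed by Vinh et al.: any generalized mean of
# H(U) and H(V) can serve as the denominator.
norms = {
    "min":  min(h_true, h_pred),
    "sqrt": np.sqrt(h_true * h_pred),  # geometric mean; sklearn's NMI default here
    "sum":  0.5 * (h_true + h_pred),   # arithmetic mean; what V-measure uses
    "max":  max(h_true, h_pred),       # used by sklearn's AMI (with E[MI] subtracted
                                       # from numerator and denominator)
}
for name, denom in norms.items():
    print(f"NMI_{name} = {mi / denom:.4f}")

# sklearn's NMI: at the time of this issue it used the sqrt (geometric-mean)
# variant; the default normalization may differ in later releases.
print("sklearn NMI =", normalized_mutual_info_score(labels_true, labels_pred))
```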
Ooh, a twist. Sum is actually what V-measure uses—not sqrt. It seems we’ve covered the entire gamut. I’m going to take that as another argument in favor of sum. << Thought I hit ‘Comment’ on this some time ago.
I’ve created a PR; waiting for tests to pass.
I think converging on sqrt is best for uniformity with V-measure. EDIT: Nope, it’s not sqrt. It’s sum.
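As a quick sanity check of the "sum is what V-measure uses" observation above, NMI with the arithmetic-mean ("sum") normalization can be compared against v_measure_score directly (toy labels illustrative):

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score, v_measure_score

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]

mi = mutual_info_score(labels_true, labels_pred)
h_true = entropy(np.unique(labels_true, return_counts=True)[1])
h_pred = entropy(np.unique(labels_pred, return_counts=True)[1])

# NMI with the "sum" (arithmetic-mean) normalization.
nmi_sum = mi / (0.5 * (h_true + h_pred))

# V-measure with beta=1 is the harmonic mean of homogeneity (MI/H(U)) and
# completeness (MI/H(V)), which reduces algebraically to 2*MI / (H(U) + H(V)),
# i.e. the same quantity.
print(nmi_sum, v_measure_score(labels_true, labels_pred))
assert np.isclose(nmi_sum, v_measure_score(labels_true, labels_pred))
```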