Odd (incorrect) behavior with normalized_mutual_info_score
If a mostly zero input with a single non-zero value is long enough, it looks like metrics.normalized_mutual_info_score gives a nonsensical output. Oddly, for the input below, reducing the length by 1 reverts to the expected output.
I see closed issue #12940, but I don’t think it’s related.
Description
Erroneous output of 3.9921875 from metrics.normalized_mutual_info_score (whose range should be 0 to 1) for certain inputs.
Steps/Code to Reproduce
Example:
In [1]: x = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0]
...:
...: y = [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0]
...:
...:
...: from sklearn.metrics import normalized_mutual_info_score
...: normalized_mutual_info_score(x, y)
...:
...:
/home/saladi/anaconda3/lib/python3.6/site-packages/sklearn/metrics/cluster/supervised.py:844: FutureWarning: The behavior of NMI will change in version 0.22. To match the behavior of 'v_measure_score', NMI will use average_method='arithmetic' by default.
FutureWarning)
Out[1]: 3.9921875
In [3]: normalized_mutual_info_score(x[:-1], y[:-1])
/home/saladi/anaconda3/lib/python3.6/site-packages/sklearn/metrics/cluster/supervised.py:844: FutureWarning: The behavior of NMI will change in version 0.22. To match the behavior of 'v_measure_score', NMI will use average_method='arithmetic' by default.
FutureWarning)
Out[3]: 0.0
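For reference, the two arrays can be written more compactly. The element count below (469) comes from counting the entries in the listing above and is not stated explicitly in the report, so treat it as an assumption:

# Compact equivalent of the arrays in the transcript above (469 elements each,
# counted from the listing): x is all zeros, y differs only in its first element.
from sklearn.metrics import normalized_mutual_info_score

x = [0] * 469
y = [1] + [0] * 468

normalized_mutual_info_score(x, y)            # reported above as 3.9921875 (0.20.3)
normalized_mutual_info_score(x[:-1], y[:-1])  # reported above as 0.0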
Versions
In [2]: import sklearn; sklearn.show_versions()
...:
System:
    python: 3.6.8 |Anaconda custom (64-bit)| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /home/saladi/anaconda3/bin/python
   machine: Linux-4.4.0-109-generic-x86_64-with-debian-jessie-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/saladi/anaconda3/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 10.0.1
setuptools: 39.1.0
   sklearn: 0.20.3
     numpy: 1.16.3
     scipy: 1.2.1
    Cython: 0.28.5
    pandas: 0.23.4

Maintainer comments
Actually, the behavior of normalized_mutual_info_score changed in 0.22. To reproduce the issue there, one should use normalized_mutual_info_score(x, y, average_method='geometric'). With that I can still reproduce the issue.

This is due to numerical error. With the geometric mean, to avoid a division by zero, we set the normalizer to np.finfo('float64').eps. In this case, the mutual information is only about 4 times bigger than eps, which is why the score comes out near 4 instead of 0. Should we have a mechanism to detect this corner case and set the NMI to a specific score? In 0.22, we changed the default to the arithmetic mean, which is less prone to such normalization issues.
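The following is a minimal sketch of where the ~4 comes from, using the public mutual_info_score together with the eps normalizer described in the comment above. The array lengths are counted from the transcript, and the exact printed values will depend on the scikit-learn version:

import numpy as np
from sklearn.metrics import mutual_info_score, normalized_mutual_info_score

x = [0] * 469              # a single cluster, so H(x) == 0
y = [1] + [0] * 468        # two clusters, one of them a singleton

eps = np.finfo('float64').eps

# Mathematically MI(x, y) == 0 here (x has only one label), but the
# log-based computation leaves a tiny floating-point residue.
mi = mutual_info_score(x, y)
print(mi, mi / eps)        # per the comment above, the residue is roughly 4 * eps

# Geometric mean: sqrt(H(x) * H(y)) == 0, so the normalizer is replaced by eps
# to avoid a division by zero; dividing the eps-scale MI by eps is how the
# 3.9921875 reported above arises.
print(normalized_mutual_info_score(x, y, average_method='geometric'))

# Arithmetic mean (the default from 0.22 on): (H(x) + H(y)) / 2 > 0 here,
# so the score stays within [0, 1].
print(normalized_mutual_info_score(x, y, average_method='arithmetic'))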