question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Odd (incorrect) behavior with normalized_mutual_info_score

See original GitHub issue

If the length of a mostly zero input with a single non-zero value is too long, it looks like metrics.normalized_mutual_info_score gives a non-sensical output. For the input below, oddly, reducing the length by 1, reverts back to the expected output.

I see closed issue #12940, but I don’t think it’s related.

Description

Erroronous output of 3.9921875 from metrics.normalized_mutual_info_score (range of 0 to 1) with certain inputs.

Steps/Code to Reproduce

Example:

In [1]: x = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0]
   ...:
   ...: y = [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
   ...: 0,0,0,0,0,0,0,0,0,0,0,0,0]
   ...:
   ...:
   ...: from sklearn.metrics import normalized_mutual_info_score
   ...: normalized_mutual_info_score(x, y)
   ...:
   ...:
/home/saladi/anaconda3/lib/python3.6/site-packages/sklearn/metrics/cluster/supervised.py:844: FutureWarning: The behavior of NMI will change in version 0.22. To match the behavior of 'v_measure_score', NMI will use average_method='arithmetic' by default.
  FutureWarning)
Out[1]: 3.9921875

In [3]: normalized_mutual_info_score(x[:-1], y[:-1])
/home/saladi/anaconda3/lib/python3.6/site-packages/sklearn/metrics/cluster/supervised.py:844: FutureWarning: The behavior of NMI will change in version 0.22. To match the behavior of 'v_measure_score', NMI will use average_method='arithmetic' by default.
  FutureWarning)
Out[3]: 0.0

Versions

In [2]: import sklearn; sklearn.show_versions()
   ...:


System:
    python: 3.6.8 |Anaconda custom (64-bit)| (default, Dec 30 2018, 01:22:34)  [GCC 7.3.0]
executable: /home/saladi/anaconda3/bin/python
   machine: Linux-4.4.0-109-generic-x86_64-with-debian-jessie-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/saladi/anaconda3/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 10.0.1
setuptools: 39.1.0
   sklearn: 0.20.3
     numpy: 1.16.3
     scipy: 1.2.1
    Cython: 0.28.5
    pandas: 0.23.4

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:7 (7 by maintainers)

github_iconTop GitHub Comments

1reaction
jeremiedbbcommented, Feb 25, 2022

Actually, normalized_mutual_info_score behavior changed in 0.22. to reproduce the issue, one should use normalized_mutual_info_score(x, y, average_method='geometric') With that I can still reproduce the issue.

1reaction
glemaitrecommented, May 9, 2019

This is due to some numerical error. With the geometric mean, to avoid a division by zero, we will set the normalizer to np.info('float64').eps. In this case, the mutual information is only 4 times bigger than eps.

Should we have a mechanism to detect this corner case and set the MNI to a specific score. In 0.21, we change the default to arithmetic mean which will be less prone to such normalization issue.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Normalized Mutual Information by Scikit Learn giving me ...
Your floating point data can't be used this way -- normalized_mutual_info_score is defined over clusters. The function is going to interpret ...
Read more >
sklearn.metrics.normalized_mutual_info_score
In this function, mutual information is normalized by some generalized mean of H(labels_true) and H(labels_pred)) , defined by the average_method . This measure ......
Read more >
(PDF) Correction for Closeness: Adjusting Normalized Mutual ...
Normalized mutual information (NMI) is a widely used measure to compare community detection methods. Recently, however, the need of ...
Read more >
Comparing Clusterings - An Overview
cians” problem [19], in which many persons were asked to rate the ... the clusterings, where the normalized mutual information between two ...
Read more >
Equitability, mutual information, and the maximal ... - PNAS
These findings are at odds with the recent work of Reshef et al. ... Mutual information is intimately connected to the statistical problem...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found