Odd (incorrect) behavior with normalized_mutual_info_score
If a mostly zero input with a single non-zero value is long enough, it looks like metrics.normalized_mutual_info_score gives a nonsensical output. Oddly, for the input below, reducing the length by 1 reverts to the expected output.
I see closed issue #12940, but I don’t think it’s related.
Description
Erroneous output of 3.9921875 from metrics.normalized_mutual_info_score (whose range should be 0 to 1) for certain inputs.
Steps/Code to Reproduce
Example:
In [1]: x = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0]
...:
...: y = [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
...: 0,0,0,0,0,0,0,0,0,0,0,0,0]
...:
...:
...: from sklearn.metrics import normalized_mutual_info_score
...: normalized_mutual_info_score(x, y)
...:
...:
/home/saladi/anaconda3/lib/python3.6/site-packages/sklearn/metrics/cluster/supervised.py:844: FutureWarning: The behavior of NMI will change in version 0.22. To match the behavior of 'v_measure_score', NMI will use average_method='arithmetic' by default.
FutureWarning)
Out[1]: 3.9921875
In [3]: normalized_mutual_info_score(x[:-1], y[:-1])
/home/saladi/anaconda3/lib/python3.6/site-packages/sklearn/metrics/cluster/supervised.py:844: FutureWarning: The behavior of NMI will change in version 0.22. To match the behavior of 'v_measure_score', NMI will use average_method='arithmetic' by default.
FutureWarning)
Out[3]: 0.0
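For reference, the two arrays can be written more compactly. The element count below (469) comes from counting the entries in the listing above and is not stated explicitly in the report, so treat it as an assumption:

# Compact equivalent of the arrays in the transcript above (469 elements each,
# counted from the listing): x is all zeros, y differs only in its first element.
from sklearn.metrics import normalized_mutual_info_score

x = [0] * 469
y = [1] + [0] * 468

normalized_mutual_info_score(x, y)            # reported above as 3.9921875 (0.20.3)
normalized_mutual_info_score(x[:-1], y[:-1])  # reported above as 0.0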
Versions
In [2]: import sklearn; sklearn.show_versions()
...:
System:
    python: 3.6.8 |Anaconda custom (64-bit)| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /home/saladi/anaconda3/bin/python
   machine: Linux-4.4.0-109-generic-x86_64-with-debian-jessie-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/saladi/anaconda3/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 10.0.1
setuptools: 39.1.0
   sklearn: 0.20.3
     numpy: 1.16.3
     scipy: 1.2.1
    Cython: 0.28.5
    pandas: 0.23.4

Maintainer comments
Actually, the behavior of normalized_mutual_info_score changed in 0.22. To reproduce the issue there, one should use normalized_mutual_info_score(x, y, average_method='geometric'). With that I can still reproduce the issue.

This is due to numerical error. With the geometric mean, to avoid a division by zero, we set the normalizer to np.finfo('float64').eps. In this case, the mutual information is only about 4 times bigger than eps, which is why the score comes out near 4 instead of 0. Should we have a mechanism to detect this corner case and set the NMI to a specific score? In 0.22, we changed the default to the arithmetic mean, which is less prone to such normalization issues.
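The following is a minimal sketch of where the ~4 comes from, using the public mutual_info_score together with the eps normalizer described in the comment above. The array lengths are counted from the transcript, and the exact printed values will depend on the scikit-learn version:

import numpy as np
from sklearn.metrics import mutual_info_score, normalized_mutual_info_score

x = [0] * 469              # a single cluster, so H(x) == 0
y = [1] + [0] * 468        # two clusters, one of them a singleton

eps = np.finfo('float64').eps

# Mathematically MI(x, y) == 0 here (x has only one label), but the
# log-based computation leaves a tiny floating-point residue.
mi = mutual_info_score(x, y)
print(mi, mi / eps)        # per the comment above, the residue is roughly 4 * eps

# Geometric mean: sqrt(H(x) * H(y)) == 0, so the normalizer is replaced by eps
# to avoid a division by zero; dividing the eps-scale MI by eps is how the
# 3.9921875 reported above arises.
print(normalized_mutual_info_score(x, y, average_method='geometric'))

# Arithmetic mean (the default from 0.22 on): (H(x) + H(y)) / 2 > 0 here,
# so the score stays within [0, 1].
print(normalized_mutual_info_score(x, y, average_method='arithmetic'))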