AffinityPropagation creates 3d array of cluster centers on rare occasions
Description
Just stumbled upon a rare combination of training data and preference
value that causes the model to store its cluster centers as a 3d ndarray
instead of the expected 2d one.
Steps/Code to Reproduce
import numpy as np
from sklearn.cluster.affinity_propagation_ import AffinityPropagation
train_data = np.array([[-1., 1.], [1., -1.]])
model = AffinityPropagation(preference=-10).fit(train_data)
model.cluster_centers_
yields
array([[[-1., 1.], [ 1., -1.]]]) # 3d!!
and
model.predict(train_data)
leads to
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/cluster/affinity_propagation_.py", line 324, in predict
return pairwise_distances_argmin(X, self.cluster_centers_)
File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 464, in pairwise_distances_argmin
metric_kwargs)[0]
File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 339, in pairwise_distances_argmin_min
X, Y = check_pairwise_arrays(X, Y)
File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 111, in check_pairwise_arrays
warn_on_dtype=warn_on_dtype, estimator=estimator)
File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/utils/validation.py", line 405, in check_array
% (array.ndim, estimator_name))
ValueError: Found array with dim 3. check_pairwise_arrays expected <= 2.
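Until a fixed release is available, a defensive stopgap is to flatten the extra axis before calling predict. A minimal sketch, assuming the spurious dimension is the leading one as in the output above (this is my own workaround, not part of scikit-learn; it uses the public import path):

import numpy as np
from sklearn.cluster import AffinityPropagation

train_data = np.array([[-1., 1.], [1., -1.]])
model = AffinityPropagation(preference=-10).fit(train_data)

centers = model.cluster_centers_
if centers.ndim == 3:
    # drop the unexpected leading axis so predict() receives a 2d array
    model.cluster_centers_ = centers.reshape(-1, centers.shape[-1])

model.predict(train_data)  # no longer raises ValueError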
When using slightly different values for preference (e.g. 0 or -20), or slightly different training data (e.g. [[-1, 1], [1, -0.9]]), the cluster centers are stored correctly as a 2d ndarray.
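For reference, a quick sanity check of those nearby cases (continuing the session above; the commented shapes are what I would expect, not verified output):

model = AffinityPropagation(preference=-20).fit(train_data)
model.cluster_centers_.shape  # 2d, e.g. (1, 2)

model = AffinityPropagation(preference=-10).fit(np.array([[-1., 1.], [1., -0.9]]))
model.cluster_centers_.shape  # 2d again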
Expected Results
Cluster centers to be stored as a 2d ndarray, as in normal cases.
Versions
Darwin-15.6.0-x86_64-i386-64bit
Python 2.7.13 (default, Jul 18 2017, 09:16:53) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.18.2
Top GitHub Comments
I saw the floating point issues only when dealing with the edge case above, i.e. [[-1, 1], [1, -1]] as training samples. Depending on the preference and damping params, the A and R diagonals would "converge" to e.g. [0.45, -0.45] and [-0.45, 0.45]. But then the code would somehow lead to different values of E (and K) per (small sequence of) iteration(s). This would then lead to the incidental non-convergence for particular values of preference, as I mentioned before (i.e. convergence to K=2 when preference=0, convergence to K=1 when preference<-20, but intermittent convergence to K=1 or non-convergence for preference in (-20, -9]).

The solution in the PR immediately returns cluster centers for the edge case above without running the actual algorithm, and as such avoids the rounding issues.
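For context, the early return described above can be sketched roughly as follows. This is my own illustration of the idea, assuming the degenerate input has already been detected (all off-diagonal similarities equal); it is not the literal PR diff:

import numpy as np

def degenerate_case_result(S, preference):
    # S: (n_samples, n_samples) similarity matrix whose off-diagonal
    # entries are all identical, so message passing cannot break the tie
    n_samples = S.shape[0]
    if preference >= S[0, 1]:
        # every point is its own exemplar -> n_samples singleton clusters
        return np.arange(n_samples), np.arange(n_samples)
    # otherwise a single cluster with an (arbitrarily chosen) exemplar
    return np.array([0]), np.zeros(n_samples, dtype=int)

Returning (cluster_centers_indices, labels) directly for this case sidesteps the floating point oscillation entirely, which matches the behavior described above.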
I am using OS X, so I have Homebrew's Python 3 and installed scikit-learn and numpy via pip. I do not use Homebrew's numpy install.