
AffinityPropagation creates 3d array of cluster centers on rare occasions

See original GitHub issue

Description

I just stumbled upon a rare combination of training data and preference value that causes the model to store its cluster centers as a 3d ndarray instead of the expected 2d.

Steps/Code to Reproduce

import numpy as np
from sklearn.cluster import AffinityPropagation  # public import path

# Two perfectly symmetric samples combined with this preference trigger the bug.
train_data = np.array([[-1.,  1.], [1., -1.]])
model = AffinityPropagation(preference=-10).fit(train_data)
model.cluster_centers_

yields

array([[[-1.,  1.], [ 1., -1.]]])  # 3d!!

and

model.predict(train_data)

leads to

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/cluster/affinity_propagation_.py", line 324, in predict
    return pairwise_distances_argmin(X, self.cluster_centers_)
  File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 464, in pairwise_distances_argmin
    metric_kwargs)[0]
  File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 339, in pairwise_distances_argmin_min
    X, Y = check_pairwise_arrays(X, Y)
  File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 111, in check_pairwise_arrays
    warn_on_dtype=warn_on_dtype, estimator=estimator)
  File "/Users/jsamoocha/.virtualenvs/coach/lib/python2.7/site-packages/sklearn/utils/validation.py", line 405, in check_array
    % (array.ndim, estimator_name))
ValueError: Found array with dim 3. check_pairwise_arrays expected <= 2.

When using slightly different values for preference (e.g. 0 or -20), or slightly different training data (e.g. [[-1, 1], [1, -0.9]]), the cluster centers are stored correctly as a 2d ndarray.
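For comparison, here is a quick check of those nearby configurations (a minimal sketch using the public sklearn.cluster import path; the exact number of clusters depends on convergence, so only the dimensionality is checked):

import numpy as np
from sklearn.cluster import AffinityPropagation

# Same symmetric data, but a preference value reported to behave correctly.
model_a = AffinityPropagation(preference=0).fit(np.array([[-1., 1.], [1., -1.]]))
print(model_a.cluster_centers_.ndim)  # 2, as expected

# Slightly perturbed data with the originally problematic preference.
model_b = AffinityPropagation(preference=-10).fit(np.array([[-1., 1.], [1., -0.9]]))
print(model_b.cluster_centers_.ndim)  # 2, as expected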

Expected Results

Cluster centers should be stored as a 2d ndarray, as in the normal cases.
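As a stopgap, one defensive workaround (not from the issue; a sketch assuming the 0.18.2 behaviour shown above) is to drop the spurious leading axis from cluster_centers_ before calling predict:

import numpy as np
from sklearn.cluster import AffinityPropagation

train_data = np.array([[-1., 1.], [1., -1.]])
model = AffinityPropagation(preference=-10).fit(train_data)

# Buggy state: cluster_centers_ has shape (1, 2, 2) instead of (n_clusters, 2).
if model.cluster_centers_.ndim == 3:
    # Collapse the spurious leading axis so predict() can compute pairwise
    # distances again. This only restores the expected layout; whether the
    # recovered centers are meaningful for this degenerate fit is another matter.
    model.cluster_centers_ = model.cluster_centers_.reshape(
        -1, model.cluster_centers_.shape[-1])

print(model.predict(train_data))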

Versions

  • Darwin-15.6.0-x86_64-i386-64bit
  • Python 2.7.13 (default, Jul 18 2017, 09:16:53) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]
  • NumPy 1.13.1
  • SciPy 0.19.1
  • Scikit-Learn 0.18.2

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 16 (13 by maintainers)

Top GitHub Comments

1 reaction
jsamoocha commented, Aug 27, 2017

I saw the floating point issues only when dealing with the edge case above, i.e. [[-1, 1], [1, -1]] as training samples. Depending on the preference and damping params, the diagonals of A and R (the availability and responsibility matrices) would “converge” to e.g. [0.45, -0.45] and [-0.45, 0.45]. But then the code

# Check for convergence
E = (np.diag(A) + np.diag(R)) > 0
e[:, it % convergence_iter] = E
K = np.sum(E, axis=0)

would somehow produce different values of E (and K) every iteration or every few iterations. This would then lead to the incidental non-convergence for particular values of preference, as I mentioned before (i.e. convergence to K=2 when preference=0, convergence to K=1 when preference<-20, but intermittent convergence to K=1 or outright non-convergence for preference in the interval (-20, -9]).
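To see why a sum that sits exactly on the decision boundary is fragile, here is a standalone illustration (the ±1e-16 noise is injected by hand; in the actual run it comes from rounding in the message-passing updates themselves):

import numpy as np

# Diagonal values from the comment above: they cancel exactly, so the strict
# `> 0` convergence test sits right on the boundary.
diag_A = np.array([0.45, -0.45])
diag_R = np.array([-0.45, 0.45])

for noise in (0.0, 1e-16, -1e-16):
    E = (diag_A + diag_R + noise) > 0   # exemplar indicator
    K = np.sum(E)                       # number of exemplars
    print(E, K)                         # flips between 0 and 2 exemplars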

The solution in the PR immediately returns cluster centers for the edge case above without running the actual algorithm, and as such avoids the rounding issues.
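The PR itself is not reproduced here; as a rough sketch of the idea (hypothetical function and decision rule, not the actual patch), the degenerate case can be detected and short-circuited before any message passing:

import numpy as np

def trivial_affinity_propagation(S, preference):
    """Sketch: when every off-diagonal similarity and every preference is
    identical, the messages cannot break the symmetry, so return a trivial
    clustering (cluster center indices, labels) instead of iterating."""
    n_samples = S.shape[0]
    if n_samples == 1:
        return np.array([0]), np.array([0])
    off_diag = S[~np.eye(n_samples, dtype=bool)]
    pref = np.atleast_1d(preference)
    if np.all(off_diag == off_diag[0]) and np.all(pref == pref[0]):
        if pref[0] >= off_diag[0]:
            # Preferences dominate: every sample is its own exemplar.
            return np.arange(n_samples), np.arange(n_samples)
        # Similarities dominate: collapse everything into a single cluster.
        return np.array([0]), np.zeros(n_samples, dtype=int)
    return None  # not the degenerate case; fall through to the real algorithm

Assuming the default negative squared Euclidean affinity, the off-diagonal similarity for the two points above is -8, so preference=-10 would fall into the single-cluster branch, which is consistent with the K=1 behaviour described for strongly negative preferences.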

0 reactions
MartinHjelm commented, May 31, 2018

I am using OS X, so I have Homebrew’s Python 3 and installed scikit-learn and numpy via pip. I do not use Homebrew’s numpy install.

Read more comments on GitHub >

Top Results From Across the Web

sklearn.cluster.AffinityPropagation
Fit clustering from features/affinity matrix; return cluster labels. Parameters: X{array-like, sparse matrix} of shape (n_samples, n_features), or array- ...
Read more >
Clustering Algorithms: From Start to State of the Art - Toptal
The algorithm begins by selecting k points as starting centroids ('centers' of clusters). We can just select any k random points, or we...
Read more >
Subspace clustering using affinity propagation - UConn Math
This method starts with the similarity measures between pairs of data points and keeps passing real-valued messages between data points until a high-...
Read more >
Affinity Propagation preferences initialization - Stack Overflow
I thought affinity propagation could be my choice, since I could control the number of clusters by setting the preference parameter. However, if ......
Read more >
APCluster - An R Package for Affinity Propagation Clustering
Affinity propagation (AP) is a relatively new clustering algorithm that has been ... The function apcluster() creates an object belonging to the S4...
Read more >
