
AttributeError: 'AgglomerativeClustering' object has no attribute 'distances_'


Description

While plotting a Hierarchical Clustering Dendrogram, I receive the following error:

AttributeError: 'AgglomerativeClustering' object has no attribute 'distances_'

Steps/Code to Reproduce

plot_dendrogram is the function from the scikit-learn dendrogram example, and similarity is a cosine similarity matrix.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun",
    "The cat stretched.",
    "Jacob stood on his tiptoes.",
    "The car turned the corner.",
    "Kelly twirled in circles.",
    "She opened the door.",
    "Aaron made a picture."
    )

vec = TfidfVectorizer()
# `X` is now a TF-IDF representation of the data; row i of `X` corresponds to sentence i in `documents`
X = vec.fit_transform(documents)

# Calculate the pairwise cosine similarities (depending on the amount of data that you are going to have this could take a while)
sims = cosine_similarity(X)

similarity = np.round(sims, decimals=5)

cluster = AgglomerativeClustering(n_clusters=10, affinity="cosine", linkage="average")
cluster.fit(similarity)

def plot_dendrogram(model, **kwargs):
    # Create linkage matrix and then plot the dendrogram

    # create the counts of samples under each node
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack([model.children_, model.distances_,
                                      counts]).astype(float)

    # Plot the corresponding dendrogram
    dendrogram(linkage_matrix, **kwargs)
    
# plot the top two levels of the dendrogram (p=2)
plot_dendrogram(cluster, truncate_mode='level', p=2)
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()

Expected Results

A dendrogram.

Actual Results

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-6255925aaa42> in <module>
     21 
     22 # plot the top three levels of the dendrogram
---> 23 plot_dendrogram(cluster, truncate_mode='level', p=3)
     24 plt.xlabel("Number of points in node (or index of point if no parenthesis).")
     25 plt.show()

<ipython-input-20-6255925aaa42> in plot_dendrogram(model, **kwargs)
     14         counts[i] = current_count
     15 
---> 16     linkage_matrix = np.column_stack([model.children_, model.distances_,
     17                                       counts]).astype(float)
     18 

AttributeError: 'AgglomerativeClustering' object has no attribute 'distances_'

Versions

System:
    python: 3.7.6 (default, Jan 8 2020, 13:42:34) [Clang 4.0.1 (tags/RELEASE_401/final)]
    executable: /Users/libbyh/anaconda3/envs/belfer/bin/python
    machine: Darwin-19.3.0-x86_64-i386-64bit

Python dependencies:
    pip: 20.0.2
    setuptools: 46.0.0.post20200309
    sklearn: 0.22.1
    numpy: 1.16.4
    scipy: 1.3.1
    Cython: None
    pandas: 1.0.1
    matplotlib: 3.1.1
    joblib: 0.14.1

Built with OpenMP: True

Extra Info

If I use a distance matrix instead (with distance_threshold=0 and n_clusters=None), the dendrogram appears.

distance = 1 - similarity

cluster_dist = AgglomerativeClustering(distance_threshold=0, n_clusters=None, affinity = "precomputed", linkage = "average")
cluster_dist.fit(distance)

plot_dendrogram(cluster_dist, truncate_mode='level', p=2)
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()
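Because dendrogram only needs a SciPy linkage matrix, one possible workaround that sidesteps distances_ entirely is to run the hierarchical clustering in SciPy itself. This is a swapped-in technique, not the approach used in the issue: it replaces sklearn's AgglomerativeClustering with scipy.cluster.hierarchy.linkage (average linkage on cosine distances) and uses fcluster with criterion="maxclust" as a rough stand-in for n_clusters. The five-sentence subset of the documents and the cluster count t=3 are arbitrary illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

# Illustrative subset of the documents from the issue.
documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun",
    "The cat stretched.",
)

X = TfidfVectorizer().fit_transform(documents)

# Square cosine-distance matrix (1 - cosine similarity).
dist = cosine_distances(X)
# Force an exact zero diagonal in case of floating-point noise.
np.fill_diagonal(dist, 0.0)

# Condense to the 1-D form that scipy's linkage expects for precomputed distances.
condensed = squareform(dist, checks=False)

# Average linkage, matching linkage="average" in the sklearn snippets.
Z = linkage(condensed, method="average")

# Flat clustering into at most t clusters (an analogue of n_clusters).
labels = fcluster(Z, t=3, criterion="maxclust")

# no_plot=True computes the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)
```

Calling dendrogram(Z, truncate_mode='level', p=2) without no_plot would draw the same kind of plot as plot_dendrogram; no_plot=True is used only so the sketch runs without a display.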

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 21 (4 by maintainers)

Top GitHub Comments

jules-stacy commented, Jul 24, 2021 (3 reactions)

I'm running into this problem as well. As @NicolasHug commented, the model only has .distances_ if distance_threshold is set. That does not solve the issue, however: to specify n_clusters, one must set distance_threshold to None, and I need to specify n_clusters. The example is therefore still broken for this general use case.
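Newer scikit-learn releases (0.24 and later) cover this use case with a compute_distances parameter, which populates distances_ even when n_clusters is given and distance_threshold stays None; the documentation notes that it adds computational and memory overhead. A minimal sketch, assuming scikit-learn 0.24 or later and using random 3-D points as placeholder data instead of the TF-IDF matrix:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Placeholder data; substitute your own feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))

# compute_distances=True fills distances_ even though n_clusters is set
# and distance_threshold keeps its default of None.
model = AgglomerativeClustering(n_clusters=3, linkage="average", compute_distances=True)
model.fit(X)

# Same counting logic as plot_dendrogram in the issue, condensed:
# a leaf contributes 1, a merged node contributes its stored count.
n_samples = len(model.labels_)
counts = np.zeros(model.children_.shape[0])
for i, merge in enumerate(model.children_):
    counts[i] = sum(1 if c < n_samples else counts[c - n_samples] for c in merge)

# SciPy-style linkage matrix: [child_a, child_b, distance, sample_count].
linkage_matrix = np.column_stack([model.children_, model.distances_, counts]).astype(float)
```

With the full tree built (the default compute_full_tree="auto" does this for small n_clusters), linkage_matrix can be passed straight to scipy's dendrogram, just as plot_dendrogram does.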

NicolasHug commented, May 22, 2020 (2 reactions)

Thanks all for the report. The distances_ attribute only exists if the distance_threshold parameter is not None. This parameter was added in version 0.21.

All the snippets in this thread that are failing either use a version prior to 0.21 or don't set distance_threshold.

#17308 properly documents the distances_ attribute.
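The rule described in this comment can be checked directly with hasattr. A small sketch on arbitrary one-dimensional toy data, assuming scikit-learn 0.21 or later (and leaving compute_distances at its default of False on 0.24+):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Arbitrary toy data: five points on a line.
X = np.array([[0.0], [0.1], [1.0], [1.2], [5.0]])

# n_clusters set, distance_threshold left at None: distances_ is not computed.
by_count = AgglomerativeClustering(n_clusters=2).fit(X)

# distance_threshold set (so n_clusters must be None): distances_ is populated.
by_threshold = AgglomerativeClustering(n_clusters=None, distance_threshold=0).fit(X)

has_without = hasattr(by_count, "distances_")   # False
has_with = hasattr(by_threshold, "distances_")  # True
```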
