AttributeError: 'AgglomerativeClustering' object has no attribute 'distances_'
Description
While plotting a Hierarchical Clustering Dendrogram, I receive the following error:
AttributeError: 'AgglomerativeClustering' object has no attribute 'distances_'
Steps/Code to Reproduce
plot_dendrogram is a function from the example.
similarity is a cosine similarity matrix.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram
documents = (
    "The sky is blue",
    "The sun is bright",
    "The sun in the sky is bright",
    "We can see the shining sun, the bright sun",
    "The cat stretched.",
    "Jacob stood on his tiptoes.",
    "The car turned the corner.",
    "Kelly twirled in circles.",
    "She opened the door.",
    "Aaron made a picture."
)
vec = TfidfVectorizer()
# `X` is the TF-IDF representation of the data; the first row of `X` corresponds to the first sentence in `documents`
X = vec.fit_transform(documents)
# Calculate the pairwise cosine similarities (depending on the amount of data that you are going to have this could take a while)
sims = cosine_similarity(X)
similarity = np.round(sims, decimals = 5)
cluster = AgglomerativeClustering(n_clusters = 10, affinity = "cosine", linkage = "average")
cluster.fit(similarity)
def plot_dendrogram(model, **kwargs):
    # Create linkage matrix and then plot the dendrogram

    # create the counts of samples under each node
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack([model.children_, model.distances_,
                                      counts]).astype(float)

    # Plot the corresponding dendrogram
    dendrogram(linkage_matrix, **kwargs)
# plot the top three levels of the dendrogram
plot_dendrogram(cluster, truncate_mode='level', p=2)
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()
Expected Results
A dendrogram
Actual Results
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-20-6255925aaa42> in <module>
21
22 # plot the top three levels of the dendrogram
---> 23 plot_dendrogram(cluster, truncate_mode='level', p=3)
24 plt.xlabel("Number of points in node (or index of point if no parenthesis).")
25 plt.show()
<ipython-input-20-6255925aaa42> in plot_dendrogram(model, **kwargs)
14 counts[i] = current_count
15
---> 16 linkage_matrix = np.column_stack([model.children_, model.distances_,
17 counts]).astype(float)
18
AttributeError: 'AgglomerativeClustering' object has no attribute 'distances_'
Versions
System:
    python: 3.7.6 (default, Jan 8 2020, 13:42:34) [Clang 4.0.1 (tags/RELEASE_401/final)]
    executable: /Users/libbyh/anaconda3/envs/belfer/bin/python
    machine: Darwin-19.3.0-x86_64-i386-64bit
Python dependencies:
    pip: 20.0.2
    setuptools: 46.0.0.post20200309
    sklearn: 0.22.1
    numpy: 1.16.4
    scipy: 1.3.1
    Cython: None
    pandas: 1.0.1
    matplotlib: 3.1.1
    joblib: 0.14.1
Built with OpenMP: True
Extra Info
If I use a distance matrix instead, the dendrogram appears.
distance = 1 - similarity
cluster_dist = AgglomerativeClustering(distance_threshold=0, n_clusters=None, affinity = "precomputed", linkage = "average")
cluster_dist.fit(distance)
plot_dendrogram(cluster_dist, truncate_mode='level', p=2)
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()
Top GitHub Comments
I'm running into this problem as well. As @NicolasHug commented, the model only has .distances_ if distance_threshold is set. This does not solve the issue, however, because in order to specify n_clusters, one must set distance_threshold to None. I need to specify n_clusters, so the example is still broken for this general use case.
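For the case described in this comment, where n_clusters has to be fixed and the merge distances are still needed, one possible approach is the compute_distances parameter. This is only a minimal sketch: it assumes scikit-learn 0.24 or later (the parameter does not exist in the 0.22.1 version reported above), and the random toy matrix stands in for the real data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data standing in for the real feature matrix (an assumption for illustration)
rng = np.random.RandomState(0)
X = rng.rand(12, 4)

# Fix the number of clusters and still ask for the merge distances.
# compute_distances is assumed to be available (scikit-learn >= 0.24).
model = AgglomerativeClustering(
    n_clusters=3,
    linkage="average",
    compute_distances=True,
)
model.fit(X)

n_labels = len(set(model.labels_))    # number of distinct cluster labels
n_merges = model.distances_.shape[0]  # one distance per merge in the tree
```

With distances_ populated this way, a model fitted with n_clusters can be passed to plot_dendrogram from the report. Computing the distances adds some time and memory overhead, which is presumably why it is not the default.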
Thanks all for the report. The distances_ attribute only exists if the distance_threshold parameter is not None. This parameter was added in version 0.21. All the snippets in this thread that are failing are either using a version prior to 0.21, or don't set distance_threshold. #17308 properly documents the distances_ attribute.
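The condition stated above can be checked directly. The following is a small sketch on made-up random data (not the TF-IDF matrix from the report), using the default linkage and metric so that it does not depend on how the affinity argument is spelled in a given release:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Made-up data, only used to exercise the two configurations
rng = np.random.RandomState(0)
X = rng.rand(10, 3)

# n_clusters given, distance_threshold left at None: distances_ is not set
by_count = AgglomerativeClustering(n_clusters=3).fit(X)
has_distances_by_count = hasattr(by_count, "distances_")

# distance_threshold set (n_clusters must then be None): distances_ is set
full_tree = AgglomerativeClustering(n_clusters=None, distance_threshold=0).fit(X)
has_distances_full_tree = hasattr(full_tree, "distances_")

print(has_distances_by_count, has_distances_full_tree)
```

This matches the two snippets in the report: the failing one passes n_clusters=10 without distance_threshold, while the working one under Extra Info passes distance_threshold=0 with n_clusters=None.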