Add clustering implementation in sktime
- create `BaseClusterer`
- think about common unit tests for all clusterers
- potentially add soft dependency
- create `sktime.clustering`
- template:
```python
from sktime.utils.validation.series_as_features import check_X


class Clusterer:

    def fit(self, X, y=None):
        # y is ignored; kept for sklearn API compatibility
        X = check_X(X, enforce_univariate=True, convert_numpy=True)
        X = X.squeeze(1)  # this gives you a 2d np.array as in sklearn
        # ...
        return self

    def predict(self, X):
        X = check_X(X, enforce_univariate=True, convert_numpy=True)
        # ...
        return  # cluster labels
```
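For context, a minimal illustration of the data layout this template assumes (the toy construction below is mine, not part of the issue): sktime's series-as-features format is a nested pandas DataFrame with one column per dimension and a `pd.Series` in each cell; converted to numpy it becomes a 3d array of shape `(n_instances, n_dimensions, n_timepoints)`, so squeezing axis 1 on univariate data yields the 2d sklearn-style array the template refers to.

```python
import numpy as np
import pandas as pd

# Illustrative only: a univariate nested DataFrame with 3 instances,
# each a length-5 series -- the input format the template's check_X expects.
X_nested = pd.DataFrame(
    {"dim_0": [pd.Series(np.arange(5, dtype=float)) for _ in range(3)]}
)

# Converted to numpy the data is (n_instances, n_dimensions, n_timepoints);
# squeezing the dimension axis gives the 2d array sklearn-style code works on.
X_3d = np.stack(
    [np.stack([cell.to_numpy() for cell in row]) for _, row in X_nested.iterrows()]
)
print(X_3d.shape)             # (3, 1, 5)
print(X_3d.squeeze(1).shape)  # (3, 5)
```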
interfacing sklearn
- take a look at tslearn
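For comparison, a short sketch of what tslearn's clustering interface looks like (assuming tslearn is installed; the toy data here is made up). The design point worth noting, echoed in the template above, is that the estimator keeps the sklearn fit/predict contract while the time series metric becomes a constructor parameter.

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.utils import to_time_series_dataset

# Toy dataset: 12 phase-shifted sine waves of length 40.
rng = np.random.default_rng(0)
X = to_time_series_dataset(
    [np.sin(np.linspace(0, 6, 40) + phase) for phase in rng.uniform(0, 3, 12)]
)

# tslearn follows the sklearn estimator API but swaps in time series metrics.
km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=1)
labels = km.fit_predict(X)
```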
bigger-picture design ideas
minimum viable product
- KNN with DTW (a sketch follows below)
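As a rough illustration of that MVP, here is a self-contained sketch (my own toy code, not sktime's): a plain dynamic-time-warping distance plugged into sklearn's `NearestNeighbors` as a callable metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def dtw(x, y):
    """Unconstrained DTW distance between two 1d series."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])


# Toy data: 10 univariate series of length 25, one per row, as in sklearn.
X = np.random.default_rng(0).normal(size=(10, 25))

# A callable metric forces brute-force search, which is fine at this scale.
nn = NearestNeighbors(n_neighbors=3, metric=dtw).fit(X)
distances, indices = nn.kneighbors(X)
```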
Related software: time series distances
Going to start reviving this issue, as the base clustering classes and the initial partitioning-based algorithms have been implemented. At the time of writing, all the clustering code is on the ‘clustering’ branch, which will hopefully be merged into main this weekend. The branch implements the base classes, K-means and K-medoids (soon to be followed by K-shapes; I'm just finishing its docstring). I'll quickly explain the state of this branch, what is left to do, and what's on my immediate to-do list.
As mentioned, the current clustering branch includes K-means and K-medoids (and will shortly also have K-shapes). These algorithms work essentially the same way, differing only in how they calculate centers and which distance they use. Because we need to be able to swap both the distance and the center-update step, I have written the ‘k-means’-style clustering loop from scratch and made it generic in sktime/clustering/partitioning/_time_series_k_partition.py. I could not reuse the sklearn implementation of k-means, since sklearn lets you change neither the distance nor the center-averaging method, both of which are critical when using different distance measures. My implementation (_time_series_k_partition.py) gives complete control over the distance (passed as the metric parameter), how the centers are first initialised (passed as the init_algorithm parameter) and how the new centers are calculated each iteration, via the abstract method calculate_new_centers, which each concrete algorithm then overrides. A simplified sketch of this generic loop follows.
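The sketch below is my own heavily simplified illustration of that design; the parameter and method names (metric, init_algorithm, calculate_new_centers) follow the description above, not the actual sktime source.

```python
from abc import ABC, abstractmethod

import numpy as np


class TimeSeriesKPartition(ABC):
    """Illustrative Lloyd-style k-partition loop, not the real implementation."""

    def __init__(self, n_clusters, metric, init_algorithm, max_iter=300):
        self.n_clusters = n_clusters
        self.metric = metric                  # callable: (series, series) -> float
        self.init_algorithm = init_algorithm  # callable: (X, k) -> initial centers
        self.max_iter = max_iter

    @abstractmethod
    def calculate_new_centers(self, X, labels):
        """Subclass hook: recompute one center per cluster."""

    def fit(self, X):
        centers = self.init_algorithm(X, self.n_clusters)
        for _ in range(self.max_iter):
            # Assign each series to its nearest center under the chosen metric.
            labels = np.array(
                [np.argmin([self.metric(x, c) for c in centers]) for x in X]
            )
            new_centers = self.calculate_new_centers(X, labels)
            if np.allclose(new_centers, centers):  # converged
                break
            centers = new_centers
        self.cluster_centers_, self.labels_ = centers, labels
        return self


class ToyKMeans(TimeSeriesKPartition):
    # K-means recomputes each center as the pointwise mean of its members;
    # K-medoids or K-shapes would override this hook differently.
    # NOTE: empty clusters are not handled -- fine for a sketch.
    def calculate_new_centers(self, X, labels):
        return np.stack(
            [X[labels == k].mean(axis=0) for k in range(self.n_clusters)]
        )


# Toy usage with Forgy initialisation and Euclidean distance.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))
forgy = lambda data, k: data[rng.choice(len(data), k, replace=False)]
euclidean = lambda a, b: np.linalg.norm(a - b)
km = ToyKMeans(3, metric=euclidean, init_algorithm=forgy).fit(X)
```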
I'll now list what the partitioning clustering algorithms support in terms of distances, center initialisation and new-center calculation:
Distances (though I've only tested Euclidean and DTW):
Center initialisation:
New center calculation:
So overall a good start. I'm going to put below a summary of what I'm currently working on and my immediate to-do list for this coming week; anyone else who wants to hop in and do some stuff is welcome:
I'm currently working on:
TODO list (feel free to choose something you'd like to do):
Just going to follow this comment up with how I'm deciding what to implement and where I'm getting most of my references. At the moment I'm basically just implementing the state of the art for partitioning, as those are the algorithms I most immediately need for my Masters. Over the rest of the summer I'm hoping to reach the same algorithm coverage as the ‘leading’ time series clustering package, which is probably the R library dtwclust. I am also of course looking at the sklearn clustering algorithms and hope to get all of these implemented.
More information on dtwclust can be found here: https://github.com/asardaes/dtwclust. Specifically, I'm looking at https://cran.r-project.org/web/packages/dtwclust/vignettes/dtwclust.pdf for the finer details of specific implementations. I'm also hoping we can implement further algorithms such as DNN-based clustering (I can't cite the paper I'm referring to, as it is still under review) and shapelet-based approaches (see https://ieeexplore.ieee.org/document/6413851), for example. Overall, I hope we can create the most complete set of time series clustering algorithms across partitioning, density and hierarchical approaches, and I'm really looking forward to collaborating with others to make this happen!