API for getting DBSCAN-like clusterings out of OPTICS with `fit_predict`
Currently we have an interface for OPTICS with a custom method, `extract_dbscan`. This is good for usability and visibility of the functionality, but means that a generic parameter search tool (like `GridSearchCV`) can’t use OPTICS to perform DBSCAN at various `eps`.
This would involve adding an `eps` parameter which, when None, would use the default OPTICS clustering; when not None, would use `extract_dbscan`. But we would also need to retain the model across multiple fits…
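To illustrate why only the final step depends on `eps`, here is a minimal pure-Python sketch of the DBSCAN-style extraction pass described in the original OPTICS paper. It is not scikit-learn's implementation: the function name `extract_dbscan_labels` and the hand-made input values are assumptions for illustration only. The point is that the ordering, reachability and core distances are computed once, and each `eps` only needs one cheap pass over them.

```python
import math


def extract_dbscan_labels(ordering, reachability, core_distances, eps):
    """Assign DBSCAN-like labels in one pass over an OPTICS ordering.

    Points that end up in no cluster keep the label -1 (noise).
    """
    labels = [-1] * len(ordering)
    cluster_id = -1
    for point in ordering:
        if reachability[point] > eps:
            # Not density-reachable from the preceding points at this eps.
            if core_distances[point] <= eps:
                # The point is a core point at this eps: start a new cluster.
                cluster_id += 1
                labels[point] = cluster_id
            # Otherwise the point stays noise (-1).
        else:
            # Reachable at this eps: join the current cluster.
            labels[point] = cluster_id
    return labels


# Hand-made OPTICS output for six points (illustrative values only).
ordering = [0, 1, 2, 3, 4, 5]
reachability = [math.inf, 0.5, 0.6, 2.0, 0.4, 5.0]
core_distances = [0.5, 0.5, 0.6, 0.4, 0.4, math.inf]

# The same stored arrays are reused for every eps; only the labels change.
labels_by_eps = {
    eps: extract_dbscan_labels(ordering, reachability, core_distances, eps)
    for eps in (0.3, 1.0, 3.0)
}
print(labels_by_eps)
```

With these made-up values, the smallest `eps` marks every point as noise, the middle one yields two clusters plus one noise point, and the largest merges the two clusters.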
Here are two alternative interfaces:
- Add a `warm_start` parameter (like many classifiers, regressors, but uncharted territory for clusterers). When True, and `fit` or `fit_predict` is called, the current `reachability_`, `ordering_` and `core_distances_` would be kept, but a different final clustering step would be used to output / store `labels_`.
- Add a `memory` parameter, like in hierarchical clustering. This would cache the mapping from parameters to `reachability_`, `ordering_` and `core_distances_` using a `joblib.Memory`.
I think the first option sounds more appropriate.
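The control flow of the first option could be sketched roughly as below. Everything here is hypothetical: `WarmStartOpticsSketch`, `fake_compute_ordering` and `threshold_labels` are illustrative names, not scikit-learn API, and the two helper functions are placeholders standing in for the expensive ordering pass and the cheap final clustering step.

```python
import math


class WarmStartOpticsSketch:
    """Hypothetical wrapper illustrating warm_start; not scikit-learn API."""

    def __init__(self, compute_ordering, extract_labels, eps=None,
                 warm_start=False):
        self.compute_ordering = compute_ordering
        self.extract_labels = extract_labels
        self.eps = eps
        self.warm_start = warm_start

    def fit(self, X):
        have_state = hasattr(self, "ordering_")
        if not (self.warm_start and have_state):
            # Expensive step: rerun only when there is nothing to reuse.
            (self.ordering_, self.reachability_,
             self.core_distances_) = self.compute_ordering(X)
        # Cheap step: always rerun so labels_ reflects the current eps.
        self.labels_ = self.extract_labels(
            self.ordering_, self.reachability_, self.core_distances_, self.eps)
        return self

    def fit_predict(self, X):
        return self.fit(X).labels_


calls = {"ordering": 0}


def fake_compute_ordering(X):
    """Placeholder for the OPTICS ordering pass; returns fixed values."""
    calls["ordering"] += 1
    n = len(X)
    return list(range(n)), [math.inf] + [1.0] * (n - 1), [1.0] * n


def threshold_labels(ordering, reachability, core_distances, eps):
    """Placeholder final step: one cluster of points that are core at eps."""
    return [0 if d <= eps else -1 for d in core_distances]


X = [[0.0], [1.0], [2.0]]
model = WarmStartOpticsSketch(fake_compute_ordering, threshold_labels,
                              eps=0.5, warm_start=True)
labels_small = model.fit_predict(X)  # computes the ordering, then labels
model.eps = 2.0
labels_large = model.fit_predict(X)  # reuses the stored ordering
print(calls["ordering"], labels_small, labels_large)
```

Note that this sketch never checks whether `X` or any other parameter changed between calls, so stale state would be reused silently. A real implementation would need to decide how to handle that.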
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:76 (74 by maintainers)
Top GitHub Comments
Regarding the verbs. `optics_distances` sounds like `euclidean_distances`. I might rather `optics_sort` or `sort_optics` or `optics_order` or something. I’m also not sure that `extract` is better than `cluster`.

Yes, adding a memory parameter is one of the options here, and perhaps the simplest and most consistent with other clustering, i.e. hierarchical.