API for getting DBSCAN-like clusterings out of OPTICS with `fit_predict`
Currently we have an interface for OPTICS with a custom method `extract_dbscan`. This is good for usability and visibility of the functionality, but means that a generic parameter search tool (like `GridSearchCV`) can’t use OPTICS to perform DBSCAN at various `eps`.
This would involve adding an `eps` parameter which, when `None`, would use the default OPTICS clustering, and when not `None` would use `extract_dbscan`. But we would also need to retain the model across multiple fits…
Here are two alternative interfaces:
- Add a `warm_start` parameter (like many classifiers, regressors, but uncharted territory for clusterers). When True, and `fit` or `fit_predict` is called, the current `reachability_`, `ordering_` and `core_distances_` would be kept, but a different final clustering step would be used to output / store `labels_`.
- Add a `memory` parameter, like in hierarchical clustering. This would cache the mapping from parameters to `reachability_`, `ordering_` and `core_distances_` using a `joblib.Memory`.
I think the first option sounds more appropriate.
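To make the first option concrete, here is a rough, pure-Python sketch of what `warm_start` could mean for a clusterer. `ToyOptics`, its `n_orderings_computed` counter and the helper names are hypothetical and exist only for illustration; this is not scikit-learn's implementation, and it assumes 1-D points, an unbounded neighbourhood radius and a required `eps`. The point is only that the expensive ordering pass runs once, while the cheap DBSCAN-like extraction is redone whenever `eps` changes.

```python
import math


def compute_ordering(X, min_samples):
    """Naive O(n^2) OPTICS pass over 1-D points (no max_eps cutoff)."""
    n = len(X)
    dist = lambda i, j: abs(X[i] - X[j])
    # Core distance: distance to the min_samples-th nearest point, self included.
    core = [sorted(dist(i, j) for j in range(n))[min_samples - 1] for i in range(n)]
    reach = [math.inf] * n
    processed = [False] * n
    ordering = []
    for _ in range(n):
        # Visit the unprocessed point with the smallest reachability so far.
        p = min((i for i in range(n) if not processed[i]), key=lambda i: reach[i])
        processed[p] = True
        ordering.append(p)
        for q in range(n):
            if not processed[q]:
                reach[q] = min(reach[q], max(core[p], dist(p, q)))
    return reach, core, ordering


def extract_dbscan_labels(reach, core, ordering, eps):
    """Cut a stored ordering at eps; -1 marks noise."""
    labels = [-1] * len(ordering)
    current = -1
    for idx in ordering:
        if reach[idx] > eps:
            if core[idx] <= eps:  # unreachable core point starts a new cluster
                current += 1
                labels[idx] = current
        else:
            labels[idx] = current
    return labels


class ToyOptics:
    """Hypothetical estimator illustrating the proposed warm_start behaviour."""

    def __init__(self, eps, min_samples=2, warm_start=False):
        self.eps = eps
        self.min_samples = min_samples
        self.warm_start = warm_start
        self.n_orderings_computed = 0  # illustration only

    def fit(self, X):
        if not (self.warm_start and hasattr(self, "ordering_")):
            self.reachability_, self.core_distances_, self.ordering_ = (
                compute_ordering(X, self.min_samples)
            )
            self.n_orderings_computed += 1
        # The final clustering step is always redone with the current eps.
        self.labels_ = extract_dbscan_labels(
            self.reachability_, self.core_distances_, self.ordering_, self.eps
        )
        return self

    def fit_predict(self, X):
        return self.fit(X).labels_


X = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
model = ToyOptics(eps=2.0, warm_start=True)
labels_a = model.fit_predict(X)
model.eps = 10.0  # only the extraction step should run again
labels_b = model.fit_predict(X)
print(labels_a, labels_b, model.n_orderings_computed)
```

One thing the sketch makes visible: with `warm_start=True` the stored ordering is reused even if a different `X` is passed to `fit`, so a real implementation would need to decide how to handle that.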

Regarding the verbs: `optics_distances` sounds like `euclidean_distances`. I might prefer `optics_sort` or `sort_optics` or `optics_order` or something. I’m also not sure that `extract` is better than `cluster`.

Yes, adding a `memory` parameter is one of the options here, and perhaps the simplest and most consistent with other clustering, i.e. hierarchical.