Documentation: Proper DataSource format and usage for K-Means Clustering
Is your feature request related to a problem? Please describe. Still a newbie to this library, so thanks for bearing with me.
Right now, the documentation shows how to run K-Means clustering on an auto-generated data set of Gaussian clusters. This is great, as it shows K-Means is possible, but (unless I’m missing something) it does not show the steps to input real data. (It mentions “You can also use any of the standard data loaders to pull in clustering data.” but I don’t see where that’s documented.)
I’ve figured out how to load a CSV file of features and metadata (thanks to your new Columnar tutorial), but I can’t seem to infer how to connect this data with KMeansTrainer, or if that’s even the right approach.
Describe the solution you’d like A clear and concise description/example of how to load real-world (non-autogenerated) data into the K-Means algorithm.
Describe alternatives you’ve considered Looking through JavaDocs, but having trouble knowing what to focus on.
Additional context
Issue Analytics
- Created 3 years ago
- Comments: 32 (14 by maintainers)
Hot dog!
We’ve also merged in an empty response processor implementation for use when loading clustering, anomaly detection or other datasets where you don’t expect there to be a ground truth output. I’m going to close this issue now as I think we’ve patched the usability issues you hit. Open a fresh one if you hit others, or re-open this if you think it’s not quite covered by PRs #99 and #98.
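For anyone landing here with the same question, the data flow discussed in this issue (read feature columns from a CSV, ignore metadata columns, cluster with no ground truth output) can be sketched in plain Java. This is a rough illustration only and does not use the library's API: KMeansCsvSketch, parseFeatures and cluster are hypothetical names, the CSV content is made up, and the centroids are seeded from the first k rows purely to keep the run deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, dependency-free sketch; not the library's KMeansTrainer.
final class KMeansCsvSketch {

    /** Parses CSV text with a header row, keeping only the named feature columns. */
    static double[][] parseFeatures(String csv, List<String> featureColumns) {
        String[] lines = csv.trim().split("\\R");
        List<String> header = Arrays.asList(lines[0].split(","));
        List<double[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] cells = lines[i].split(",");
            double[] features = new double[featureColumns.size()];
            for (int j = 0; j < features.length; j++) {
                // Metadata columns (anything not listed) are never read.
                int column = header.indexOf(featureColumns.get(j));
                features[j] = Double.parseDouble(cells[column].trim());
            }
            rows.add(features);
        }
        return rows.toArray(new double[0][]);
    }

    /** Lloyd's algorithm; returns the cluster index assigned to each row. */
    static int[] cluster(double[][] data, int k, int iterations) {
        int dim = data[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = data[c].clone(); // assumption: seed from the first k rows
        }
        int[] assignment = new int[data.length];
        for (int iter = 0; iter < iterations; iter++) {
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int i = 0; i < data.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int d = 0; d < dim; d++) {
                        double diff = data[i][d] - centroids[c][d];
                        dist += diff * diff;
                    }
                    if (dist < best) {
                        best = dist;
                        assignment[i] = c;
                    }
                }
            }
            // Update step: move each centroid to the mean of its assigned rows.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += data[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Made-up data: "id" is metadata, "x" and "y" are features, no label column.
        String csv = "id,x,y\na,1.0,1.0\nb,9.0,9.0\nc,1.2,0.8\nd,8.8,9.1\n";
        double[][] data = parseFeatures(csv, Arrays.asList("x", "y"));
        System.out.println(Arrays.toString(cluster(data, 2, 10))); // prints [0, 1, 0, 1]
    }
}
```

The point of the sketch is that nothing in the input plays the role of a response: each row contributes only its features, which is the situation the empty response processor mentioned above is meant to cover. In practice you would use the library's own loaders and trainer rather than hand-rolled code like this.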