Documentation: Proper DataSource format and usage for K-Means Clustering

See original GitHub issue

Is your feature request related to a problem? Please describe.

Still a newbie to this library, so thanks for bearing with me.

Right now, the documentation shows how to run K-Means clustering on an auto-generated data set of Gaussian clusters. This is great, as it shows K-Means is possible, but (unless I’m missing something) it does not show the steps to input real data. (It mentions “You can also use any of the standard data loaders to pull in clustering data.”, but I don’t see where that’s documented.)

I’ve figured out how to load a CSV file of features and metadata (thanks to your new Columnar tutorial), but I can’t work out how to connect this data with KMeansTrainer, or whether that’s even the right approach.

Describe the solution you’d like

A clear and concise description/example of how to load real-world (non-autogenerated) data into the K-Means algorithm.

Describe alternatives you’ve considered

Looking through JavaDocs, but having trouble knowing what to focus on.

Additional context

[image]

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 32 (14 by maintainers)

Top GitHub Comments

1 reaction
lincolnthree commented, Oct 22, 2020

Hot dog!

Number of examples = 500
Number of features = 556
Label domain = []
Example = ArrayExample(numFeatures=21,output=-1,weight=1.0,metadata={name=Four-Color Omnath, id=876c6326-a40d-438b-89c0-825e647370d0},features=[(cards@1-N=0, 2.0)(cards@1-N=1, 1.0), (cards@1-N=10, 4.0), (cards@1-N=11, 4.0), (cards@1-N=12, 4.0), (cards@1-N=13, 5.0), (cards@1-N=14, 4.0), (cards@1-N=15, 3.0), (cards@1-N=16, 4.0), (cards@1-N=17, 2.0), (cards@1-N=18, 3.0), (cards@1-N=19, 2.0), (cards@1-N=2, 4.0), (cards@1-N=3, 3.0), (cards@1-N=4, 2.0), (cards@1-N=5, 1.0), (cards@1-N=6, 4.0), (cards@1-N=7, 2.0), (cards@1-N=8, 2.0), (cards@1-N=9, 4.0), (format@standard, 1.0), ])
0 reactions
Craigacp commented, Nov 30, 2020

We’ve also merged in an empty response processor implementation for use when loading clustering, anomaly detection or other datasets where you don’t expect there to be a ground truth output. I’m going to close this issue now as I think we’ve patched the usability issues you hit. Open a fresh one if you hit others, or re-open this if you think it’s not quite covered by PRs #99 and #98.
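
For readers who want the underlying idea independent of the library, here is a minimal sketch in plain Java. It deliberately does not use Tribuo's API, and it does not reproduce what PRs #99 and #98 changed; the class name `KMeansCsvSketch` and its methods are hypothetical and exist only for illustration. It assumes a CSV whose columns are all numeric features with no ground-truth column (the situation the empty response processor is meant for), and it seeds the centroids with the first k rows so the result is deterministic, which is a simplification rather than a recommended initialisation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: clusters unlabeled CSV rows with Lloyd's k-means. Not Tribuo code. */
class KMeansCsvSketch {

    /** Parses CSV text with a header line; every column is treated as a numeric feature. */
    static double[][] parseFeatures(String csv) {
        String[] lines = csv.trim().split("\\R");
        List<double[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) { // start at 1 to skip the header
            String[] cells = lines[i].split(",");
            double[] row = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                row[j] = Double.parseDouble(cells[j].trim());
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Runs Lloyd's algorithm and returns the cluster index assigned to each row. */
    static int[] cluster(double[][] data, int k, int iterations) {
        if (k < 1 || k > data.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of rows");
        }
        int dims = data[0].length;
        // Assumption: seed centroids with the first k rows to keep the sketch deterministic.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = data[c].clone();
        }
        int[] assignment = new int[data.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each row goes to its nearest centroid.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(data[i], centroids[c])
                            < squaredDistance(data[i], centroids[best])) {
                        best = c;
                    }
                }
                assignment[i] = best;
            }
            // Update step: move each centroid to the mean of its assigned rows.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += data[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave an empty cluster's centroid where it was
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        String csv = "x,y\n1.0,1.0\n9.0,9.0\n1.2,0.8\n8.8,9.1\n";
        // Prints [0, 1, 0, 1]
        System.out.println(Arrays.toString(cluster(parseFeatures(csv), 2, 5)));
    }
}
```

The point of the sketch is only that clustering needs feature vectors and nothing else: no label column is read, so there is no ground truth to supply. In a real project the library's own CSV loading and trainer classes would replace both `parseFeatures` and `cluster`.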

