discrete branch: add a compelling example of discretization's benefits

We recently merged a discretizing transformer into the discrete branch (see diff between that branch and master). Before merging it into master, we’d like a compelling example for our example gallery showing an application of machine learning where discretized features are particularly useful.

Note to contributors: make sure to submit your pull request to the discrete branch.
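
To make "discretized features" concrete, here is a minimal pure-Python sketch of equal-width binning with ordinal codes. It is an illustration only, not the transformer's actual implementation: the helper name `equal_width_bins` and the sample values are hypothetical, and the binning strategy used in the discrete branch may differ.

```python
def equal_width_bins(values, n_bins):
    """Map each value to an ordinal bin index in [0, n_bins - 1].

    Illustrative helper (not part of scikit-learn): bins have equal width
    between the minimum and maximum of `values`.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has only one meaningful bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up continuous measurements, chosen only for illustration.
sample = [1.0, 1.5, 3.0, 4.5, 5.0, 7.0]
print(equal_width_bins(sample, 3))  # prints [0, 0, 1, 1, 2, 2]
```

Each continuous value is replaced by the index of the interval it falls into, which is the kind of coarse, ordered representation a downstream classifier would then see.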

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 20 (19 by maintainers)

Top GitHub Comments

qinhanmin2014 commented, Sep 6, 2017

@jnothman Forgive me if the example is not good, since I'm not an expert in machine learning 😃 Each score is averaged over the cross-validation folds.

DecisionTree score before discretization : 0.946666666667
DecisionTree score std before discretization : 0.04
DecisionTree score after discretization : 0.96
DecisionTree score std after discretization : 0.0326598632371
SVC score before discretization : 0.96
SVC score std before discretization : 0.0249443825785
SVC score after discretization : 0.966666666667
SVC score std after discretization : 0.0249443825785

Since our discretization is naive, we cannot expect a big improvement. The experiment is based mainly on this paper (more than 2000 citations) and other materials. Here is part of the main code:

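import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, [2, 3]]  # keep only petal length and petal width
y = iris.target
# Bin each feature into 10 ordinal-coded bins
Xt = KBinsDiscretizer(n_bins=10, encode='ordinal').fit_transform(X)

# 5-fold cross-validation on the original features
clf1 = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(clf1, X, y, cv=5)
print("DecisionTree score before discretization : {}".format(np.mean(scores)))
print("DecisionTree score std before discretization : {}".format(np.std(scores)))

# 5-fold cross-validation on the discretized features
clf2 = DecisionTreeClassifier(random_state=0)
scores_t = cross_val_score(clf2, Xt, y, cv=5)
print("DecisionTree score after discretization : {}".format(np.mean(scores_t)))
print("DecisionTree score std after discretization : {}".format(np.std(scores_t)))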
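
The excerpt above covers only the decision tree rows. As a rough sketch of how the SVC comparison might be set up, the block below wraps the discretizer and the classifier in a pipeline so that the bin edges are learned inside each training fold. This is an assumption about the setup, not the code behind the reported numbers: `strategy="uniform"`, the default `SVC()` settings, and the variable names are choices made here for illustration, and the scores will depend on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.svm import SVC

# Same two features as in the excerpt: petal length and petal width.
X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]

# Baseline: SVC on the continuous features.
raw_scores = cross_val_score(SVC(), X, y, cv=5)

# Discretize inside the pipeline so each fold fits its own bin edges.
# strategy="uniform" (equal-width bins) is an assumption made for this sketch.
binned_model = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform"),
    SVC(),
)
binned_scores = cross_val_score(binned_model, X, y, cv=5)

print("SVC mean/std before discretization:", np.mean(raw_scores), np.std(raw_scores))
print("SVC mean/std after discretization:", np.mean(binned_scores), np.std(binned_scores))
```

Fitting the discretizer within the pipeline keeps the held-out fold from influencing the bin edges, which is a slightly stricter protocol than transforming the whole dataset up front.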
qinhanmin2014 commented, Sep 6, 2017

@jnothman (Sorry for the repeated updates.) Here is my plan for the example; please have a look. Thanks 😃

Dataset: iris (using only two features)
(1) Plot the data before and after discretization.
(2) Train a classifier on the data before and after discretization and compare the results.

DecisionTree score before discretization : 0.946666666667
DecisionTree score after discretization : 0.96
SVC score before discretization : 0.96
SVC score after discretization : 0.966666666667