
unstable result with Decision Tree Classifier

See original GitHub issue

Description

I have developed Java code to produce a decision tree without any pruning strategy. The decision rule used is also the default majority rule. I then opted to use Python for its simplicity. The problem is the randomness in DecisionTreeClassifier. Although splitter is set to “best”, max_features=None so that all features are used, and random_state is set to 1, I don’t end up with the same result that the Java code generates. Exactly the same training and test data sets are used for Python and Java. How can I eliminate all randomness to obtain the same result as the Java code?

Steps/Code to Reproduce
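
The original issue does not include reproduction code. A minimal sketch of the setup described above might look like the following; the dataset is a stand-in, since the reporter’s actual training/test files are not given in the issue:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the reporter uses their own training/test files.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Settings from the description: no pruning, all features considered, fixed seed.
clf = DecisionTreeClassifier(splitter="best", max_features=None, random_state=1)
clf.fit(X_train, y_train)
print(confusion_matrix(y_test, clf.predict(X_test)))
```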

Expected Results

The same decision tree as the one produced by the Java code.

Actual Results

A different confusion matrix each time, even with random_state fixed to 1.

Versions

Windows-8.1-6.3.9600-SP0
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.20.dev0

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
ngoix commented, Oct 9, 2018

The randomness comes from ties on the features to split on. What you describe is expected: you will obtain different trees for different random_state parameters, or for different implementations even with the same random_state parameter. See #12259 for a discussion about removing this randomness.
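
To illustrate the tie-breaking behaviour ngoix describes, here is a small sketch (not from the issue) that forces ties by duplicating every feature column; with tied candidate splits, different random_state values may pick different but equally good features:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Duplicate every column so each split on a feature ties exactly with
# the same split on its copy.
X, y = load_iris(return_X_y=True)
X = np.hstack([X, X])

for seed in (0, 1, 2):
    clf = DecisionTreeClassifier(splitter="best", max_features=None,
                                 random_state=seed).fit(X, y)
    # tree_.feature holds the feature index used at each node (-2 for leaves);
    # with ties, different seeds may choose different, equivalent columns.
    print(seed, clf.tree_.feature[:5])
```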

0 reactions
Ichaab commented, Oct 8, 2018
Read more comments on GitHub >

Top Results From Across the Web

Instability of decision tree classification algorithms
The instability problem of decision tree classification algorithms is that small changes in input training samples may cause dramatically ...
Read more >
The Indecisive Decision Tree — Story of an emotional ...
Decision tree is unstable because training a tree with a slightly different sub-sample causes the structure of the tree to change drastically.
Read more >
Improving Stability of Decision Trees
Decision-tree algorithms are known to be unstable: small variations in the training set can result in different trees and different predictions for...
Read more >
How are decision trees potentially unstable? - Stack Overflow
Decision trees can potentially be unstable if there is a small variation in the data that may result in a completely different tree...
Read more >
Why decision trees are called unstable models? - Quora
They are unstable in the sense that minor modifications to the input can lead to major changes in the model. This is not...
Read more >
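
The results above all describe the same effect. As a quick illustration (using Iris as a stand-in dataset, since no data accompanies the issue), fitting the same unpruned tree on two sub-samples that differ by only a few rows can already change the tree’s shape:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for seed in (0, 1):
    # Drop five rows, chosen differently each time, to get two
    # slightly different training sub-samples.
    rng = np.random.RandomState(seed)
    keep = np.setdiff1d(np.arange(len(X)), rng.choice(len(X), 5, replace=False))
    tree = DecisionTreeClassifier(random_state=0).fit(X[keep], y[keep])
    # Depth and node count may differ between the two fits.
    print(seed, tree.tree_.max_depth, tree.tree_.node_count)
```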
