Unstable results with DecisionTreeClassifier
Description
I developed Java code that builds a decision tree without any pruning strategy; the decision rule is the default majority rule. I then switched to Python for its simplicity, but the problem is the randomness in DecisionTreeClassifier. Even though splitter="best" and max_features=None (so all features are considered) and random_state=1, I do not end up with the same result the Java code generates. Exactly the same training and test data sets are used for Python and Java. How can I eliminate all randomness so I obtain the same result as the Java code? Help please.
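A minimal sketch of the setup described above. The reporter's actual data is not shown, so the iris dataset is used as a stand-in; the parameter values match the ones mentioned in the question.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the reporter's own training/test sets are not included.
X, y = load_iris(return_X_y=True)

# splitter="best" and max_features=None make every feature a split
# candidate; random_state=1 fixes scikit-learn's internal RNG, which
# makes results reproducible across runs of *this* implementation only,
# not across different implementations such as a separate Java program.
clf = DecisionTreeClassifier(splitter="best", max_features=None,
                             random_state=1)
clf.fit(X, y)
print(clf.get_depth(), clf.tree_.node_count)
```

Refitting with the same random_state on the same data yields the same tree, which is the scope of the reproducibility that random_state provides.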
Steps/Code to Reproduce
Expected Results
The same decision tree that the Java code produces.
Actual Results
A different confusion matrix on each run, even with random_state fixed to 1.
Versions
- Windows-8.1-6.3.9600-SP0
- Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
- NumPy 1.14.0
- SciPy 1.0.0
- Scikit-Learn 0.20.dev0
Issue Analytics
- State:
- Created 5 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
The randomness comes from ties between the features to split on. What you describe is expected: you will obtain different trees for different random_state values, and different implementations can produce different trees even with the same random_state. See #12259 for a discussion of removing this randomness. @amueller @jmschrei @nelson-liu