Unstable results with DecisionTreeClassifier
Description
I developed Java code that builds a decision tree without any pruning strategy; the decision rule is the default majority rule. I then switched to Python for its simplicity, but the problem is the randomness in DecisionTreeClassifier. Even though splitter="best" and max_features=None (so all features are considered) and random_state=1, I do not end up with the same result the Java code generates. Exactly the same training and test data sets are used for Python and Java. How can I eliminate all randomness so I obtain the same result as the Java code? Help please.
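A minimal sketch of the setup described above. The reporter's actual data is not shown, so the iris dataset is used as a stand-in; the parameter values match the ones mentioned in the question.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the reporter's own training/test sets are not included.
X, y = load_iris(return_X_y=True)

# splitter="best" and max_features=None make every feature a split
# candidate; random_state=1 fixes scikit-learn's internal RNG, which
# makes results reproducible across runs of *this* implementation only,
# not across different implementations such as a separate Java program.
clf = DecisionTreeClassifier(splitter="best", max_features=None,
                             random_state=1)
clf.fit(X, y)
print(clf.get_depth(), clf.tree_.node_count)
```

Refitting with the same random_state on the same data yields the same tree, which is the scope of the reproducibility that random_state provides.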
Steps/Code to Reproduce
Expected Results
The same decision tree that the Java code produces.
Actual Results
A different confusion matrix on each run, even with random_state fixed to 1.
Versions
- Windows-8.1-6.3.9600-SP0
- Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
- NumPy 1.14.0
- SciPy 1.0.0
- Scikit-Learn 0.20.dev0
Issue Analytics
- State:
- Created 5 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
The randomness comes from ties between the features to split on. What you describe is expected: you will obtain different trees for different random_state values, and different implementations can produce different trees even with the same random_state. See #12259 for a discussion of removing this randomness. @amueller @jmschrei @nelson-liu