Upgrading to 1.0.0-rc2 results in a large drop in classification performance using LightGBMClassifier.
Describe the bug
Updating mmlspark from 1.0.0-rc1-51-df0244c7-SNAPSHOT to 1.0.0-rc2, while keeping all other aspects of my code the same, results in a large drop in validation Average Precision when using LightGBMClassifier: from 0.574 to 0.313.
params = {
    'num_trees': 1000,
    'early_stopping_rounds': 0,
    'feature_fraction': 0.7,
    'l1_reg': 0.0,
    'l2_reg': 0.0,
    'max_depth': -1,
    'num_leaves': 31,
    'is_unbalance': True
}

lgb = LightGBMClassifier(
    featuresCol='features',
    labelCol='Label',
    slotNames=features,
    categoricalSlotNames=idx_cat_cols,
    timeout=12000.0,
    useBarrierExecutionMode=True,
    numIterations=params['num_trees'],
    isUnbalance=params['is_unbalance'],
    earlyStoppingRound=params['early_stopping_rounds'],
    featureFraction=params['feature_fraction'],
    lambdaL1=params['l1_reg'],
    lambdaL2=params['l2_reg'],
    maxDepth=params['max_depth'],
    numLeaves=params['num_leaves']
)
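To sanity-check that the same settings are in play under both versions, one rough approach is to translate these constructor arguments into native LightGBM parameter names and rerun the configuration locally outside Spark. The following is a minimal sketch; the name mapping is an assumption for illustration and is not taken from the MMLSpark source.

```python
# Hypothetical helper: map MMLSpark-style constructor argument names to the
# native LightGBM parameter names they correspond to, so the same settings
# can be cross-checked in a local (non-Spark) LightGBM run.
MMLSPARK_TO_NATIVE = {
    'numIterations': 'num_iterations',
    'isUnbalance': 'is_unbalance',
    'earlyStoppingRound': 'early_stopping_round',
    'featureFraction': 'feature_fraction',
    'lambdaL1': 'lambda_l1',
    'lambdaL2': 'lambda_l2',
    'maxDepth': 'max_depth',
    'numLeaves': 'num_leaves',
}

def to_native_params(mmlspark_params):
    """Return a dict keyed by native LightGBM parameter names."""
    return {MMLSPARK_TO_NATIVE[k]: v
            for k, v in mmlspark_params.items()
            if k in MMLSPARK_TO_NATIVE}

native = to_native_params({
    'numIterations': 1000,
    'isUnbalance': True,
    'earlyStoppingRound': 0,
    'featureFraction': 0.7,
    'lambdaL1': 0.0,
    'lambdaL2': 0.0,
    'maxDepth': -1,
    'numLeaves': 31,
})
```

If the local run with these native parameters reproduces rc1-level Average Precision, the regression is more likely in the Spark wrapper than in the core library.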
To Reproduce
I am seeing this result on a private dataset with 140,000,000 rows and 130 feature columns. I am a Microsoft employee, so we can talk offline if more details are needed.
Expected behavior
Comparable validation performance between versions.
Info (please complete the following information):
- MMLSpark Version: 1.0.0-rc2
- Spark Version: 2.4.5
- Spark Platform: Databricks (runtime 6.6 ML)
If the bug pertains to a specific feature please tag the appropriate CODEOWNER for better visibility @imatiach-msft
Additional context
Did any underlying default settings change?
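One way to answer this question directly is to instantiate a default `LightGBMClassifier()` under each version, read its effective parameter values, and diff the two. The sketch below shows only the diff step; the two dicts are placeholder values for illustration, not the actual rc1/rc2 defaults, which in practice would come from something like `estimator.extractParamMap()` under each version.

```python
# Diff default parameter values between two library versions.
def diff_defaults(old, new):
    """Return {param: (old_value, new_value)} for every changed or added param."""
    changed = {}
    for name in set(old) | set(new):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed

# Illustrative placeholders, not real rc1/rc2 values:
rc1_defaults = {'boostingType': 'gbdt', 'minSumHessianInLeaf': 0.001}
rc2_defaults = {'boostingType': 'gbdt', 'minSumHessianInLeaf': 0.01}

print(diff_defaults(rc1_defaults, rc2_defaults))
```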
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 4
- Comments: 18 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@imatiach-msft it seems to me the package was broken somewhere after this commit (82e7a8eb). Here is the log of a simple regression task using rc3 (the same issue occurs with rc2, but rc1 and the commit referenced above are fine). I used Spark 2.4.7 (Scala 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_265) on GCP (Debian image);
notice how the l2 loss explodes after one iteration:
The current release, v1.0.0-rc3, still has this issue; v1.0.0-rc1 is the latest version without it.
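When bisecting commits for a regression like the exploding l2 loss described above, it helps to flag the failure automatically rather than eyeball each training log. A minimal sketch, assuming the per-iteration eval metric values have already been parsed from the log (the sample values below are made up for illustration):

```python
# Flag the first iteration where the eval metric jumps by more than
# `factor` relative to the previous iteration -- the "loss explodes
# after one iteration" symptom. Returns None if training looks healthy.
def first_explosion(metrics, factor=10.0):
    for i in range(1, len(metrics)):
        prev, cur = metrics[i - 1], metrics[i]
        if prev > 0 and cur / prev > factor:
            return i
    return None

healthy = [0.52, 0.41, 0.37, 0.35]      # metric decreasing as expected
exploding = [0.52, 812.0, 9.4e6]        # metric diverging immediately

assert first_explosion(healthy) is None
assert first_explosion(exploding) == 1
```

Combined with `git bisect run`, a check like this can pinpoint the offending commit without manual inspection.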