XGBoost Performance Issues
Hello,
I ran some JMH benchmarks that show MLeap to be significantly slower than other libraries for evaluating XGBoost models.
Here you can see throughput (ops / sec) as a function of library and batch size, where:
- xgboost4j = https://github.com/dmlc/xgboost/tree/master/jvm-packages
- xgboost-predictor-java = https://github.com/komiya-atsushi/xgboost-predictor-java
- yelp-xgboost = https://github.com/Yelp/xgboost-predictor-java
- mleap = https://github.com/combust/mleap
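For anyone who wants to reproduce this kind of measurement, here is a rough sketch of timing throughput as a function of batch size. It is not the JMH harness used for the numbers in this issue: it is a plain `System.nanoTime()` loop, and `ThroughputSketch`, `dummyPredict`, and `measureRowsPerSec` are hypothetical names, with `dummyPredict` standing in for a real library call. It reports rows scored per second; the issue does not say whether the chart counts batches or rows as "ops".

```java
import java.util.Arrays;
import java.util.function.Function;

final class ThroughputSketch {

    // Hypothetical stand-in for a real model: returns the sum of each row's features.
    static float[] dummyPredict(float[][] batch) {
        float[] out = new float[batch.length];
        for (int i = 0; i < batch.length; i++) {
            float sum = 0f;
            for (float v : batch[i]) {
                sum += v;
            }
            out[i] = sum;
        }
        return out;
    }

    // Calls the predictor repeatedly for at least durationNanos (must be > 0)
    // and returns the number of rows scored per second.
    static double measureRowsPerSec(Function<float[][], float[]> predictor,
                                    int batchSize, int numFeatures, long durationNanos) {
        float[][] batch = new float[batchSize][numFeatures];
        for (float[] row : batch) {
            Arrays.fill(row, 1.0f);
        }

        // Short warm-up so the JIT has a chance to compile the hot path.
        long sink = 0;
        for (int i = 0; i < 1_000; i++) {
            sink += predictor.apply(batch).length;
        }

        long rows = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            sink += predictor.apply(batch).length;
            rows += batchSize;
            elapsed = System.nanoTime() - start;
        } while (elapsed < durationNanos);

        // Use the accumulated value so the calls cannot be optimized away.
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative sink");
        }
        return rows / (elapsed / 1e9);
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1, 10, 100, 1000}) {
            double rowsPerSec =
                measureRowsPerSec(ThroughputSketch::dummyPredict, batchSize, 50, 200_000_000L);
            System.out.printf("batch=%d rows/sec=%.0f%n", batchSize, rowsPerSec);
        }
    }
}
```

To compare libraries, each one's predict call would be wrapped in a `Function<float[][], float[]>` and passed in place of `dummyPredict`. A hand-rolled loop like this is much less rigorous than JMH (no forking, no statistical treatment), so treat its output as indicative only.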
Given that MLeap makes use of xgboost4j-spark, does anyone know why it would have half the throughput of xgboost4j? Also, is there a reason why mleap does not observe constant throughput scaling like xgboost4j does?
Thanks! -Ryan
Issue Analytics
- State:
- Created 4 years ago
- Reactions:5
- Comments:12 (9 by maintainers)
That sounds like a good plan @lucagiovagnoli.
Hi Anca! So, we’ve run some tests above and noticed that xgboost4j is much slower than xgboost-predictor-java 😦 Historically we’ve used a fork of xgboost-predictor at Yelp (yelp-xgboost), so we’re hitting a performance issue when running MLeap because xgboost4j seems thousands of times slower 😕
PS: I also noticed that @hollinwilkins looked into xgboost-predictor in the past; he commented on the xgboost-predictor project about deploying it to Maven Central (comment here). I wonder whether they considered it instead of xgboost4j, and if so, why it didn’t work out?
@ancasarb