XGBoost Performance Issues
I ran some JMH benchmarks that show MLeap to be significantly slower than other libraries for evaluating XGBoost models.
Here you can see throughput (ops / sec) as a function of library and batch size, where:
- xgboost4j = https://github.com/dmlc/xgboost/tree/master/jvm-packages
- xgboost-predictor-java = https://github.com/komiya-atsushi/xgboost-predictor-java
- yelp-xgboost = https://github.com/Yelp/xgboost-predictor-java
- mleap = https://github.com/combust/mleap
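For anyone who wants to sanity-check the metric without setting up JMH, here is a minimal sketch of how throughput per batch size could be measured with the standard library only. It assumes one "op" is one batch prediction call, which the benchmark above does not state explicitly. `ThroughputSketch`, `standInPredict`, and `sink` are hypothetical names, and the stand-in predictor just sums features rather than evaluating a real XGBoost model. JMH handles warm-up, forking, and dead-code elimination far more rigorously, so treat numbers from this sketch as rough.

```java
import java.util.function.Consumer;

/** Hypothetical stdlib-only harness; not part of JMH, MLeap, or any XGBoost library. */
final class ThroughputSketch {
    /** Written by the stand-in predictor so the JIT cannot discard its work. */
    static volatile double sink;

    /** Converts an operation count and an elapsed time in nanoseconds into ops/sec. */
    static double opsPerSecond(long ops, long elapsedNanos) {
        return ops * 1e9 / elapsedNanos;
    }

    /** Runs warm-up calls, then times `iterations` calls of `predictBatch` on one batch. */
    static double measure(Consumer<double[][]> predictBatch, double[][] batch,
                          int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            predictBatch.accept(batch);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            predictBatch.accept(batch);
        }
        // Guard against a zero reading on very coarse timers.
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return opsPerSecond(iterations, elapsed);
    }

    /** Placeholder for a real model call: sums all features in the batch. */
    static void standInPredict(double[][] batch) {
        double sum = 0.0;
        for (double[] row : batch) {
            for (double value : row) {
                sum += value;
            }
        }
        sink = sum;
    }

    public static void main(String[] args) {
        int[] batchSizes = {1, 10, 100, 1000};
        for (int size : batchSizes) {
            double[][] batch = new double[size][50]; // 50 features is an arbitrary choice
            double throughput = measure(ThroughputSketch::standInPredict, batch, 10_000, 100_000);
            System.out.printf("batch=%d ops/sec=%.0f%n", size, throughput);
        }
    }
}
```

Swapping `standInPredict` for a call into each library would give one curve per library, comparable in shape to the chart described above.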
Given that MLeap makes use of xgboost4j-spark, does anyone know why it would have half the throughput of xgboost4j? Also, is there a reason why mleap does not observe constant throughput scaling like
- Created 4 years ago
- Comments: 12 (9 by maintainers)
Top GitHub Comments
Hi Anca! So, we’ve run the tests above and noticed that xgboost4j is much slower than xgboost-predictor-java 😦 Historically we’ve used a fork of xgboost-predictor at Yelp (yelp-xgboost), so we’re hitting a performance issue when running MLeap, because xgboost4j seems thousands of times slower 😕
- I was thinking of making a PR to allow deserializing the model binary as either an xgboost-predictor model OR an xgboost4j model (based on users’ preference). Fortunately, it seems that xgboost-predictor supports loading from xgboost4j binaries, see the ModelReader. I just wanted to check what you think before we get to it.
PS: I also noticed that @hollinwilkins looked into xgboost-predictor in the past; he commented on the xgboost-predictor project about deploying it to Maven Central (comment here). I wonder whether they considered it instead of xgboost4j, and why it didn’t work out?
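To make the proposal above concrete, here is a rough sketch of what choosing the deserialization backend from a user preference might look like. Everything in it is hypothetical: `XgbBackends`, `Kind`, `Scorer`, and the preference strings are illustrative names, not MLeap, xgboost4j, or xgboost-predictor APIs, and the default of xgboost4j is an assumption. The loaders registered in `main` are stubs. A real loader would hand the model bytes to the corresponding library.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry that maps a user preference to a model loader. */
final class XgbBackends {
    enum Kind { XGBOOST4J, XGBOOST_PREDICTOR }

    /** Hypothetical minimal scoring interface shared by both backends. */
    interface Scorer {
        double predict(double[] features);
    }

    private final Map<Kind, Function<byte[], Scorer>> loaders = new EnumMap<>(Kind.class);

    /** Registers the function that turns serialized model bytes into a scorer. */
    XgbBackends register(Kind kind, Function<byte[], Scorer> loader) {
        loaders.put(kind, loader);
        return this;
    }

    /** Parses a preference string; a missing value falls back to xgboost4j (assumed default). */
    static Kind parse(String preference) {
        if (preference == null || preference.trim().isEmpty()) {
            return Kind.XGBOOST4J;
        }
        String key = preference.trim().toLowerCase(Locale.ROOT).replace('-', '_');
        switch (key) {
            case "xgboost4j":
                return Kind.XGBOOST4J;
            case "xgboost_predictor":
                return Kind.XGBOOST_PREDICTOR;
            default:
                throw new IllegalArgumentException("Unknown XGBoost backend: " + preference);
        }
    }

    /** Loads the same model bytes with whichever backend the user asked for. */
    Scorer load(String preference, byte[] modelBytes) {
        Kind kind = parse(preference);
        Function<byte[], Scorer> loader = loaders.get(kind);
        if (loader == null) {
            throw new IllegalStateException("No loader registered for " + kind);
        }
        return loader.apply(modelBytes);
    }

    public static void main(String[] args) {
        // Stub loaders returning fixed scores; real ones would call into each library.
        XgbBackends backends = new XgbBackends()
                .register(Kind.XGBOOST4J, bytes -> features -> 1.0)
                .register(Kind.XGBOOST_PREDICTOR, bytes -> features -> 2.0);
        Scorer scorer = backends.load("xgboost-predictor", new byte[0]);
        System.out.println(scorer.predict(new double[] {0.1, 0.2}));
    }
}
```

Because both loaders take the same bytes, this only works if xgboost-predictor can read the xgboost4j binary format, which is the point raised above about the ModelReader.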