
XGBoost Performance Issues

I ran some JMH benchmarks that show MLeap to be significantly slower than other libraries for evaluating XGBoost models.


[Chart: throughput (ops/sec) as a function of library and batch size, comparing xgboost4j, xgboost-predictor-java, yelp-xgboost, and mleap.]

Given that MLeap makes use of xgboost4j-spark, does anyone know why it would have half the throughput of xgboost4j? Also, is there a reason why MLeap does not show the near-constant throughput scaling with batch size that xgboost4j does?

Thanks! -Ryan
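A minimal throughput harness in the spirit of the JMH runs above (this is a sketch, not the original benchmark: `DummyPredictor` and the `Predictor` interface are hypothetical stand-ins for a real scorer such as xgboost4j's `Booster.predict`, so the harness stays self-contained):

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class BatchThroughput {

    /** Hypothetical stand-in for a model evaluator. */
    interface Predictor {
        float[] predict(float[][] batch);
    }

    static class DummyPredictor implements Predictor {
        public float[] predict(float[][] batch) {
            float[] out = new float[batch.length];
            for (int i = 0; i < batch.length; i++) {
                float s = 0f;
                for (float v : batch[i]) s += v;   // trivial "score" per row
                out[i] = s;
            }
            return out;
        }
    }

    /** Rows scored per second for a given batch size. */
    static double throughput(Predictor p, int batchSize, int features, long iters) {
        Random rnd = new Random(42);
        float[][] batch = new float[batchSize][features];
        for (float[] row : batch)
            for (int j = 0; j < features; j++) row[j] = rnd.nextFloat();

        long t0 = System.nanoTime();
        for (long i = 0; i < iters; i++) p.predict(batch);
        double secs = (System.nanoTime() - t0) / (double) TimeUnit.SECONDS.toNanos(1);
        return iters * (double) batchSize / secs;   // rows / sec
    }

    public static void main(String[] args) {
        Predictor p = new DummyPredictor();
        // If per-batch overhead (e.g. JNI calls, data copies) dominates, rows/sec
        // grows with batch size; flat rows/sec across batch sizes suggests the
        // per-row work itself is the bottleneck.
        for (int batch : new int[]{1, 10, 100, 1000}) {
            System.out.printf("batch=%d  rows/sec=%.0f%n",
                    batch, throughput(p, batch, 20, 2000));
        }
    }
}
```

For publishable numbers JMH is still the right tool (it handles warmup, dead-code elimination, and forking); a hand-rolled loop like this only illustrates the batch-size-scaling question.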

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 5
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

ancasarb commented, Feb 4, 2020

That sounds like a good plan @lucagiovagnoli.

lucagiovagnoli commented, Feb 1, 2020

Hi Anca! So, we’ve run the tests above and noticed that xgboost4j is much slower than xgboost-predictor-java 😦 Historically we’ve used a fork of xgboost-predictor at Yelp (yelp-xgboost), so we’re hitting a performance issue when running MLeap because xgboost4j appears to be thousands of times slower 😕

  • I was thinking of opening a PR to allow deserializing the model binary as either an xgboost-predictor OR an xgboost4j model (based on the user’s preference). Fortunately, xgboost-predictor seems to support loading xgboost4j binaries; see its ModelReader. I just wanted to check what you think before we get to it.

PS: I also noticed that @hollinwilkins looked into xgboost-predictor in the past; he commented on the xgboost-predictor project about deploying it to Maven Central (comment here). I wonder whether they considered it rather than xgboost4j, and why it didn’t work out?
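The proposed toggle could be sketched as a small dispatch layer. Everything here is hypothetical (the `Backend` interface, `Impl` enum, and class names are illustration only); the real PR would dispatch to xgboost4j's `XGBoost.loadModel(...)` on one side and xgboost-predictor's ModelReader-based loading on the other, since both can read the same serialized model binary:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class XgbBackendLoader {

    /** Hypothetical common scoring interface over both libraries. */
    public interface Backend {
        float[] predict(float[] features);
    }

    public enum Impl { XGBOOST4J, XGBOOST_PREDICTOR }

    /** Load the same model binary with the backend the user configured. */
    public static Backend load(Path modelBinary, Impl impl) throws Exception {
        byte[] bytes = Files.readAllBytes(modelBinary);
        switch (impl) {
            case XGBOOST4J:
                // e.g. wrap xgboost4j: XGBoost.loadModel(new ByteArrayInputStream(bytes))
                throw new UnsupportedOperationException("wire up xgboost4j here");
            case XGBOOST_PREDICTOR:
                // xgboost-predictor reads the same binary format via its ModelReader,
                // which is what makes a user-selectable backend feasible.
                throw new UnsupportedOperationException("wire up xgboost-predictor here");
            default:
                throw new IllegalArgumentException("unknown backend: " + impl);
        }
    }
}
```

The design point is that the choice lives in configuration rather than code, so users who need xgboost-predictor's speed can opt in without MLeap dropping xgboost4j.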


Top Results From Across the Web

Tune XGBoost Performance With Learning Curves
The two main reasons to use XGBoost are execution speed and model performance. XGBoost dominates structured or tabular datasets on ...

Pitfalls of Incorrectly Tuned XGBoost Hyperparameters
In this post, we investigated a scenario in which an XGBoost model trained with an incorrectly specified base score ends up producing ...

XGBoost: A Complete Guide to Fine-Tune and Optimize your ...
XGBoost (eXtreme Gradient Boosting) is not only an algorithm. It's an entire open-source library, designed as an optimized implementation of the ...

When to NOT use XGBoost? | Data Science and ... - Kaggle
Noisy data: in case of noisy data, boosting models may overfit. · XGBoost, or tree-based algorithms in general, cannot extrapolate. ...

XGBoost: Everything You Need to Know
XGBoost does not perform so well on sparse and unstructured data. · A common thing often forgotten is that Gradient Boosting is very ...
