
XGBoost Performance Issues

I ran some JMH benchmarks that show MLeap to be significantly slower than other libraries for evaluating XGBoost models.


[Chart: throughput (ops/sec) as a function of library and batch size, comparing xgboost4j, xgboost-predictor-java, yelp-xgboost, and mleap.]

Given that MLeap makes use of xgboost4j-spark, does anyone know why it would have half the throughput of xgboost4j? Also, is there a reason why MLeap does not show the near-constant throughput scaling with batch size that xgboost4j does?

Thanks! -Ryan
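A minimal throughput harness in the spirit of the JMH runs above (this is a sketch, not the original benchmark: `DummyPredictor` and the `Predictor` interface are hypothetical stand-ins for a real scorer such as xgboost4j's `Booster.predict`, so the harness stays self-contained):

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class BatchThroughput {

    /** Hypothetical stand-in for a model evaluator. */
    interface Predictor {
        float[] predict(float[][] batch);
    }

    static class DummyPredictor implements Predictor {
        public float[] predict(float[][] batch) {
            float[] out = new float[batch.length];
            for (int i = 0; i < batch.length; i++) {
                float s = 0f;
                for (float v : batch[i]) s += v;   // trivial "score" per row
                out[i] = s;
            }
            return out;
        }
    }

    /** Rows scored per second for a given batch size. */
    static double throughput(Predictor p, int batchSize, int features, long iters) {
        Random rnd = new Random(42);
        float[][] batch = new float[batchSize][features];
        for (float[] row : batch)
            for (int j = 0; j < features; j++) row[j] = rnd.nextFloat();

        long t0 = System.nanoTime();
        for (long i = 0; i < iters; i++) p.predict(batch);
        double secs = (System.nanoTime() - t0) / (double) TimeUnit.SECONDS.toNanos(1);
        return iters * (double) batchSize / secs;   // rows / sec
    }

    public static void main(String[] args) {
        Predictor p = new DummyPredictor();
        // If per-batch overhead (e.g. JNI calls, data copies) dominates, rows/sec
        // grows with batch size; flat rows/sec across batch sizes suggests the
        // per-row work itself is the bottleneck.
        for (int batch : new int[]{1, 10, 100, 1000}) {
            System.out.printf("batch=%d  rows/sec=%.0f%n",
                    batch, throughput(p, batch, 20, 2000));
        }
    }
}
```

For publishable numbers JMH is still the right tool (it handles warmup, dead-code elimination, and forking); a hand-rolled loop like this only illustrates the batch-size-scaling question.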

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 5
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

ancasarb commented, Feb 4, 2020

That sounds like a good plan @lucagiovagnoli.

lucagiovagnoli commented, Feb 1, 2020

Hi Anca! So, we’ve run the tests above and noticed that xgboost4j is much slower than xgboost-predictor-java 😦 Historically we’ve used a fork of xgboost-predictor at Yelp (yelp-xgboost), so we’re hitting a performance issue when running MLeap because xgboost4j appears to be thousands of times slower 😕

  • I was thinking of opening a PR to allow deserializing the model binary as either an xgboost-predictor OR an xgboost4j model (based on the user’s preference). Fortunately, xgboost-predictor seems to support loading xgboost4j binaries; see its ModelReader. I just wanted to check what you think before we get to it.

PS: I also noticed that @hollinwilkins looked into xgboost-predictor in the past; he commented on the xgboost-predictor project about deploying it to Maven Central (comment here). I wonder whether they considered it rather than xgboost4j, and why it didn’t work out?
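The proposed toggle could be sketched as a small dispatch layer. Everything here is hypothetical (the `Backend` interface, `Impl` enum, and class names are illustration only); the real PR would dispatch to xgboost4j's `XGBoost.loadModel(...)` on one side and xgboost-predictor's ModelReader-based loading on the other, since both can read the same serialized model binary:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class XgbBackendLoader {

    /** Hypothetical common scoring interface over both libraries. */
    public interface Backend {
        float[] predict(float[] features);
    }

    public enum Impl { XGBOOST4J, XGBOOST_PREDICTOR }

    /** Load the same model binary with the backend the user configured. */
    public static Backend load(Path modelBinary, Impl impl) throws Exception {
        byte[] bytes = Files.readAllBytes(modelBinary);
        switch (impl) {
            case XGBOOST4J:
                // e.g. wrap xgboost4j: XGBoost.loadModel(new ByteArrayInputStream(bytes))
                throw new UnsupportedOperationException("wire up xgboost4j here");
            case XGBOOST_PREDICTOR:
                // xgboost-predictor reads the same binary format via its ModelReader,
                // which is what makes a user-selectable backend feasible.
                throw new UnsupportedOperationException("wire up xgboost-predictor here");
            default:
                throw new IllegalArgumentException("unknown backend: " + impl);
        }
    }
}
```

The design point is that the choice lives in configuration rather than code, so users who need xgboost-predictor's speed can opt in without MLeap dropping xgboost4j.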


Top Results From Across the Web

Tune XGBoost Performance With Learning Curves
The two main reasons to use XGBoost are execution speed and model performance. XGBoost dominates structured or tabular datasets on ...

Pitfalls of Incorrectly Tuned XGBoost Hyperparameters
In this post, we investigated a scenario in which an XGBoost model trained with an incorrectly specified base score ends up producing ...

XGBoost: A Complete Guide to Fine-Tune and Optimize your ...
XGBoost (eXtreme Gradient Boosting) is not only an algorithm. It's an entire open-source library, designed as an optimized implementation of the ...

When to NOT use XGBoost? | Data Science and ... - Kaggle
Noisy data: in case of noisy data, boosting models may overfit. · XGBoost, or tree-based algorithms in general, cannot extrapolate. ...

XGBoost: Everything You Need to Know
XGBoost does not perform so well on sparse and unstructured data. · A common thing often forgotten is that Gradient Boosting is very ...
