
Update default n_jobs used by XGBoost to not be -1

See original GitHub issue

https://github.com/alteryx/evalml/pull/2410 updates our XGBoost component by exposing the nthread parameter passed to XGBoost as n_jobs. However, while profiling I noticed that the default value of n_jobs=-1 (use all threads) performs slower than using 2 threads. Upon further testing, it seems that past a certain number of threads, performance drops significantly. In my case, the drop occurred beyond 16 threads (probably because I have 8 cores with 2 threads per core, i.e. 16 logical threads).
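For concreteness, here is a minimal sketch of the kind of wrapper the PR describes; the class name and plumbing below are illustrative, not evalml's actual component code. The wrapper simply forwards an n_jobs argument to XGBoost's scikit-learn estimator:

```python
# Illustrative sketch only -- not evalml's actual component code.
# The wrapper forwards n_jobs to xgboost's scikit-learn estimator,
# where it controls the number of training threads (-1 = all threads).
from xgboost import XGBClassifier


class XGBoostClassifierComponent:
    """Hypothetical wrapper exposing XGBoost's thread count as n_jobs."""

    def __init__(self, n_jobs=-1, **kwargs):
        self.n_jobs = n_jobs
        self._estimator = XGBClassifier(n_jobs=n_jobs, **kwargs)

    def fit(self, X, y):
        self._estimator.fit(X, y)
        return self
```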

The XGBoost docs note that thread contention can significantly slow down the algorithm: https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.core

This issue tracks investigating this behavior and determining whether we should change our default value of -1 to something generally more performant. It is alarming that, in my example, using just two threads cut the runtime of fit in half compared to the -1 default.
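A rough way to reproduce this kind of measurement is to time fit across several thread counts. The sketch below is illustrative: the dataset size and estimator settings are assumptions, and absolute timings will vary by machine.

```python
# Rough benchmark sketch: time XGBoost's fit() across several thread
# counts to see where contention sets in. Numbers vary by machine.
import time

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=50_000, n_features=40, random_state=0)

for n_jobs in (-1, 1, 2, 4, 8, 16, 32):
    model = XGBClassifier(n_estimators=100, n_jobs=n_jobs)
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    print(f"n_jobs={n_jobs:>3}: fit took {elapsed:.2f}s")
```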


I initially had this issue tracking CatBoost too, but after running a few more tests, I think the CatBoost differences are just due to variance and not too concerning.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
bchen1116 commented, Sep 7, 2021

@freddyaboulton @chukarsten @angela97lin here are the perf test results that I collected with XGBoost n_jobs. My conclusion was to use n_jobs=12 as the default for XGBoost. Let me know your thoughts, and we can close this issue out!
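For illustration only, one way to express a capped default rather than a hard-coded value (joblib.cpu_count() is just one way to query core count, and the cap of 12 comes from the perf results linked above):

```python
# Illustrative sketch: cap the default thread count instead of using -1.
# The cap of 12 reflects the perf results above; joblib.cpu_count() is
# one way to query the number of available cores.
import joblib

DEFAULT_XGBOOST_N_JOBS = min(joblib.cpu_count(), 12)
```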

0 reactions
bchen1116 commented, Jul 15, 2021

Did some initial perf tests on looking glass, which I put here. I believe we need to address this looking glass issue before we can move forward with this.

Read more comments on GitHub >

Top Results From Across the Web

  • XGBoost training with n_jobs = -1 not using all cores
    When I set n_jobs to the number of threads I require, the usage of multiple cores happened. With n_jobs = 16, my...

  • Python API Reference — xgboost 1.7.2 documentation
    Used when pred_contribs or pred_interactions is set to True. Changing the default of this parameter (False) is not recommended.

  • Good model by default using XGBoost | BroadHorizon Cmotions
    Extreme Gradient Boosting (XGBoost) explained and built with the use of ... Using XGBoost to predict which Data Scientists are likely to change...

  • How to Control Your XGBoost Model | Capital One
    XGBoost is a powerful gradient boosting tool for machine learning models; learn how pruning, regularization, and early stopping can help ...

  • How to Best Tune Multithreading Support for XGBoost in Python
    By default, this is set to 1, but can be set to -1 to use all of the CPU cores on your system,...
