Thread scalability is suboptimal

As reported in https://github.com/ogrisel/pygbm/issues/30#issuecomment-436184680, pygbm does not scale with the number of threads as well as LightGBM does.

Here are some results on a machine with the following CPUs:

Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz: 2 sockets with 12 cores each, i.e. 24 physical cores and 48 hyperthreads in total.

1 thread (sequential)

NUMBA_NUM_THREADS=1 OMP_NUM_THREADS=1 python benchmarks/bench_higgs_boson.py  --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255
Model     Time (s)  AUC     Speed-up
LightGBM  1045      0.8282  1x
pygbm     1129      0.8192  1x

8 threads

NUMBA_NUM_THREADS=8 OMP_NUM_THREADS=8 python benchmarks/bench_higgs_boson.py  --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255
Model     Time (s)  AUC     Speed-up
LightGBM  160       0.8282  6.53x
pygbm     356       0.8193  3.2x

48 (hyper)threads

python benchmarks/bench_higgs_boson.py  --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255
Model     Time (s)  AUC     Speed-up
LightGBM  91        0.8282  11.5x
pygbm     130       0.8193  8.7x
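The "Speed-up" column is the single-thread time divided by the n-thread time; dividing it again by n gives the parallel efficiency, which makes the gap between the two libraries easier to read. A minimal sketch using only the timings reported above (the helper names `speedup` and `efficiency` are illustrative, not part of pygbm or the benchmark script):

```python
# Wall-clock training times in seconds, copied from the three runs above,
# keyed by the number of threads.
TIMES = {
    "LightGBM": {1: 1045, 8: 160, 48: 91},
    "pygbm": {1: 1129, 8: 356, 48: 130},
}


def speedup(times, n_threads):
    """Sequential time divided by the time measured with n_threads."""
    return times[1] / times[n_threads]


def efficiency(times, n_threads):
    """Fraction of the ideal linear speed-up that was actually achieved."""
    return speedup(times, n_threads) / n_threads


for model, times in TIMES.items():
    for n in (8, 48):
        print(f"{model:9s} {n:2d} threads: "
              f"speed-up {speedup(times, n):5.2f}x, "
              f"efficiency {efficiency(times, n):.0%}")
```

With 8 threads this puts LightGBM at roughly 82% efficiency and pygbm at roughly 40%. The 48-thread run uses 24 physical cores with hyperthreading, so dividing by 48 makes per-core efficiency look lower than dividing by 24 would.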

All of the pygbm runs used numba 0.40 from Anaconda with the TBB threading backend (the fastest backend for pygbm).
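To rerun the three configurations from a script, a small standard-library sketch can build the environment for each thread count. NUMBA_NUM_THREADS caps Numba's thread pool and OMP_NUM_THREADS caps OpenMP, which LightGBM uses. The issue does not say how the TBB backend was selected; this sketch assumes Numba's NUMBA_THREADING_LAYER environment variable, and `bench_env`/`run_bench` are hypothetical helper names.

```python
import os
import subprocess
import sys

# Benchmark flags copied from the commands in the issue.
BENCH_ARGS = [
    "benchmarks/bench_higgs_boson.py",
    "--n-trees", "100",
    "--learning-rate", "0.1",
    "--n-leaf-nodes", "255",
]
THREAD_VARS = ("NUMBA_NUM_THREADS", "OMP_NUM_THREADS")


def bench_env(n_threads=None, threading_layer="tbb", base=None):
    """Return an environment dict for one benchmark run.

    n_threads=None removes both thread-count variables, so each runtime falls
    back to its own default (typically all logical CPUs, as in the 48-thread run).
    """
    env = dict(os.environ if base is None else base)
    for var in THREAD_VARS:
        if n_threads is None:
            env.pop(var, None)
        else:
            env[var] = str(n_threads)
    if threading_layer is not None:
        # Assumption: Numba's threading layer is chosen through this variable.
        env["NUMBA_THREADING_LAYER"] = threading_layer
    return env


def run_bench(n_threads=None):
    """Run the benchmark script with the given thread count (hypothetical helper)."""
    cmd = [sys.executable, *BENCH_ARGS]
    return subprocess.run(cmd, env=bench_env(n_threads), check=True)
```

From the repository root, `for n in (1, 8, None): run_bench(n)` would repeat the sweep above.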

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

stuartarchibald commented, Dec 17, 2018

@Laurae2 I’m not sure what the “numba profiler” is; could you please clarify? Numba has a built-in parallel diagnostics tool, which tracks the transforms made to its own IR of the Python source as it converts serial code to parallel code, but that is a compile-time diagnostic tool, not a performance profiler.

Further, in Numba 0.41.0, JIT profiling works with Intel VTune: set the NUMBA_ENABLE_PROFILING environment variable to a non-zero value and Numba will register the LLVM JIT event listener for Intel VTune.
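A sketch of how that profiling setup could be scripted. NUMBA_ENABLE_PROFILING is the variable named in the comment; the `vtune -collect hotspots -- <app>` invocation is an assumption about the VTune command-line tool (older releases name it `amplxe-cl`), and `vtune_command` is a hypothetical helper.

```python
import os
import sys


def vtune_command(script_args, n_threads=8):
    """Return (cmd, env) for profiling a Numba workload under Intel VTune."""
    env = dict(os.environ)
    # Per the comment above, a non-zero value makes Numba register LLVM's
    # JIT event listener, so VTune can attribute samples to JIT-compiled code.
    env["NUMBA_ENABLE_PROFILING"] = "1"
    env["NUMBA_NUM_THREADS"] = str(n_threads)
    # Assumption: VTune's command-line collector is on PATH as `vtune`.
    cmd = ["vtune", "-collect", "hotspots", "--", sys.executable, *script_args]
    return cmd, env
```

The returned pair can then be passed to `subprocess.run(cmd, env=env, check=True)`.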

Laurae2 commented, Jan 3, 2019

@ogrisel Note that the number of threads LightGBM can use effectively scales with the number of columns. The Higgs dataset does not have enough columns to keep 48 threads busy, so this benchmark underestimates LightGBM's scalability, which gives you a lower scaling target.
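A toy model makes this point concrete. Assume, hypothetically, that work is split one whole feature per thread, every feature costs the same, and nothing else runs serially. Then n threads finish in ceil(n_features / n) rounds, so the speed-up is capped by the number of features (the UCI HIGGS dataset has 28). This illustrates the argument only; it is not LightGBM's actual scheduler.

```python
import math


def feature_parallel_speedup(n_features, n_threads):
    """Ideal speed-up when each thread processes whole features.

    Toy model: equal cost per feature, no serial work, no memory effects.
    Sequential time is n_features units; parallel time is the number of
    rounds needed when at most n_threads features are processed at once.
    """
    rounds = math.ceil(n_features / n_threads)
    return n_features / rounds


for n in (1, 8, 24, 48):
    print(n, feature_parallel_speedup(28, n))
```

Under this model, 28 features cap the speed-up at 28x however many threads beyond 28 are available, so a wider dataset would give a more demanding scaling target.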
