Thread scalability is suboptimal

As reported in https://github.com/ogrisel/pygbm/issues/30#issuecomment-436184680 , pygbm does not scale across threads as well as LightGBM.

Here are some results on a machine with the following CPUs:

Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz: 2 sockets with 12 cores each, i.e. 24 physical cores and 48 hyperthreads in total.

1 thread (sequential)

NUMBA_NUM_THREADS=1 OMP_NUM_THREADS=1 python benchmarks/bench_higgs_boson.py  --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255

Model	Time	AUC	Speed up
LightGBM	1045s	0.8282	1x
pygbm	1129s	0.8192	1x

8 threads

NUMBA_NUM_THREADS=8 OMP_NUM_THREADS=8 python benchmarks/bench_higgs_boson.py  --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255

Model	Time	AUC	Speed up
LightGBM	160s	0.8282	6.53x
pygbm	356s	0.8193	3.2x

48 (hyper)threads

python benchmarks/bench_higgs_boson.py  --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255

Model	Time	AUC	Speed up
LightGBM	91s	0.8282	11.5x
pygbm	130s	0.8193	8.7x

All of the pygbm runs used Numba 0.40 from Anaconda with the TBB threading backend (which is the fastest for pygbm).
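The speed-up columns above can be recomputed from the reported wall-clock times. The sketch below does that and adds parallel efficiency (speed-up divided by thread count), which is not in the original tables. The `timings` dictionary and the `scaling_report` helper are illustrative names, not part of pygbm or LightGBM.

```python
# Wall-clock times in seconds, copied from the tables above.
# Keys are thread counts; 48 means 48 hyperthreads on 24 physical cores.
timings = {
    "LightGBM": {1: 1045, 8: 160, 48: 91},
    "pygbm": {1: 1129, 8: 356, 48: 130},
}


def scaling_report(times):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = times[1]
    rows = []
    for threads in sorted(times):
        speedup = base / times[threads]
        efficiency = speedup / threads  # 1.0 would be perfect linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


for model, times in timings.items():
    for threads, speedup, efficiency in scaling_report(times):
        print(
            f"{model:9s} threads={threads:2d} "
            f"speedup={speedup:5.2f}x efficiency={efficiency:6.1%}"
        )
```

Because the 48-thread run uses hyperthreads rather than 48 physical cores, the efficiency figure for that row is pessimistic for both libraries.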

Issue Analytics

State:
Created 5 years ago
Comments:13 (8 by maintainers)

Top GitHub Comments

stuartarchibald commented, Dec 17, 2018

@Laurae2 I’m not sure what the “numba profiler” is; could you please clarify? Numba has a built-in parallel diagnostics tool that tracks the transforms made to its own IR of the Python source as it converts serial code to parallel code, but that is a compile-time diagnostic tool, not a performance profiler.

Further, JIT profiling in Numba 0.41.0 works with Intel VTune: set the NUMBA_ENABLE_PROFILING environment variable to a non-zero value and that will register the LLVM JIT event listener for Intel VTune.
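One way to apply this to the benchmark runs is to build the environment for a child process explicitly, so the variable is already set when Numba is imported. The following is a minimal sketch under that assumption; `profiling_env` and `benchmark_command` are hypothetical helper names, and the script path and flags are simply those from the commands quoted in the issue.

```python
import os
import shlex
import sys


def profiling_env(num_threads=None, base_env=None):
    """Return a copy of the environment with Numba JIT profiling enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["NUMBA_ENABLE_PROFILING"] = "1"  # any non-zero value, per the comment above
    if num_threads is not None:
        # Same thread-count variables as in the benchmark commands of the issue.
        env["NUMBA_NUM_THREADS"] = str(num_threads)
        env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def benchmark_command(python=sys.executable):
    """Command line of the Higgs boson benchmark as quoted in the issue."""
    return [
        python, "benchmarks/bench_higgs_boson.py",
        "--n-trees", "100",
        "--learning-rate", "0.1",
        "--n-leaf-nodes", "255",
    ]


env = profiling_env(num_threads=8)
print("NUMBA_ENABLE_PROFILING =", env["NUMBA_ENABLE_PROFILING"])
print(shlex.join(benchmark_command()))
```

The command and environment could then be passed to `subprocess.run(benchmark_command(), env=env)` from a pygbm checkout. How that process is attached to VTune is left out here, since the thread does not describe it.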

Laurae2 commented, Jan 3, 2019

@ogrisel Note that the number of threads LightGBM can use effectively scales with the number of columns. The Higgs dataset does not have enough columns to keep 48 threads busy, so the 48-thread run underestimates LightGBM's scalability and gives you a lower scaling target.
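A toy model can make this point concrete. The sketch below assumes that work is split by whole columns, that every column costs the same, and that nothing else runs in parallel. LightGBM's real scheduling is more involved, so this is only an illustration of the argument. The feature count of 28 comes from the public description of the HIGGS dataset, not from this thread, and `feature_parallel_speedup_bound` is a hypothetical helper name.

```python
import math


def feature_parallel_speedup_bound(n_features, n_threads):
    """Upper bound on speed-up when whole columns are divided among threads.

    The busiest thread processes ceil(n_features / n_threads) columns, and
    the run cannot finish before that thread does.
    """
    busiest_thread_columns = math.ceil(n_features / n_threads)
    return n_features / busiest_thread_columns


N_FEATURES = 28  # assumption: feature count of the HIGGS dataset

for n_threads in (1, 8, 48):
    bound = feature_parallel_speedup_bound(N_FEATURES, n_threads)
    print(f"{n_threads:2d} threads -> at most {bound:.1f}x")
```

Under these assumptions, adding threads beyond the number of columns cannot raise the bound any further, which is consistent with the comment that 48 threads is too many for this dataset to show LightGBM's full scalability.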
