Slowdown when using Ray with tqdm
What is the problem?
This is a follow-up to #5554.
Ray version and other system information (Python version, TensorFlow version, OS): ray==0.8.4, line_profiler==3.0.2, tqdm==4.38.0
Issue
When running the suggestion from #5554 with a function that takes arguments, the computation time with the progress bar is much longer than without it. As a curiosity, the tqdm counters also seem to reset some time after the computation has finished.
Reproduction (REQUIRED)
Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):
The code can be found as a notebook in https://colab.research.google.com/drive/1mQsYVHochZPITwL5fSIv2IxVY1c0cdQ6
I run a mock function test in three ways: parallelized with and without tqdm, and unparallelized with tqdm:
import os

import numpy as np
import ray
from tqdm import tqdm

def test(L):
    np.random.randint(0, 100, L)

test_remote = ray.remote(test)

def parallel_with_bar(N, L):
    ray.init(ignore_reinit_error=True, num_cpus=os.cpu_count())

    def to_iterator(obj_ids):
        while obj_ids:
            done, obj_ids = ray.wait(obj_ids)
            yield ray.get(done[0])

    obj_ids = [test_remote.remote(L) for i in range(N)]
    ret = []
    for x in tqdm(to_iterator(obj_ids), total=len(obj_ids)):
        ret.append(x)
    ray.shutdown()

def parallel_without_bar(N, L):
    ray.init(ignore_reinit_error=True, num_cpus=os.cpu_count())
    obj_ids = [test_remote.remote(L) for i in range(N)]
    ret = ray.get(obj_ids)
    ray.shutdown()

def single_with_bar(N, L):
    ret = [test(L) for i in tqdm(range(N))]
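One more variant could help narrow things down (this is just a sketch on my side, it is not in the notebook): the same per-object ray.wait / ray.get iterator as in parallel_with_bar, but without wrapping it in tqdm, to tell apart the cost of the progress bar from the cost of the iterator itself.

def parallel_iter_without_bar(N, L):
    # Hypothetical extra case, not part of the original notebook:
    # iterate results one by one exactly like parallel_with_bar,
    # but skip tqdm entirely.
    ray.init(ignore_reinit_error=True, num_cpus=os.cpu_count())

    def to_iterator(obj_ids):
        while obj_ids:
            done, obj_ids = ray.wait(obj_ids)
            yield ray.get(done[0])

    obj_ids = [test_remote.remote(L) for i in range(N)]
    ret = [x for x in to_iterator(obj_ids)]
    ray.shutdown()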
To profile each option I used line_profiler.
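For reference, this is roughly how each function can be profiled from a notebook cell. N=5000 matches the hit counts in the output below; the L value here is just a placeholder.

%load_ext line_profiler

# Profile each variant line by line (same N, arbitrary L for the mock workload)
%lprun -f parallel_with_bar parallel_with_bar(5000, 1000)
%lprun -f parallel_without_bar parallel_without_bar(5000, 1000)
%lprun -f single_with_bar single_with_bar(5000, 1000)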
I can't get consistent results in the Colab notebook, so I recommend running it locally.
On my machine I got the following profiling results (with 56 CPUs):
Timer unit: 1e-06 s

Total time: 23.2731 s
File: <ipython-input-3-a3749fc5b867>
Function: parallel_with_bar at line 7

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           def parallel_with_bar(N, L):
     8         1     941704.0 941704.0      4.0      ray.init(ignore_reinit_error=True, num_cpus=n_cpus)
     9
    10         1         10.0     10.0      0.0      def to_iterator(obj_ids):
    11                                                   while obj_ids:
    12                                                       done, obj_ids = ray.wait(obj_ids)
    13                                                       yield ray.get(done[0])
    14
    15         1    2229748.0 2229748.0      9.6      obj_ids = [test_remote.remote(L) for i in range(N)]
    16         1          3.0      3.0      0.0      ret = []
    17      5001   18927016.0   3784.6     81.3      for x in tqdm(to_iterator(obj_ids), total=len(obj_ids)):
    18      5000      12861.0      2.6      0.1          ret.append(x)
    19
    20         1    1161728.0 1161728.0      5.0      ray.shutdown()
Timer unit: 1e-06 s

Total time: 6.6471 s
File: <ipython-input-3-a3749fc5b867>
Function: parallel_without_bar at line 23

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    23                                           def parallel_without_bar(N, L):
    24         1     707324.0 707324.0     10.6      ray.init(ignore_reinit_error=True, num_cpus=n_cpus)
    25         1    2472897.0 2472897.0     37.2      obj_ids = [test_remote.remote(L) for i in range(N)]
    26         1    2646773.0 2646773.0     39.8      ret = ray.get(obj_ids)
    27
    28         1     820106.0 820106.0     12.3      ray.shutdown()
Timer unit: 1e-06 s

Total time: 70.925 s
File: <ipython-input-3-a3749fc5b867>
Function: single_with_bar at line 31

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    31                                           def single_with_bar(N, L):
    32         1   70924965.0 70924965.0    100.0      ret = [test(L) for i in tqdm(range(N))]
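Almost all of the extra time in parallel_with_bar is spent on the line that iterates to_iterator through tqdm. If the per-object ray.wait / ray.get calls are a big part of that cost, a batched iterator might keep the progress bar while making far fewer calls. This is only a sketch I have not benchmarked, and the batch_size argument is made up for the example:

def to_iterator_batched(obj_ids, batch_size=100):
    # Wait for up to batch_size finished tasks per call instead of one,
    # so ray.wait / ray.get are invoked far less often while the
    # progress bar still advances once per finished task.
    while obj_ids:
        num_returns = min(batch_size, len(obj_ids))
        done, obj_ids = ray.wait(obj_ids, num_returns=num_returns)
        for x in ray.get(done):
            yield x

# Usage is the same as before:
# for x in tqdm(to_iterator_batched(obj_ids), total=len(obj_ids)):
#     ret.append(x)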
Thanks for your help! And amazing work!
If we cannot run your script, we cannot fix your issue.
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
Top GitHub Comments
@alexmascension I will take a look at it this weekend. Would you mind pinging me one more time if I don’t get back by next Monday?
Sorry, I got caught up on a deadline - @rkooo567, can you take a crack at it?