Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Using threads taking longer than dummy

See original GitHub issue

This issue looked like it might shed light on my problem, but it seems I’m missing something more fundamental. When I run the script below to simply iterate through the BatchedDataLoader, the thread pool actually takes longer than running in the main worker via dummy, and no matter what, more workers result in a longer processing time.

I’m running this on a machine with 16 vCPUs and 64 GiB of memory. Memory stays stable throughout, but I notice CPU usage spikes during the iterations. However long a single iteration takes, using multiple workers should reduce the total time, right?

The dataset I’m testing on is small: ~1.4 GiB saved in Parquet format in S3, ~1.5M rows, and I’m only running 100 iterations with batch size 100. Maybe the script is not working as intended?

Script:

import sys
import time
from itertools import product

from petastorm import make_batch_reader
from petastorm.pytorch import BatchedDataLoader

pools = ['dummy', 'thread']
workers = [1, 4, 8]
for i in product(pools, workers):
    print(i)

# s3_path, cols and trsfm_spec are defined elsewhere; trsfm_spec applies a few
# transformations to various columns of the pandas dataframe
for pool_type, workers_count in product(pools, workers):
    with make_batch_reader(s3_path,
                           workers_count=workers_count,
                           transform_spec=trsfm_spec,
                           schema_fields=cols,
                           num_epochs=10,
                           reader_pool_type=pool_type) as reader:

        loader = BatchedDataLoader(reader, batch_size=100)
        loader_iter = iter(loader)

        time_sum = 0
        batch_size_sum = 0
        batches = 100
        print('--')
        print(f'workers_count: {workers_count} and pool_type: {pool_type}')
        loop_start = time.time()
        for batch_idx in range(batches):
            start = time.time()
            batch = next(loader_iter)
            end = time.time()

            time_sum += end - start
            # sys.getsizeof reports only the shallow size of the batch
            # container, not its payload, hence the constant "average
            # batch size" in the output below
            batch_size = sys.getsizeof(batch)
            if batch_idx not in (0, 1):  # skip the first two warm-up batches
                batch_size_sum += batch_size
        loop_end = time.time()
        print(f'time sum: {time_sum}')
        # note: time_sum includes all 100 batches but is divided by 98
        print(f'average time to process batch: {time_sum / (batches-2)}')
        print(f'loop time: {loop_end-loop_start}')
        # note: divides by the final batch_idx (99), not the 98 sizes summed
        print(f'average batch size: {batch_size_sum / batch_idx}')
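The real trsfm_spec isn’t shown in the issue. For context only, a minimal petastorm TransformSpec wrapping a pandas transformation could look like the sketch below; the column name is made up:

# Illustrative sketch only: the issue does not show the real trsfm_spec.
# TransformSpec wraps a function that receives and returns a pandas DataFrame.
from petastorm.transform import TransformSpec

def _transform(df):
    # 'feature' is a hypothetical column; normalize it in place.
    df['feature'] = (df['feature'] - df['feature'].mean()) / df['feature'].std()
    return df

trsfm_spec = TransformSpec(_transform)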

Output:

('dummy', 1)
('dummy', 4)
('dummy', 8)
('thread', 1)
('thread', 4)
('thread', 8)
--
workers_count: 1 and pool_type: dummy
time sum: 2.927597761154175
average time to process batch: 0.029873446542389537
loop time: 2.927725315093994
average batch size: 1172.040404040404
--
workers_count: 4 and pool_type: dummy
time sum: 4.17024040222168
average time to process batch: 0.04255347349205796
loop time: 4.1703784465789795
average batch size: 1172.040404040404
--
workers_count: 8 and pool_type: dummy
time sum: 4.148790121078491
average time to process batch: 0.042334593072229504
loop time: 4.1489198207855225
average batch size: 1172.040404040404
--
workers_count: 1 and pool_type: thread
time sum: 4.524890422821045
average time to process batch: 0.04617235125327597
loop time: 4.525023937225342
average batch size: 1172.040404040404
--
workers_count: 4 and pool_type: thread
time sum: 8.192336320877075
average time to process batch: 0.08359526858037832
loop time: 8.192479372024536
average batch size: 1172.040404040404
--
workers_count: 8 and pool_type: thread
time sum: 11.62452483177185
average time to process batch: 0.11861760032420256
loop time: 11.624683380126953
average batch size: 1172.040404040404
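A plausible contributor to numbers like these, though not confirmed in the issue itself, is Python’s GIL: if the per-batch transform_spec work is CPU-bound Python/pandas code, thread workers contend for the interpreter lock and mostly add coordination overhead on top of the serial cost. A minimal sketch of that effect, independent of petastorm:

# CPU-bound Python work does not speed up with threads under the GIL.
import time
from multiprocessing.pool import ThreadPool

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL for the whole call.
    total = 0
    for i in range(n):
        total += i * i
    return total

tasks = [2_000_000] * 8

start = time.time()
for n in tasks:
    cpu_bound(n)
print(f'serial:  {time.time() - start:.2f}s')

start = time.time()
with ThreadPool(8) as pool:
    pool.map(cpu_bound, tasks)
print(f'threads: {time.time() - start:.2f}s')

On a standard CPython build the threaded run is no faster than the serial one, and often slower once pool overhead is added, which matches the pattern in the output above.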

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
selitvin commented on Mar 4, 2021

Do you think having an option for a user to supply their own collate function would be helpful in your case? This is something that was brought up in #647, and I will try to address it in the coming weeks.
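A user-supplied collate function in this context would follow the usual PyTorch convention: it receives the rows the reader produced and decides how they are combined into a batch. A minimal sketch, assuming a hypothetical collate_fn hook on BatchedDataLoader that did not yet exist at the time of this comment:

# Hypothetical sketch: BatchedDataLoader did not expose such a hook at the
# time of this comment; collate_fn below is an assumed parameter, modeled on
# petastorm's non-batched DataLoader and the PyTorch convention.
import numpy as np
import torch

def my_collate(rows):
    # rows: list of per-row dicts mapping column name -> numpy value.
    return {col: torch.as_tensor(np.stack([row[col] for row in rows]))
            for col in rows[0]}

# loader = BatchedDataLoader(reader, batch_size=100, collate_fn=my_collate)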

0 reactions
selitvin commented on Mar 4, 2021

Agreed. We’ll try moving in this direction. Thank you for your input!

Read more comments on GitHub >

Top Results From Across the Web

Why does a dummy Thread (only sleeping in short intervals ...
Basically, when we use a short delay like 5 ms we get a better performance within the library. When we use longer delays...
Read more >
Multithreading VS Multiprocessing in Python - Medium
In this article, I will try to discuss some misconceptions about Multithreading and explain why they are false.
Read more >
A Practical Guide to Multithreading in Python - Level Up Coding
Learn how to speed up your program with threads pools in Python. A dummy image for better reading and navigation. Photo by Christin...
Read more >
Threads maxing out all cores, but no performance increase
Any advice on how to use multi-threading and actually speed up the code ... until completion is longer when on 8 cores than...
Read more >
A gentle introduction to multithreading - Internal Pointers
Let's rethink your app in a multithreaded way. Thread A is responsible for the disk access, while thread B takes care of the...
Read more >
