Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Parallel job takes more time than non-parallel job?

See original GitHub issue

It seems that my batches are executed one by one rather than in parallel. I'm using IPython on a 32-core machine.

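import hashlib

import numpy
import pandas
from joblib import Parallel, delayed

# One million rows of random floats in a single column 'a'.
d = pandas.DataFrame({'a': numpy.random.randn(1000 * 1000)})

# SHA-256 hex digest of a string (note: this name shadows Python's built-in hash()).
def hash(x):
    return hashlib.sha256(x.encode('utf-8')).hexdigest()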

The call and its output:


In [35]: Parallel(n_jobs=-1, batch_size=100 * 1000, verbose=20) (delayed(hash) (str(row[1:])) for row in (d.itertuples()))
[Parallel(n_jobs=-1)]: Done  32 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-1)]: Done 100032 tasks      | elapsed:    6.8s
[Parallel(n_jobs=-1)]: Done 200032 tasks      | elapsed:   13.5s
[Parallel(n_jobs=-1)]: Done 300032 tasks      | elapsed:   20.1s
[Parallel(n_jobs=-1)]: Done 400032 tasks      | elapsed:   26.9s
[Parallel(n_jobs=-1)]: Done 500032 tasks      | elapsed:   33.9s
[Parallel(n_jobs=-1)]: Done 600032 tasks      | elapsed:   40.8s
[Parallel(n_jobs=-1)]: Done 700032 tasks      | elapsed:   47.5s
[Parallel(n_jobs=-1)]: Done 800032 tasks      | elapsed:   54.2s
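
The verbose log above can be turned into a rough per-task cost. The sketch below copies three of its progress lines, extracts the task counts and elapsed times with a regular expression, and averages the time per task. The helper names `parse_progress` and `per_task_microseconds` are only for illustration. The result is roughly 68 µs per task, far more than a SHA-256 of a short string should need, which suggests that per-task overhead (building `str(row[1:])` in the parent process, pickling it, and sending it to a worker) dominates the run. It also computes how many batches `batch_size=100 * 1000` produces for one million tasks: ten, fewer than the machine's 32 cores.

```python
import re

# Excerpt of the verbose output shown above (first, second, and last progress lines).
LOG = """\
[Parallel(n_jobs=-1)]: Done  32 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-1)]: Done 100032 tasks      | elapsed:    6.8s
[Parallel(n_jobs=-1)]: Done 800032 tasks      | elapsed:   54.2s
"""

# Matches "Done <count> tasks | elapsed: <seconds>s" in the verbose lines.
PROGRESS = re.compile(r"Done\s+(\d+) tasks\s+\|\s+elapsed:\s+([\d.]+)s")


def parse_progress(log):
    """Return a list of (tasks_done, elapsed_seconds) pairs from verbose output."""
    return [(int(m.group(1)), float(m.group(2))) for m in PROGRESS.finditer(log)]


def per_task_microseconds(points):
    """Average wall-clock cost per task between the first and last progress lines."""
    (n0, t0), (n1, t1) = points[0], points[-1]
    return (t1 - t0) / (n1 - n0) * 1e6


points = parse_progress(LOG)
cost_us = per_task_microseconds(points)

# 1,000,000 rows grouped with batch_size=100 * 1000 gives this many batches.
n_tasks, batch_size, n_cores = 1000 * 1000, 100 * 1000, 32
n_batches = -(-n_tasks // batch_size)  # integer ceiling division

print(f"{cost_us:.1f} us per task, {n_batches} batches for {n_cores} cores")
```
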

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
chengguangnan commented, Mar 6, 2016

It worked. Thanks. The execution time is 2 seconds rather than 12 seconds.

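    import hashlib
    from joblib import Parallel, delayed

    # Hash every row of one chunk; each task now covers 1000 rows instead of one.
    def hash(d):
        return [hashlib.sha256(str(x).encode('utf-8')).hexdigest()[:30] for x in d.itertuples(index=False, name=None)]

    # Split the DataFrame into 1000-row chunks, starting at row 0 so no row is skipped.
    chunks = [d[i:i + 1000] for i in range(0, len(d), 1000)]

    # hash must be defined before it is passed to delayed().
    res = Parallel(n_jobs=-1)(delayed(hash)(chunk) for chunk in chunks)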
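
The fix in the comment above works by giving each task 1,000 rows instead of one, so the fixed cost of dispatching a task is paid about a thousand times rather than about a million times. The standard-library-only sketch below checks the two details that are easy to get wrong: the chunk boundaries must start at 0 and cover every row (starting the `range` at 1 silently drops the first row), and hashing chunk by chunk must give the same digests as hashing row by row. Plain tuples stand in for `d.itertuples(index=False, name=None)`, and the names `chunk_bounds`, `hash_row`, and `hash_chunk` are hypothetical. The sketch runs the chunks sequentially; in the real code each chunk would be handed to joblib through `delayed(...)` as in the comment.

```python
import hashlib


def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) slices that cover rows 0..n_rows-1 exactly once."""
    for start in range(0, n_rows, chunk_size):  # start at 0, not 1
        yield start, min(start + chunk_size, n_rows)


def hash_row(row):
    # Same per-row digest as in the comment: SHA-256 of str(row), first 30 hex chars.
    return hashlib.sha256(str(row).encode("utf-8")).hexdigest()[:30]


def hash_chunk(rows):
    # One task hashes a whole chunk instead of a single row.
    return [hash_row(r) for r in rows]


# Stand-in for d.itertuples(index=False, name=None): 2,500 one-column tuples.
rows = [(float(i) / 7,) for i in range(2500)]

bounds = list(chunk_bounds(len(rows), 1000))
chunked = [hash_chunk(rows[a:b]) for a, b in bounds]  # one "task" per chunk
flat = [h for part in chunked for h in part]

print(len(bounds), "tasks instead of", len(rows))
print("matches row-by-row hashing:", flat == [hash_row(r) for r in rows])
```
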

0 reactions
Debabrataadak commented, Mar 10, 2019

    from math import sqrt
    from joblib import Parallel, delayed
    from numpy import square
    import numpy as np
    import time

    n = 10000000

    def sim(x):
        time.sleep(100)
        return x**2

    chunk = []
    arr = np.arange(n)
    for i in range(0, len(arr), 100):
        chunk.append(arr[i:(i + 100)])

    if __name__ == "__main__":
        result = Parallel(n_jobs=1, backend="threading", verbose=5) \
            (delayed(square)(x) for x in chunk)
        # print result

I did the same, but it takes less time on a single core than on 8 cores.
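
With `n = 10000000` and chunks of 100, the snippet above creates 100,000 tasks, each a single `square` call on 100 numbers, so per-task scheduling cost can easily outweigh the arithmetic. With `backend="threading"`, the Python-level work of dispatching each task also largely does not run in parallel, because the threads share one interpreter and its GIL. One common remedy, in the spirit of the accepted fix, is to make only a handful of large chunks per worker. The helper below is a hypothetical sketch of that rule of thumb; the names `ceil_div` and `pick_chunk_size` and the default of four chunks per worker are assumptions, not joblib settings. For an operation this cheap, a single vectorized `np.square(arr)` call is also worth timing for comparison.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b without going through floats."""
    return -(-a // b)


def pick_chunk_size(n_items, n_workers, chunks_per_worker=4):
    """Chunk size that yields roughly `chunks_per_worker` chunks per worker.

    chunks_per_worker=4 is an arbitrary assumption: a few chunks per worker
    lets idle workers pick up remaining work without creating many tiny tasks.
    """
    n_chunks = max(1, n_workers * chunks_per_worker)
    return max(1, ceil_div(n_items, n_chunks))


n = 10_000_000
print("chunk size 100:", ceil_div(n, 100), "tasks")

size = pick_chunk_size(n, n_workers=8)
print("chunk size", size, ":", ceil_div(n, size), "tasks")
```
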


Top Results From Across the Web

Parallel execution takes more time than the non- ...
Issue - Parallel execution(with number of parallel execution 2) took 338 milli seconds where as non parallel execution took 49 seconds to get...
joblib - the parallel code takes more time than the non- ...
I am first time using joblib. I am using jupyter notebook on windows. it is 16 core machine. It seems that my code...
When does too much parallelism affect performance?
It appears the non-parallel execution had a shorter elapsed time of 128 second compared to the parallel plan taking 148 seconds. The non- ......
Why my tests are slower when I run more parallel CI nodes ...
Let's say you run 10 parallel jobs (parallel CI nodes) on your CI server. Your slowest test file spec/my_slow_spec.rb takes 2 minutes to...
Performance in Parallel query - Ask TOM
Putting 10 programmers to work on a subroutine might take longer then letting 1 good ... I have hardly obtained better performance than...
