Pickling error when using a DataFrame with multiprocessing

With Python 3.4.3 or 3.5.0 and pandas 0.16.2, this simple example works:

import pandas as pd
import multiprocessing as mp

def squareIt(row):
    return row[0]**2, row[1]**2

df = pd.DataFrame({'a': range(100), 'b': range(100)})

with mp.Pool(2) as pool:
    sq = pool.imap(squareIt, df.itertuples(False), chunksize=10)
    for x in sq:
        print(x)

After upgrading to pandas 0.17.1, the same code fails with this error:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 295, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 689, in next
    raise value
  File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
    put(task)
  File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'pandas.core.frame.Pandas'>: attribute lookup Pandas on pandas.core.frame failed

Is there a better way to do some simple multi-process processing of a DataFrame? This approach had worked very well for me in the past. I would rather not rewrite all of my analysis, but if there is a better way I could be convinced.

Thanks.

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions
shoyer commented, Dec 8, 2015

You can use df.itertuples(name=None) to make these regular rather than named tuples.

This does seem to be an unfortunate downside of switching itertuples to return named tuples.
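As a rough sketch of that suggestion, the original example might be adapted as below. The function names square_it and square_rows are illustrative, not from the issue, and the sketch assumes a pandas version whose itertuples accepts the name parameter (0.17 or later). The main guard is included so the script also runs on platforms that start workers by spawning a new interpreter.

```python
import multiprocessing as mp

import pandas as pd


def square_it(row):
    # row is a plain tuple (a, b) because itertuples is called with name=None
    return row[0] ** 2, row[1] ** 2


def square_rows(df, processes=2, chunksize=10):
    # name=None yields regular tuples, which pickle without a class lookup
    rows = df.itertuples(index=False, name=None)
    with mp.Pool(processes) as pool:
        # imap preserves input order; list() drains it before the pool closes
        return list(pool.imap(square_it, rows, chunksize=chunksize))


if __name__ == "__main__":
    df = pd.DataFrame({"a": range(100), "b": range(100)})
    for result in square_rows(df):
        print(result)
```

Only the itertuples call differs in substance from the original code; the worker function is unchanged because it already indexes rows by position.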

1 reaction
max-sixty commented, Dec 7, 2015

I believe that’s trying to pickle a namedtuple. You can either use a different pickler (dill works great), or use a different method than itertuples - you could just send each Series?
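To illustrate why a namedtuple can fail to pickle, here is a small standard-library sketch that does not involve pandas at all. The helper names make_row and can_pickle are hypothetical. It assumes the failure mode is the usual one: pickle stores a class by module and name, so a namedtuple class created inside a function and never bound to a module-level name cannot be looked up again.

```python
import pickle
from collections import namedtuple


def make_row(a, b):
    # Hypothetical stand-in for a library that builds its row class on the fly:
    # the class is called "Row" but is never bound to a module-level name.
    Row = namedtuple("Row", ["a", "b"])
    return Row(a, b)


def can_pickle(obj):
    # pickle looks classes up as <module>.<name>; report whether that works
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    row = make_row(1, 2)
    print("namedtuple row picklable:", can_pickle(row))
    print("plain tuple picklable:", can_pickle(tuple(row)))
```

Converting the row with tuple() before sending it to a worker sidesteps the lookup, which is the same effect name=None has in itertuples.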

Note that pickling is fairly expensive, and so to make this worthwhile you’re going to need to be doing a lot of compute on each piece of data. That example is going to be much faster in a single process.
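For comparison, a single-process version of the toy example could look like the sketch below. It assumes the per-row work can be written as column arithmetic, which holds for squaring but may not hold for a real analysis; square_frame is an illustrative name.

```python
import pandas as pd


def square_frame(df):
    # Element-wise square of every column, computed in the current process
    return df ** 2


df = pd.DataFrame({"a": range(100), "b": range(100)})
squared = square_frame(df)
print(squared.head())
```

Nothing is pickled or sent between processes here, so the serialization cost described above does not apply.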

