pickling error when using a dataframe with multiprocessing
When using Python 3.4.3 or 3.5.0 and pandas 0.16.2, this simple example works:
import pandas as pd
import multiprocessing as mp

def squareIt(row):
    return row[0]**2, row[1]**2

df = pd.DataFrame({'a': range(100), 'b': range(100)})

with mp.Pool(2) as pool:
    sq = pool.imap(squareIt, df.itertuples(False), chunksize=10)
    for x in sq:
        print(x)
After upgrading to pandas 0.17.1, the same code raises this error:
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 295, in <genexpr>
return (item for chunk in result for item in chunk)
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 689, in next
raise value
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
put(task)
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'pandas.core.frame.Pandas'>: attribute lookup Pandas on pandas.core.frame failed
Is there a better way to do some simple multiprocess processing of a dataframe? This had worked very well for me in the past. I would rather not rewrite all of my analysis, but if there is a better way I could be convinced.
Thanks.
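For reference, the error message points at a class named Pandas that does not actually exist as an attribute of pandas.core.frame: in 0.17.x, itertuples() creates that namedtuple class dynamically, and the standard pickler cannot serialize a class it cannot look up by module and name (the comments below say the same thing). A minimal sketch, independent of pandas, that reproduces the same failure mode:

import pickle
from collections import namedtuple

def make_row():
    # The namedtuple class is created inside a function, so pickle cannot
    # find it as a module-level attribute -- the same situation as the
    # 'Pandas' class that itertuples() builds internally.
    Row = namedtuple('Pandas', ['a', 'b'])
    return Row(1, 2)

# Raises pickle.PicklingError: Can't pickle <class '__main__.Pandas'>:
# attribute lookup Pandas on __main__ failed
pickle.dumps(make_row())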
You can use df.itertuples(name=None) to make these regular rather than named tuples. This does seem to be an unfortunate downside of switching itertuples to return named tuples.
I believe that's trying to pickle a namedtuple. You can either use a different pickler (dill works great), or use a different method than itertuples - you could just send each Series?

Note that pickling is fairly expensive, and so to make this worthwhile you're going to need to be doing a lot of compute on each piece of data. That example is going to be much faster in a single process.
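A sketch of the "send each Series" idea, using iterrows as one way to get per-row Series objects (this is an assumption about how to produce them; Series pickle fine, but iterrows is typically slower than itertuples, so the performance caveat above applies even more):

import pandas as pd
import multiprocessing as mp

def squareIt(row):
    # row is a pandas Series here, so columns are accessed by label
    return row['a'] ** 2, row['b'] ** 2

df = pd.DataFrame({'a': range(100), 'b': range(100)})

with mp.Pool(2) as pool:
    rows = (row for _, row in df.iterrows())  # drop the index, keep the Series
    for x in pool.imap(squareIt, rows, chunksize=10):
        print(x)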