pickling error when using a dataframe with multiprocessing
When using Python 3.4.3 or 3.5.0 and pandas 0.16.2, this simple example works:
import pandas as pd
import multiprocessing as mp

def squareIt(row):
    return row[0]**2, row[1]**2

df = pd.DataFrame({'a': range(100), 'b': range(100)})

with mp.Pool(2) as pool:
    sq = pool.imap(squareIt, df.itertuples(False), chunksize=10)
    for x in sq:
        print(x)
After upgrading to pandas 0.17.1, the same code raises this error:
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 295, in <genexpr>
return (item for chunk in result for item in chunk)
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 689, in next
raise value
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
put(task)
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/opt/Python-3.4.3/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'pandas.core.frame.Pandas'>: attribute lookup Pandas on pandas.core.frame failed
Is there a better way to do some simple multiprocess processing of a dataframe? This had worked very well for me in the past. I would rather not rewrite all of my analysis, but if there is a better way I could be convinced.
Thanks.
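For reference, the error message points at a class named Pandas that does not actually exist as an attribute of pandas.core.frame: in 0.17.x, itertuples() creates that namedtuple class dynamically, and the standard pickler cannot serialize a class it cannot look up by module and name (the comments below say the same thing). A minimal sketch, independent of pandas, that reproduces the same failure mode:

import pickle
from collections import namedtuple

def make_row():
    # The namedtuple class is created inside a function, so pickle cannot
    # find it as a module-level attribute -- the same situation as the
    # 'Pandas' class that itertuples() builds internally.
    Row = namedtuple('Pandas', ['a', 'b'])
    return Row(1, 2)

# Raises pickle.PicklingError: Can't pickle <class '__main__.Pandas'>:
# attribute lookup Pandas on __main__ failed
pickle.dumps(make_row())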
You can use df.itertuples(name=None) to make these regular rather than named tuples. This does seem to be an unfortunate downside of switching itertuples to return named tuples.
I believe that's trying to pickle a namedtuple. You can either use a different pickler (dill works great), or use a different method than itertuples - you could just send each Series?

Note that pickling is fairly expensive, and so to make this worthwhile you're going to need to be doing a lot of compute on each piece of data. That example is going to be much faster in a single process.
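A sketch of the "send each Series" idea, using iterrows as one way to get per-row Series objects (this is an assumption about how to produce them; Series pickle fine, but iterrows is typically slower than itertuples, so the performance caveat above applies even more):

import pandas as pd
import multiprocessing as mp

def squareIt(row):
    # row is a pandas Series here, so columns are accessed by label
    return row['a'] ** 2, row['b'] ** 2

df = pd.DataFrame({'a': range(100), 'b': range(100)})

with mp.Pool(2) as pool:
    rows = (row for _, row in df.iterrows())  # drop the index, keep the Series
    for x in pool.imap(squareIt, rows, chunksize=10):
        print(x)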