Not all Pandas dataframes are shared in a multiprocessing list

Hello,

I first tried to get an answer to this question on StackOverflow, but I hope some of you can explain this behavior and hopefully lead us to a solution.

The StackOverflow question is here: https://stackoverflow.com/questions/49942878/not-all-pandas-dataframes-are-shared-in-a-multiprocessing-list

I’ve also added an error callback and managed to get an error:

RemoteError('Traceback (most recent call last):
  File "<removed>lib\multiprocessing\managers.py", line 228, in serve_client
    request = recv()
  File "<removed>lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'DataFrame' on <module 'pandas.core.frame' from '<removed>lib\site-packages\pandas\core\frame.py'>
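As background on what this kind of message means: pickle stores a class by reference (module name plus attribute name) and looks that attribute up again when loading, so the error reports that the lookup of DataFrame on pandas.core.frame failed in the process doing the unpickling. The sketch below reproduces the same kind of failure using a hypothetical stand-in module `demo_frame` and class `Frame` (neither is part of pandas). It only illustrates the mechanism and does not establish why the lookup failed in the manager process in this issue.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module; not part of pandas.
demo_module = types.ModuleType("demo_frame")


class Frame:
    """Hypothetical stand-in for a class such as DataFrame."""

    def __init__(self, values):
        self.values = values


# Make pickle treat Frame as living at demo_frame.Frame.
Frame.__module__ = "demo_frame"
Frame.__qualname__ = "Frame"
demo_module.Frame = Frame
sys.modules["demo_frame"] = demo_module

# The pickled bytes reference the class by name; they do not contain its code.
payload = pickle.dumps(Frame([1, 2, 3]))

# Loading works while demo_frame.Frame can be resolved.
restored = pickle.loads(payload)

# If the attribute is missing on the module at load time, the lookup fails.
del demo_module.Frame
error_message = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```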

I’ve looked into the GitHub tracker and found this issue, which looks a lot like mine: https://github.com/pandas-dev/pandas/issues/2440. There are a few differences, though:

I’m using multiprocessing instead of threading. Because of this, we can use a multiprocessing.Pool and a special list object to share objects.
In our example, we don’t actually change the dataframe in the different processes. We’re only adding it to the list of shared objects.
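The setup described above can be sketched roughly as follows. This is not the code from the StackOverflow question: the names `share_item` and `collect` are hypothetical, and a plain list stands in for the DataFrame so the sketch runs with the standard library only. It assumes the "special list object" is a `multiprocessing.Manager().list()` proxy, and it attaches an error callback to each task.

```python
import multiprocessing


def share_item(shared_list, index):
    # Stand-in for creating a DataFrame; the worker only appends, it never
    # modifies objects that are already in the shared list.
    shared_list.append([index, index + 1, index + 2])


def collect(task_count, processes_count=2):
    """Run task_count workers and return (shared items, collected errors)."""
    errors = []
    with multiprocessing.Manager() as manager:
        shared_list = manager.list()  # proxy to a list held by the manager process
        with multiprocessing.Pool(processes=processes_count) as pool:
            pending = [
                pool.apply_async(
                    share_item,
                    args=(shared_list, i),
                    error_callback=errors.append,  # records exceptions raised by workers
                )
                for i in range(task_count)
            ]
            for result in pending:
                result.wait()
        # Copy the contents out of the proxy before the manager shuts down.
        return list(shared_list), errors


if __name__ == "__main__":
    items, errors = collect(task_count=5)
    print(len(items), "items shared,", len(errors), "errors")
```

Every appended object is pickled on its way to the manager process, which is where a failure like the one in the traceback would surface through the error callback.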

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2 (I’ve also tested this with pandas version 0.22.0, which I believe was the latest)
nose: 1.3.7
pip: 10.0.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 1.0.1
statsmodels: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
httplib2: None
apiclient: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
boto: 2.48.0
pandas_datareader: None

If you need anything else, let me know. We appreciate all the work you’ve done!

Top GitHub Comments

2 reactions

jorisvandenbossche commented, Apr 23, 2018

@jreback did you see the full reproducible example on StackOverflow? (Not that I can tell from it what is going on, but at least there is some detail to the question. And if there is still not enough detail, please request clarification or changes to the reproducible example.)

1 reaction

KhaledTo commented, Apr 24, 2018

Hi @freezas, yes, it’s better if you check whether what I did makes sense.

I added this to my_function.py:

def share_random_pandas_dataframe(shared_list):
    list_int = [1, 2, 3]
    shared_list.append(list_int)

In multiprocessing_example.py I then set processes_count to 19:

processes_count = 19

My pleasure.
