Not all Pandas dataframes are shared in a multiprocessing list

Hello,

I first tried to get an answer to this question on StackOverflow, but I hope some of you can explain what is happening and lead us to a solution.

The StackOverflow question is here: https://stackoverflow.com/questions/49942878/not-all-pandas-dataframes-are-shared-in-a-multiprocessing-list

I’ve also added an error callback and managed to get an error:

RemoteError('Traceback (most recent call last):
  File "<removed>lib\multiprocessing\managers.py", line 228, in serve_client
    request = recv()
  File "<removed>lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'DataFrame' on <module 'pandas.core.frame' from '<removed>lib\site-packages\pandas\core\frame.py'>
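For context on the message above: pickle raises this kind of AttributeError when the receiving side can import the module named in the pickle but cannot find the class on it. The sketch below reproduces only that mechanism using the standard library. `demo_module`, `Frame`, and `try_load` are hypothetical stand-ins, and this is not a diagnosis of why `pandas.core.frame` lacked `DataFrame` in the manager process.

```python
import pickle
import sys
import types

# Hypothetical stand-in module and class; neither is part of pandas.
demo_module = types.ModuleType("demo_module")


class Frame:
    def __init__(self, values):
        self.values = values


# Make the class look as if it lives in demo_module, so pickle records that path.
Frame.__module__ = "demo_module"
demo_module.Frame = Frame
sys.modules["demo_module"] = demo_module

# Pickling stores only the module name and class name, not the class itself.
payload = pickle.dumps(Frame([1, 2, 3]))

# Simulate a receiving process whose copy of the module lacks the class.
del demo_module.Frame


def try_load(data):
    """Return the unpickled object, or the AttributeError raised while loading."""
    try:
        return pickle.loads(data)
    except AttributeError as exc:
        return exc


if __name__ == "__main__":
    print(repr(try_load(payload)))
```

The point of the sketch is that the failure happens on the unpickling side, which in the traceback above is the manager's `serve_client` loop rather than the worker that called `append`.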

I looked through the GitHub tracker and found an issue that looks a lot like mine: https://github.com/pandas-dev/pandas/issues/2440. There are a few differences, though:

  • I’m using multiprocessing instead of threading. Because of this, we can use a multiprocessing.Pool and a special list object to share objects.
  • In our example, we don’t actually change the dataframe in the different processes. We’re only adding it to the list of shared objects.
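The setup described in these two points can be sketched roughly as follows. This is a minimal illustration, not the code from the StackOverflow question: it assumes the "special list object" is a `multiprocessing.Manager().list()` proxy, and it appends a small dict instead of a DataFrame so that it runs with the standard library alone. The names `share_payload` and `run` are hypothetical.

```python
import multiprocessing


def share_payload(shared_list, task_id):
    """Append a small picklable payload (a stand-in for a DataFrame) to the shared list."""
    shared_list.append({"task": task_id, "values": [1, 2, 3]})


def run(processes_count, task_count):
    """Schedule task_count appends on a pool and return (shared items, errors)."""
    errors = []
    with multiprocessing.Manager() as manager:
        # Proxy object: each append is pickled and sent to the manager process.
        shared_list = manager.list()
        with multiprocessing.Pool(processes=processes_count) as pool:
            for task_id in range(task_count):
                pool.apply_async(
                    share_payload,
                    args=(shared_list, task_id),
                    # Called in the parent with the exception if a task fails.
                    error_callback=errors.append,
                )
            pool.close()
            pool.join()
        # Copy the items out before the manager shuts down.
        return list(shared_list), errors


if __name__ == "__main__":
    results, errors = run(processes_count=4, task_count=10)
    print(len(results), "items shared,", len(errors), "errors")
```

Without an `error_callback`, exceptions raised inside `apply_async` tasks stay hidden unless `.get()` is called on the result, which is one way items can appear to go missing from the shared list.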

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2 (I’ve also tested this with pandas version 0.22.0, which I believe was the latest)
nose: 1.3.7
pip: 10.0.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 1.0.1
statsmodels: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
httplib2: None
apiclient: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
boto: 2.48.0
pandas_datareader: None

If you need anything else, let me know. We appreciate all the work you’ve done!

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

2 reactions
jorisvandenbossche commented, Apr 23, 2018

@jreback did you see the full reproducible example on StackOverflow? (Not that I can tell from it what is going on, but at least the question has some detail. If there is still not enough detail, please request clarification or changes to the reproducible example.)

1 reaction
KhaledTo commented, Apr 24, 2018

Hi @freezas, yes, it would be best if you checked whether what I did makes sense.

I added this to my_function.py:

def share_random_pandas_dataframe(shared_list):
    list_int = [1, 2, 3]
    shared_list.append(list_int)

In multiprocessing_example.py I then set processes_count to 19:

processes_count = 19

My pleasure.
