Not all Pandas dataframes are shared in a multiprocessing list
Hello,
I first tried to get an answer to this question on StackOverflow, but I hope some of you can explain what is happening and lead us to a solution.
The StackOverflow question is here: https://stackoverflow.com/questions/49942878/not-all-pandas-dataframes-are-shared-in-a-multiprocessing-list
I’ve also added an error callback and managed to get an error:
RemoteError('Traceback (most recent call last):
  File "<removed>lib\multiprocessing\managers.py", line 228, in serve_client
    request = recv()
  File "<removed>lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'DataFrame' on <module 'pandas.core.frame' from '<removed>lib\site-packages\pandas\core\frame.py'>
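As far as I understand it, pickle stores a class instance by reference: the payload records the module name and class name, and the loading side looks that name up on the module again. The sketch below reproduces the same kind of AttributeError with the standard library only. The module `fake_frame` and the class `Frame` are hypothetical stand-ins, not pandas objects, and the sketch does not explain why the lookup would fail inside the manager process.

```python
import pickle
import sys
import types

# Hypothetical stand-in module, registered so pickle can import it by name.
fake_module = types.ModuleType("fake_frame")
sys.modules["fake_frame"] = fake_module


class Frame:
    """Hypothetical stand-in for a class such as DataFrame."""

    def __init__(self, values):
        self.values = values


# Make the class look as if it were defined in fake_frame.
Frame.__module__ = "fake_frame"
Frame.__qualname__ = "Frame"
fake_module.Frame = Frame

# The payload only records "fake_frame" + "Frame" plus the instance state.
payload = pickle.dumps(Frame([1, 2, 3]))

# While the module still exposes the class, loading works.
restored = pickle.loads(payload)

# Once the attribute is missing on the loading side, the lookup fails.
del fake_module.Frame
caught = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    caught = exc

print(caught)
```

If this is what happens in the manager's server process, then `pandas.core.frame` did not expose `DataFrame` at the moment the request was unpickled there, but that is an assumption on my part.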
I’ve looked through the GitHub tracker and found an issue that looks a lot like mine: https://github.com/pandas-dev/pandas/issues/2440. There are a few differences, though:
- I’m using multiprocessing instead of threading. Because of this, we can use a multiprocessing.Pool and a special list object to share objects.
- In our example, we don’t actually change the dataframe in the different processes. We’re only adding it to the list of shared objects.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2 (I’ve also tested this with pandas version 0.22.0, which I believe was the latest)
nose: 1.3.7
pip: 10.0.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 1.0.1
statsmodels: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
httplib2: None
apiclient: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
boto: 2.48.0
pandas_datareader: None
If you need anything else, let me know. We appreciate all the work you’ve done!

@jreback did you see the full reproducible example on StackOverflow? (I can’t tell from it what is going on, but at least it gives the question some detail. If that is still not enough detail, please request clarification or changes to the reproducible example.)
Hi @freezas, yes, it’s better if you check whether what I did makes sense.
I added this to my_function.py:
In multiprocessing_example.py I then set processes_count to 19:
My pleasure.