parallel_apply results in EOFError when run from PyCharm, works fine from Jupyter Notebook
I was trying to parallelise my code with the pandarallel package in the following way:
import pandas as pd
from sklearn.cluster import SpectralClustering
from pandarallel import pandarallel
import numpy as np
ex = {'measurement_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1}, 'time': {0: 30000, 1: 30000, 2: 30000, 3: 30000, 4: 30000, 5: 30000, 6: 30000, 7: 30000, 8: 30000, 9: 30000, 10: 30100, 11: 30100, 12: 30100, 13: 30100, 14: 30100, 15: 30100, 16: 30100, 17: 30100, 18: 30100, 19: 30100}, 'group': {0: '0', 1: '0', 2: '0', 3: '0', 4: '0', 5: '0', 6: '0', 7: '0', 8: '0', 9: '0', 10: '0', 11: '0', 12: '0', 13: '0', 14: '0', 15: '0', 16: '0', 17: '0', 18: '0', 19: '0'}, 'object': {0: 'obj1', 1: 'obj10', 2: 'obj2', 3: 'obj3', 4: 'obj4', 5: 'obj5', 6: 'obj6', 7: 'obj7', 8: 'obj8', 9: 'obj9', 10: 'obj1', 11: 'obj10', 12: 'obj2', 13: 'obj3', 14: 'obj4', 15: 'obj5', 16: 'obj6', 17: 'obj7', 18: 'obj8', 19: 'obj9'}, 'x': {0: 55.507999420166016, 1: 49.67399978637695, 2: 61.9640007019043, 3: 67.98300170898438, 4: 49.43199920654297, 5: 40.34000015258789, 6: 69.50399780273438, 7: 49.65800094604492, 8: 68.48200225830078, 9: 37.87900161743164, 10: 55.595001220703125, 11: 49.52399826049805, 12: 61.92499923706055, 13: 67.91799926757812, 14: 49.30099868774414, 15: 40.141998291015625, 16: 69.49299621582031, 17: 49.775001525878906, 18: 68.4010009765625, 19: 37.77899932861328}}
ex = pd.DataFrame.from_dict(ex).set_index(['measurement_id', 'time', 'group'])
def cluster(x, index):
    x = np.asarray(x)[:, np.newaxis]
    clustering = SpectralClustering(n_clusters=3, random_state=42, gamma=1 / 50).fit(x)
    return pd.Series(clustering.labels_ + 1, index=index)

pandarallel.initialize(nb_workers=2, progress_bar=True)

ex \
    .groupby(['measurement_id', 'time', 'group']) \
    .parallel_apply(lambda x: cluster(x['x'], x['object']))
However, when I run this in PyCharm I get the following error:
File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "<ipython-input-84-7c89aedcfad4>", line 13, in <module> .parallel_apply(lambda x: cluster(x['x'], x['object'])) File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 451, in closure map_result, File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 358, in get_workers_result message_type, message = queue.get() File "<string>", line 2, in get File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod kind, result = conn.recv() File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 250, in recv buf = self._recv_bytes() File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes buf = self._recv(4) File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 383, in _recv raise EOFError EOFError
I thought that this might be due to some incompatibility with the latest pandas or Python release, so I tried to recreate the issue in a different environment from a Jupyter Notebook - it worked. I then tested my original environment in a Jupyter Notebook as well - it also worked fine. I made sure that I was running the same environment with
import sys
print(sys.executable)
and this is indeed the case. So the only difference seems to be that I use PyCharm instead of Jupyter Notebook. My environment is set up with Python 3.7.6 and pandas 1.0.1.
Deactivating “Run with Python Console” in the run configuration solved the problem for me.
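For readers who would rather keep the fix out of the IDE settings, below is a minimal sketch of the same example laid out as a standalone script to be launched with a regular PyCharm Run configuration (i.e. with “Run with Python Console” disabled). The if __name__ == "__main__": guard is not mentioned in this thread; it is only the usual precaution when a script starts worker processes, and the read_csv call is a hypothetical stand-in for however the DataFrame is actually built.

import numpy as np
import pandas as pd
from pandarallel import pandarallel
from sklearn.cluster import SpectralClustering

def cluster(x, index):
    # Reshape the 1-D values into the (n_samples, 1) shape scikit-learn expects.
    x = np.asarray(x)[:, np.newaxis]
    clustering = SpectralClustering(n_clusters=3, random_state=42, gamma=1 / 50).fit(x)
    return pd.Series(clustering.labels_ + 1, index=index)

if __name__ == "__main__":
    # Hypothetical input file; replace with however the data is really loaded.
    ex = pd.read_csv("measurements.csv").set_index(['measurement_id', 'time', 'group'])

    pandarallel.initialize(nb_workers=2, progress_bar=True)

    result = ex \
        .groupby(['measurement_id', 'time', 'group']) \
        .parallel_apply(lambda x: cluster(x['x'], x['object']))
    print(result)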
Same for me in PyCharm. Haven't tried a different IDE yet. However, NOT using the memory file system by setting
use_memory_fs=False
in the initialize call seems to work.
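For reference, applying that workaround to the snippet from the question only changes the initialize call; the rest of the example stays the same (a sketch, assuming the use_memory_fs argument works as quoted in the comment above):

# Disable pandarallel's memory file system for transferring data to the workers;
# this commenter reports that doing so avoids the EOFError when running from PyCharm.
pandarallel.initialize(nb_workers=2, progress_bar=True, use_memory_fs=False)

ex \
    .groupby(['measurement_id', 'time', 'group']) \
    .parallel_apply(lambda x: cluster(x['x'], x['object']))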