parallel_apply results in EOFError when run from Pycharm, works fine from Jupyter Notebook

I was trying to parallelise my code with the pandarallel package in the following way:

import pandas as pd
from sklearn.cluster import SpectralClustering
from pandarallel import pandarallel
import numpy as np

ex = {'measurement_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1}, 'time': {0: 30000, 1: 30000, 2: 30000, 3: 30000, 4: 30000, 5: 30000, 6: 30000, 7: 30000, 8: 30000, 9: 30000, 10: 30100, 11: 30100, 12: 30100, 13: 30100, 14: 30100, 15: 30100, 16: 30100, 17: 30100, 18: 30100, 19: 30100}, 'group': {0: '0', 1: '0', 2: '0', 3: '0', 4: '0', 5: '0', 6: '0', 7: '0', 8: '0', 9: '0', 10: '0', 11: '0', 12: '0', 13: '0', 14: '0', 15: '0', 16: '0', 17: '0', 18: '0', 19: '0'}, 'object': {0: 'obj1', 1: 'obj10', 2: 'obj2', 3: 'obj3', 4: 'obj4', 5: 'obj5', 6: 'obj6', 7: 'obj7', 8: 'obj8', 9: 'obj9', 10: 'obj1', 11: 'obj10', 12: 'obj2', 13: 'obj3', 14: 'obj4', 15: 'obj5', 16: 'obj6', 17: 'obj7', 18: 'obj8', 19: 'obj9'}, 'x': {0: 55.507999420166016, 1: 49.67399978637695, 2: 61.9640007019043, 3: 67.98300170898438, 4: 49.43199920654297, 5: 40.34000015258789, 6: 69.50399780273438, 7: 49.65800094604492, 8: 68.48200225830078, 9: 37.87900161743164, 10: 55.595001220703125, 11: 49.52399826049805, 12: 61.92499923706055, 13: 67.91799926757812, 14: 49.30099868774414, 15: 40.141998291015625, 16: 69.49299621582031, 17: 49.775001525878906, 18: 68.4010009765625, 19: 37.77899932861328}}

ex = pd.DataFrame.from_dict(ex).set_index(['measurement_id', 'time', 'group'])
    
def cluster(x, index):
    # SpectralClustering expects a 2-D array, so turn the 1-D values into a single column.
    x = np.asarray(x)[:, np.newaxis]

    clustering = SpectralClustering(n_clusters=3, random_state=42, gamma=1 / 50).fit(x)
    # Shift the labels so they start at 1 and index them by the object names.
    return pd.Series(clustering.labels_ + 1, index=index)

pandarallel.initialize(nb_workers=2, progress_bar=True)
ex \
    .groupby(['measurement_id', 'time', 'group']) \
    .parallel_apply(lambda x: cluster(x['x'], x['object']))

However, when I run this in PyCharm, I get the following error:

  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-84-7c89aedcfad4>", line 13, in <module>
    .parallel_apply(lambda x: cluster(x['x'], x['object']))
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 451, in closure
    map_result,
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 358, in get_workers_result
    message_type, message = queue.get()
  File "<string>", line 2, in get
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

I thought this might be due to an incompatibility with the latest pandas or Python release, so I tried to reproduce the issue in a different environment in Jupyter Notebook. It worked there, so I then tested the same environment in Jupyter Notebook, and it also worked fine. I made sure that I’m running the same environment with

import sys
print(sys.executable)

and this is indeed the case. So the only difference seems to be that I use PyCharm instead of Jupyter Notebook. My environment is set up with Python 3.7.6 and pandas 1.0.1.
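The sys.executable check above only compares interpreters. A slightly broader comparison could look like the sketch below, which also reports the kind of interactive shell and whether /dev/shm exists. The helper name describe_runtime is hypothetical, and the /dev/shm check is included on the assumption that pandarallel's memory file system depends on that directory.

```python
import os
import platform
import sys


def describe_runtime():
    """Collect a few facts that are worth comparing between PyCharm and Jupyter."""
    try:
        # get_ipython() returns None when no IPython shell is active.
        from IPython import get_ipython
        shell = get_ipython()
    except ImportError:
        shell = None

    return {
        "executable": sys.executable,
        "python_version": platform.python_version(),
        # Class name of the active IPython shell, or None for a plain interpreter.
        "ipython_shell": type(shell).__name__ if shell is not None else None,
        # Assumption: pandarallel's memory file system relies on /dev/shm.
        "has_dev_shm": os.path.isdir("/dev/shm"),
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Running this in both places and comparing the output shows whether the two sessions differ in anything besides the front end. A Jupyter kernel normally reports ZMQInteractiveShell as the shell class, so a different name in the PyCharm console would confirm that the code runs under a different shell there.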


Top GitHub Comments

4 reactions

schillingalex commented, Dec 10, 2021

Deactivating “Run with Python Console” in the run configuration solved the problem for me.

4 reactions

moritzwilksch commented, Jun 24, 2020

Same for me in PyCharm. I haven’t tried a different IDE yet. However, NOT using the memory file system, by setting use_memory_fs=False in the initialize call, seems to work.
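For reference, the use_memory_fs workaround described above could be applied to the original snippet roughly as follows. This is a sketch, not a confirmed fix: INIT_KWARGS and init_pandarallel are hypothetical names, and it assumes the installed pandarallel version accepts the use_memory_fs argument.

```python
# Settings from the original snippet, plus the workaround from the comment above.
INIT_KWARGS = {
    "nb_workers": 2,
    "progress_bar": True,
    # Assumption: with False, pandarallel does not use the memory file system
    # to move data between the main process and the workers.
    "use_memory_fs": False,
}


def init_pandarallel():
    """Initialize pandarallel with the memory file system disabled."""
    # Imported lazily so this module can be loaded without pandarallel installed.
    from pandarallel import pandarallel

    pandarallel.initialize(**INIT_KWARGS)
```

Calling init_pandarallel() in place of the original pandarallel.initialize(...) line leaves the groupby and parallel_apply call unchanged.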
