parallel_apply results in EOFError when run from PyCharm, works fine from Jupyter Notebook


I was trying to parallelise my code with the pandarallel package in the following way:

import pandas as pd
from sklearn.cluster import SpectralClustering
from pandarallel import pandarallel
import numpy as np
ex = {'measurement_id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1}, 'time': {0: 30000, 1: 30000, 2: 30000, 3: 30000, 4: 30000, 5: 30000, 6: 30000, 7: 30000, 8: 30000, 9: 30000, 10: 30100, 11: 30100, 12: 30100, 13: 30100, 14: 30100, 15: 30100, 16: 30100, 17: 30100, 18: 30100, 19: 30100}, 'group': {0: '0', 1: '0', 2: '0', 3: '0', 4: '0', 5: '0', 6: '0', 7: '0', 8: '0', 9: '0', 10: '0', 11: '0', 12: '0', 13: '0', 14: '0', 15: '0', 16: '0', 17: '0', 18: '0', 19: '0'}, 'object': {0: 'obj1', 1: 'obj10', 2: 'obj2', 3: 'obj3', 4: 'obj4', 5: 'obj5', 6: 'obj6', 7: 'obj7', 8: 'obj8', 9: 'obj9', 10: 'obj1', 11: 'obj10', 12: 'obj2', 13: 'obj3', 14: 'obj4', 15: 'obj5', 16: 'obj6', 17: 'obj7', 18: 'obj8', 19: 'obj9'}, 'x': {0: 55.507999420166016, 1: 49.67399978637695, 2: 61.9640007019043, 3: 67.98300170898438, 4: 49.43199920654297, 5: 40.34000015258789, 6: 69.50399780273438, 7: 49.65800094604492, 8: 68.48200225830078, 9: 37.87900161743164, 10: 55.595001220703125, 11: 49.52399826049805, 12: 61.92499923706055, 13: 67.91799926757812, 14: 49.30099868774414, 15: 40.141998291015625, 16: 69.49299621582031, 17: 49.775001525878906, 18: 68.4010009765625, 19: 37.77899932861328}}

ex = pd.DataFrame.from_dict(ex).set_index(['measurement_id', 'time', 'group'])
    
def cluster(x, index):
    x = np.asarray(x)[:, np.newaxis]
    
    clustering = SpectralClustering(n_clusters = 3, random_state = 42, gamma = 1 / 50).fit(x)
    return pd.Series(clustering.labels_ + 1, index = index)
    
pandarallel.initialize(nb_workers=2, progress_bar=True)
ex \
    .groupby(['measurement_id', 'time', 'group']) \
    .parallel_apply(lambda x: cluster(x['x'], x['object']))

However, when I run this in PyCharm, I get the following error:

  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-84-7c89aedcfad4>", line 13, in <module>
    .parallel_apply(lambda x: cluster(x['x'], x['object']))
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 451, in closure
    map_result,
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 358, in get_workers_result
    message_type, message = queue.get()
  File "<string>", line 2, in get
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/kuba/anaconda3/envs/test_env/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

I thought this might be due to an incompatibility with the latest pandas or Python release, so I tried to reproduce the issue in a different environment in Jupyter Notebook. It worked well there, so I then tested the same environment in Jupyter Notebook, and it worked fine too. I made sure that I’m running the same environment with

import sys
print(sys.executable)

and this is indeed the case. So the only difference seems to be that I use PyCharm instead of Jupyter Notebook. My environment is set up with Python 3.7.6 and pandas 1.0.1.
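Since `sys.executable` only confirms which interpreter is in use, a slightly broader check may help narrow down what differs between the two front ends. The following is a minimal sketch, not part of the original report: the function name `describe_runtime` is hypothetical, the `PYCHARM_HOSTED` environment variable is assumed to be set by PyCharm run configurations, and `/dev/shm` is checked because it is assumed to be the location pandarallel uses for its memory file system.

```python
import os
import sys


def describe_runtime() -> dict:
    """Collect facts that can differ between front ends sharing one interpreter."""
    main_module = sys.modules.get("__main__")
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        # sys.ps1 is only defined when the interpreter runs in interactive mode.
        "interactive": hasattr(sys, "ps1"),
        # A plain script run has __main__.__file__; consoles usually do not.
        "main_has_file": hasattr(main_module, "__file__"),
        # True when IPython has been imported into this process.
        "ipython_loaded": "IPython" in sys.modules,
        # Assumption: PyCharm sets PYCHARM_HOSTED=1 for processes it launches.
        "pycharm_hosted": os.environ.get("PYCHARM_HOSTED") == "1",
        # Assumption: pandarallel's memory file system lives under /dev/shm.
        "dev_shm_available": os.path.isdir("/dev/shm"),
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Running this once in PyCharm and once in Jupyter Notebook and comparing the two outputs line by line should show whether the sessions differ in more than the IDE.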

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 12
  • Comments: 18

Top GitHub Comments

4 reactions
schillingalex commented, Dec 10, 2021

Deactivating “Run with Python Console” in the run configuration solved the problem for me.

4 reactions
moritzwilksch commented, Jun 24, 2020

Same for me in PyCharm. Haven’t tried a different IDE yet. However, NOT using the memory file system, by setting use_memory_fs=False in the initialize call, seems to work.
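Applied to the snippet from the question, that workaround might look like the sketch below. The helper names `build_initialize_kwargs` and `initialize_without_memory_fs` are hypothetical and only wrap the `pandarallel.initialize` call; pandarallel is imported inside the function so that the helpers can be defined even where the package is not installed. Whether this avoids the EOFError in every PyCharm setup is not confirmed here.

```python
def build_initialize_kwargs(nb_workers: int = 2, progress_bar: bool = True) -> dict:
    """Keyword arguments for pandarallel.initialize with the memory file system disabled."""
    return {
        "nb_workers": nb_workers,
        "progress_bar": progress_bar,
        # The workaround from the comment: do not use the memory file system.
        "use_memory_fs": False,
    }


def initialize_without_memory_fs(nb_workers: int = 2, progress_bar: bool = True) -> None:
    """Initialize pandarallel without the memory file system (requires pandarallel)."""
    # Imported lazily so that defining these helpers does not require pandarallel.
    from pandarallel import pandarallel

    pandarallel.initialize(**build_initialize_kwargs(nb_workers, progress_bar))
```

Calling `initialize_without_memory_fs()` would take the place of the original `pandarallel.initialize(nb_workers=2, progress_bar=True)` line; the `groupby(...).parallel_apply(...)` call stays unchanged.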
