Serialization error on Ray 2.0rc with pandas DataFrames
What is the problem?
Ray version and other system information (Python version, TensorFlow version, OS): 2.0rc
Serialization fails on putting a pandas DataFrame in Client mode.
Logs:
Got Error from data channel -- shutting down: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Exception iterating responses: 'DataFrame' object has no attribute '_data'"
debug_error_string = "{"created":"@1612307207.723737628","description":"Error received from peer ipv4:52.43.158.36:51005","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Exception iterating responses: 'DataFrame' object has no attribute '_data'","grpc_status":2}"
>
Exception in thread Thread-8:
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py", line 87, in _data_main
raise e
File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py", line 62, in _data_main
for response in resp_stream:
File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
return self._next()
File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/grpc/_channel.py", line 803, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Exception iterating responses: 'DataFrame' object has no attribute '_data'"
debug_error_string = "{"created":"@1612307207.723737628","description":"Error received from peer ipv4:52.43.158.36:51005","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Exception iterating responses: 'DataFrame' object has no attribute '_data'","grpc_status":2}"
>
---------------------------------------------------------------------------
ConnectionError Traceback (most recent call last)
<ipython-input-4-db5f700c182c> in <module>
6
7
----> 8 x = ray.put(pandas.DataFrame(np.random.randint(0, 100, size=(2**4, 2**4))))
9
10
~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
44 global _client_hook_enabled
45 if client_mode_enabled and _client_hook_enabled:
---> 46 return getattr(ray, func.__name__)(*args, **kwargs)
47 return func(*args, **kwargs)
48
~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/api.py in put(self, *args, **kwargs)
41 kwargs: opaque keyword arguments
42 """
---> 43 return self.worker.put(*args, **kwargs)
44
45 def wait(self, *args, **kwargs):
~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/worker.py in put(self, vals)
191 to_put.append(vals)
192
--> 193 out = [self._put(x) for x in to_put]
194 if single:
195 out = out[0]
~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/worker.py in <listcomp>(.0)
191 to_put.append(vals)
192
--> 193 out = [self._put(x) for x in to_put]
194 if single:
195 out = out[0]
~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/worker.py in _put(self, val)
206 data = dumps_from_client(val, self._client_id)
207 req = ray_client_pb2.PutRequest(data=data)
--> 208 resp = self.data_client.PutObject(req)
209 return ClientObjectRef(resp.id)
210
~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py in PutObject(self, request, context)
125 context=None) -> ray_client_pb2.PutResponse:
126 datareq = ray_client_pb2.DataRequest(put=request, )
--> 127 resp = self._blocking_send(datareq)
128 return resp.put
129
~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py in _blocking_send(self, req)
104 if self._in_shutdown:
105 raise ConnectionError(
--> 106 f"cannot send request {req}: data channel shutting down")
107 data = self.ready_data[req_id]
108 del self.ready_data[req_id]
ConnectionError: cannot send request req_id: 2
put {
Originally reported here: https://discuss.modin.org/t/unable-to-connect-to-external-ray-cluster/175/9
Related to: https://discuss.ray.io/t/error-in-rpc-call-in-client-mode/703
Reproduction (REQUIRED)
import ray
import ray.util
ray.util.connect('<HOST>:<PORT>')
import modin.pandas as pd
import pandas
import numpy as np
x = ray.put(pandas.DataFrame(np.random.randint(0, 100, size=(2**4, 2**4))))
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
Ok, so it seems like the best fix will be a documentation page and maybe slightly better error messages.
Should we provide a Modin example on the Ray Client documentation page for how to install dependencies?
Closing this for now since it should be largely resolved.
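The comments above point to dependency installation as the fix. That suggests, though the thread does not state it outright, that the client and the cluster had different package versions installed; an error like 'DataFrame' object has no attribute '_data' is typical when a DataFrame pickled by one pandas version is unpickled by another. A minimal sketch for checking this, assuming Python 3.8+ on both sides; the helper names installed_versions and version_mismatches are hypothetical and not part of Ray or Modin:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the current Python environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


def version_mismatches(client, server):
    """Return {package: (client_version, server_version)} where the two differ."""
    return {
        name: (client.get(name), server.get(name))
        for name in sorted(set(client) | set(server))
        if client.get(name) != server.get(name)
    }


# Hypothetical usage after connecting as in the reproduction above
# (assumes remote tasks work over the client connection):
#
#   pkgs = ["pandas", "numpy", "modin", "ray"]
#   local = installed_versions(pkgs)
#   remote = ray.get(ray.remote(installed_versions).remote(pkgs))
#   print(version_mismatches(local, remote))
```

An empty result means the listed packages match on both sides. Any entry, in particular for pandas, is worth resolving by installing the same version on the client and on every cluster node before retrying ray.put.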