Serialization error on Ray 2.0rc with pandas DataFrames

What is the problem?

Ray version and other system information (Python version, TensorFlow version, OS): 2.0rc

Calling ray.put on a pandas DataFrame fails with a serialization error in Client mode.

Logs:

Got Error from data channel -- shutting down: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception iterating responses: 'DataFrame' object has no attribute '_data'"
	debug_error_string = "{"created":"@1612307207.723737628","description":"Error received from peer ipv4:52.43.158.36:51005","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Exception iterating responses: 'DataFrame' object has no attribute '_data'","grpc_status":2}"
>
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py", line 87, in _data_main
    raise e
  File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py", line 62, in _data_main
    for response in resp_stream:
  File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/home/ec2-user/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/grpc/_channel.py", line 803, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception iterating responses: 'DataFrame' object has no attribute '_data'"
	debug_error_string = "{"created":"@1612307207.723737628","description":"Error received from peer ipv4:52.43.158.36:51005","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Exception iterating responses: 'DataFrame' object has no attribute '_data'","grpc_status":2}"
>

---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
<ipython-input-4-db5f700c182c> in <module>
      6 
      7 
----> 8 x = ray.put(pandas.DataFrame(np.random.randint(0, 100, size=(2**4, 2**4))))
      9 
     10 

~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/_private/client_mode_hook.py in wrapper(*args, **kwargs)
     44         global _client_hook_enabled
     45         if client_mode_enabled and _client_hook_enabled:
---> 46             return getattr(ray, func.__name__)(*args, **kwargs)
     47         return func(*args, **kwargs)
     48 

~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/api.py in put(self, *args, **kwargs)
     41             kwargs: opaque keyword arguments
     42         """
---> 43         return self.worker.put(*args, **kwargs)
     44 
     45     def wait(self, *args, **kwargs):

~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/worker.py in put(self, vals)
    191             to_put.append(vals)
    192 
--> 193         out = [self._put(x) for x in to_put]
    194         if single:
    195             out = out[0]

~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/worker.py in <listcomp>(.0)
    191             to_put.append(vals)
    192 
--> 193         out = [self._put(x) for x in to_put]
    194         if single:
    195             out = out[0]

~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/worker.py in _put(self, val)
    206         data = dumps_from_client(val, self._client_id)
    207         req = ray_client_pb2.PutRequest(data=data)
--> 208         resp = self.data_client.PutObject(req)
    209         return ClientObjectRef(resp.id)
    210 

~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py in PutObject(self, request, context)
    125                   context=None) -> ray_client_pb2.PutResponse:
    126         datareq = ray_client_pb2.DataRequest(put=request, )
--> 127         resp = self._blocking_send(datareq)
    128         return resp.put
    129 

~/anaconda3/envs/ray-env-v200/lib/python3.7/site-packages/ray/util/client/dataclient.py in _blocking_send(self, req)
    104             if self._in_shutdown:
    105                 raise ConnectionError(
--> 106                     f"cannot send request {req}: data channel shutting down")
    107             data = self.ready_data[req_id]
    108             del self.ready_data[req_id]

ConnectionError: cannot send request req_id: 2
put {

Originally reported here: https://discuss.modin.org/t/unable-to-connect-to-external-ray-cluster/175/9

Related to: https://discuss.ray.io/t/error-in-rpc-call-in-client-mode/703

Reproduction (REQUIRED)

import ray
import ray.util
ray.util.connect('<HOST>:<PORT>')

import modin.pandas as pd
import pandas
import numpy as np

x = ray.put(pandas.DataFrame(np.random.randint(0, 100, size=(2**4, 2**4))))

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
devin-petersohn commented, Feb 4, 2021

Ok, so it seems like the best fix will be a documentation page and maybe slightly better error messages.

Should we provide a Modin example on the Ray Client documentation page for how to install dependencies?
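
The thread does not spell out the root cause, but the suggestion to document dependency installation points at the client and the cluster having different package versions installed. As a rough sketch under that assumption (the helper names installed_versions and version_mismatches are hypothetical, not part of Ray or Modin), one way to compare the two environments is:

```python
import importlib


def installed_versions(packages):
    """Return {package: version or None} for the current Python environment.

    Hypothetical helper: imports each package and reads its __version__.
    A package that cannot be imported is reported as None.
    """
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def version_mismatches(client, cluster):
    """Return {package: (client_version, cluster_version)} where the two differ."""
    return {
        name: (client.get(name), cluster.get(name))
        for name in sorted(set(client) | set(cluster))
        if client.get(name) != cluster.get(name)
    }


if __name__ == "__main__":
    # Assumption: these are the packages most likely to matter for this report.
    packages = ["pandas", "numpy", "modin", "ray"]
    print(installed_versions(packages))
```

To get the cluster side, the same function could be run as a remote task once the client is connected, for example ray.get(ray.remote(installed_versions).remote(packages)), and the two dictionaries passed to version_mismatches. Any package it reports, pandas in particular, would be a candidate for the attribute error above.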

0 reactions
richardliaw commented, Apr 21, 2021

Closing this for now since it should be largely resolved.
