Cannot initialise superclass from actor process
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro Linux 18.1.2
- Ray installed from (source or binary): binary, pip install
- Ray version: 0.7.6
- Python version: 3.7.4
Describe the problem
I am trying to parallelise a sampling process, so I created a Sampler object. The Sampler depends on two large datasets (stored as numpy arrays), which are passed as arguments to the constructor. To avoid having duplicates in the object store, my idea is to first use ray.put to add each dataset to the object store and then initialise the Sampler objects with the corresponding IDs.
Moreover, I don't want to add the decorators to the Sampler class. Instead, I created a subclass of Sampler, RemoteSampler, which decorates the methods of the superclass and modifies them by adding the .remote() call. However, I seem to be unable to initialise the superclass from the ActorClass. I get a type error:
TypeError: super() argument 1 must be type, not ActorClass(RemoteSampler)
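The error can be reproduced without Ray. The sketch below is only an illustration of the likely mechanism, not Ray's actual implementation: Wrapper and wrap are hypothetical stand-ins for ActorClass and @ray.remote. Once a class decorator returns a non-type object, the class name is rebound to that object, and super(Child, self) looks the name up at call time:

```python
class Base:
    def __init__(self, value):
        self.value = value


class Wrapper:
    """Hypothetical stand-in for Ray's ActorClass: it holds a class but is not a type."""

    def __init__(self, cls):
        self._cls = cls

    def make(self, *args):
        # Instantiate the original, undecorated class
        return self._cls(*args)


def wrap(cls):
    # Hypothetical stand-in for @ray.remote: replaces the class with a Wrapper instance
    return Wrapper(cls)


@wrap
class Child(Base):
    def __init__(self, value):
        # The name 'Child' is resolved when __init__ runs, and by then
        # it refers to the Wrapper instance rather than to the class
        super(Child, self).__init__(value)


try:
    Child.make(1)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

The exact wording of the message varies between Python versions, but in each case super() rejects its first argument because it is not a type.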
Source code / logs
Please see the skeleton code below:
import numpy as np
import ray

class Sampler(object):
    def __init__(self, train_data, d_train_data, *others):
        # these can be big, so we want to have only one copy that
        # multiple actors share
        if isinstance(train_data, np.ndarray):
            self.train_data = train_data
        else:
            self.train_data = ray.get(train_data)
        if isinstance(d_train_data, np.ndarray):
            self.d_train_data = d_train_data
        else:
            self.d_train_data = ray.get(d_train_data)
        # Initialise the rest of the sampler state
        self.d1 = {}
        self.d2 = {}

    def __call__(self, features, n_samples):
        a, b, c = self._sampling_loop(features, n_samples)
        # process a, b, c and return them
        return a, b, c

    def build_lookups(self, X):
        # Use X to modify state of d1 and d2
        pass

    def _sampling_loop(self, features, n_samples):
        # Use train_data, d_train_data and other attributes to
        # return some data to __call__
        pass

@ray.remote
class RemoteSampler(Sampler):
    def __init__(self, *args):
        super(RemoteSampler, self).__init__(*args)

    # TODO: don't hardcode return vals
    @ray.method(num_return_vals=4)
    def __call__(self, anchor, num_samples):
        return self(anchor, num_samples)

    @ray.method(num_return_vals=3)
    def build_lookups(self, X):
        a, b, c = self.build_lookups(X)
        return a, b, c

def _fit_parallel(*args, **kwargs):
    # method of a class where the RemoteSampler objects are initialised
    train_data, d_train_data, *others = args
    train_data_id = ray.put(train_data)
    d_train_data_id = ray.put(d_train_data)
    n_args = (train_data_id, d_train_data_id, *others)
    return [RemoteSampler.remote(*n_args) for _ in range(kwargs['ncpu'])]
Top GitHub Comments
@alexcoca More concisely, use super().__init__([your_args]). (I believe Python 2 does not support this, but Ray's Python 2 support has reached its end anyway.)
Closing because I think this is a duplicate of https://github.com/ray-project/ray/issues/449. Please reopen if that's a mistake.
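A minimal sketch of why the zero-argument form helps, again without Ray: Wrapper and wrap are hypothetical stand-ins for ActorClass and @ray.remote, not Ray's real API. In Python 3, super() with no arguments does not look up the class by name. It relies on the __class__ cell that the compiler attaches to the method, and that cell still refers to the original class after a decorator has rebound the name:

```python
class Base:
    def __init__(self, value):
        self.value = value


class Wrapper:
    """Hypothetical stand-in for Ray's ActorClass: it holds a class but is not a type."""

    def __init__(self, cls):
        self._cls = cls

    def make(self, *args):
        # Instantiate the original, undecorated class
        return self._cls(*args)


def wrap(cls):
    # Hypothetical stand-in for @ray.remote: replaces the class with a Wrapper instance
    return Wrapper(cls)


@wrap
class Child(Base):
    def __init__(self, value):
        # Zero-argument super() uses the implicit __class__ cell, so it is
        # unaffected by 'Child' now naming the Wrapper instance
        super().__init__(value)


instance = Child.make(1)
print(instance.value)
```

Calling the parent initialiser explicitly, as in Base.__init__(self, value), should also sidestep the name lookup, at the cost of hardcoding the parent class.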