[ray] How to write into numpy arrays in shared memory with Ray?
I am attempting to rewrite Python multiprocessing code using Ray, since Ray appears to abstract away shared memory management and to perform parallel computation faster than straight multiprocessing (based on this article). My goal is to process all timeseries of a lat/lon grid (with both input and output arrays having shape [lat, lon, time]) in parallel, without unnecessary copies of the input/output arrays. The idea is to keep both the input and output arrays in shared memory so that multiple processes can read from and write into them, with no copies or serialization needed for each process to access the arrays being worked on.
My use case is that I have a CPU-heavy function that I want to apply to all 1-D sub-arrays of a 3-D array. I have managed to make this run much faster using a home-rolled shared-memory approach with multiprocessing, but the code is more convoluted/complicated than I’m comfortable with, and I’m hoping to simplify it using Ray. However, I haven’t yet worked out how to write into shared memory with Ray, and without that I don’t see how this can be done. Hopefully someone reading this can suggest a solution.
I have a Jupyter notebook with a simple example of what I’ve tried to get this to work using ray.
Here’s the gist:
I initialize Ray and create a function that performs a simple operation on a 1-D slice of a 3-D array and writes the result into an output array, with the expectation that this function can run in parallel and read from/write to shared-memory representations of my input/output arrays:
import psutil
import numpy as np
import ray
num_cpus = psutil.cpu_count(logical=False)
ray.init(num_cpus=num_cpus, ignore_reinit_error=True)
@ray.remote
def add_average_ray(
    in_ary: np.ndarray,
    out_ary: np.ndarray,
    lat_index: int,
    lon_index: int,
):
    ary = in_ary[lat_index, lon_index]
    out_ary[lat_index, lon_index] = ary + np.mean(ary)
Next, I create a function that will loop over a 3-D grid of values and apply the above function to each in parallel using ray:
def compute_with_ray(
    input_array: np.ndarray,
) -> np.ndarray:

    # create an output array that computed values will be written into
    output_array = np.full(shape=input_array.shape, fill_value=np.NaN)

    # put the input and output arrays into ray's object store
    in_array_id = ray.put(input_array)
    out_array_id = ray.put(output_array)

    # make a list of futures, one per lat/lon (assuming shape (lat, lon, time))
    futures = []
    for lat_index in range(input_array.shape[0]):
        for lon_index in range(input_array.shape[1]):
            futures.append(add_average_ray.remote(in_array_id, out_array_id, lat_index, lon_index))

    # wait for the remote tasks (launched in parallel above) to complete
    ray.get(futures)

    return output_array
Next I make an input array and exercise the code:
# create an array that can be used to represent a 2x2 cell lat/lon map with 3 times
tst_ary = np.array([[[1, 6, 5], [3, 2, 7]], [[8, 4., 6.], [9, 4, 2]]])
# exercise the ray remote function in parallel
average_added_ray = compute_with_ray(tst_ary)
Apparently this is not possible, since arrays that have been added to Ray’s object store are read-only, and it results in an error:
RayTaskError(ValueError): ray_worker (pid=5260, host=skypilot)
File "<ipython-input-5-0bf9c2bf3f2e>", line 9, in add_average_ray
ValueError: assignment destination is read-only
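For reference, a minimal check (separate from the notebook above, just for illustration) that shows the same read-only behavior on any array fetched back from the object store:

import numpy as np
import ray

ray.init(ignore_reinit_error=True)

# arrays retrieved from Ray's object store are zero-copy views onto
# shared memory, and numpy flags them as read-only
demo = np.zeros((2, 2, 3))
restored = ray.get(ray.put(demo))
print(restored.flags.writeable)   # False
try:
    restored[0, 0, 0] = 1.0
except ValueError as err:
    print(err)                    # assignment destination is read-only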
Is there a better way to approach/accomplish this parallel processing on numpy arrays using Ray? Thanks in advance for any insight or suggestions.
Top GitHub Comments
You’re right that objects placed in the object store are immutable (to prevent errors arising from multiple processes trying to mutate the same objects).
To have correct behavior, you should create a copy of the array before mutating it.
If you really want to mutate the array in place and if you are only using 1 machine (but keep in mind this can lead to incorrect/unexpected behavior depending on what you’re trying to do), you can set x.flags.writeable = True, where x is the array. This is not a supported part of the Ray API, but technically it should work.

But instead of passing in the output array, I’d suggest just returning the array and then concatenating them, since that will work with all immutable objects.
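For example, a rough sketch of that return-and-assemble pattern (untested here; the assembly details are just for illustration) could look like:

import numpy as np
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
def add_average(in_ary: np.ndarray, lat_index: int, lon_index: int) -> np.ndarray:
    # read the 1-D timeseries from the shared (read-only) input array
    # and return a new result array instead of writing into shared memory
    ary = in_ary[lat_index, lon_index]
    return ary + np.mean(ary)

def compute_with_ray(input_array: np.ndarray) -> np.ndarray:
    in_ref = ray.put(input_array)
    futures = {
        (lat, lon): add_average.remote(in_ref, lat, lon)
        for lat in range(input_array.shape[0])
        for lon in range(input_array.shape[1])
    }
    # assemble the per-cell results into the output array in the driver process
    output_array = np.full(shape=input_array.shape, fill_value=np.nan)
    for (lat, lon), future in futures.items():
        output_array[lat, lon] = ray.get(future)
    return output_array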
Hi @monocongo,

You are applying the compute function on each 1-D slice of the array, which means 400 * 400 = 160,000 tasks are being scheduled and run. The overhead of scheduling and running the tasks is greater than the actual compute time, which is why Ray is slower.

For your use case, I suggest chunking the array into num_cpus chunks and issuing only that many tasks, or, for simplicity, applying the function on each 2-D slice. This should be significantly faster because only 400 tasks will be instantiated and run across your machine.
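A rough sketch of that chunked variant, with one task per latitude row (the helper names here are just illustrative, not part of any Ray API):

import numpy as np
import ray

ray.init(ignore_reinit_error=True)

@ray.remote
def add_average_rows(in_ary: np.ndarray, lat_index: int) -> np.ndarray:
    # process an entire 2-D (lon, time) slice in a single task,
    # so only one task per latitude row is scheduled
    rows = in_ary[lat_index]                        # shape (lon, time)
    return rows + rows.mean(axis=1, keepdims=True)

def compute_with_ray_chunked(input_array: np.ndarray) -> np.ndarray:
    in_ref = ray.put(input_array)
    futures = [
        add_average_rows.remote(in_ref, lat)
        for lat in range(input_array.shape[0])
    ]
    # stack the per-row results back into a (lat, lon, time) array
    return np.stack(ray.get(futures))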