ValueError: buffer source array is read-only in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__ with ray

Describe the bug

When we tried upgrading from scikit-learn 0.24.2 to 1.0.1, we got this error when using scikit-learn with ray:

(pid=18644) ValueError: buffer source array is read-only
Traceback (most recent call last):
  File "/Users/fred/raven/opensource/raven/framework/Driver.py", line 313, in <module>
    raven()
  File "/Users/fred/raven/opensource/raven/framework/Driver.py", line 266, in raven
    simulation.run()
  File "/Users/fred/raven/opensource/raven/framework/Simulation.py", line 764, in run
    stepInstance.takeAstep(stepInputDict)
  File "/Users/fred/raven/opensource/raven/framework/Steps/Step.py", line 326, in takeAstep
    self._localTakeAstepRun(inDictionary)
  File "/Users/fred/raven/opensource/raven/framework/Steps/MultiRun.py", line 179, in _localTakeAstepRun
    myLambda([finishedJob,outputs[outIndex]])
  File "/Users/fred/raven/opensource/raven/framework/Steps/MultiRun.py", line 109, in <lambda>
    self._outputCollectionLambda.append( (lambda x: inDictionary['Model'].collectOutput(x[0],x[1]), outIndex) )
  File "/Users/fred/raven/opensource/raven/framework/Models/Dummy.py", line 219, in collectOutput
    result = finishedJob.getEvaluation()
  File "/Users/fred/raven/opensource/raven/framework/Runners/InternalRunner.py", line 97, in getEvaluation
    self._collectRunnerResponse()
  File "/Users/fred/raven/opensource/raven/framework/Runners/DistributedMemoryRunner.py", line 83, in _collectRunnerResponse
    self.runReturn = ray.get(self.thread) if im.isLibAvail("ray") else self.thread()
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
    return func(*args, **kwargs)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/worker.py", line 1564, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::evaluateSample() (pid=18644, ip=192.168.0.102)
  File "python/ray/_raylet.pyx", line 493, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 514, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 384, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: buffer source array is read-only
traceback: Traceback (most recent call last):
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 251, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 189, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 167, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 155, in _deserialize_pickle5_data
    obj = pickle.loads(in_band, buffers=buffers)
  File "sklearn/neighbors/_dist_metrics.pyx", line 223, in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

The relevant code in scikit-learn is:

    def __setstate__(self, state):
        """
        set state for pickling
        """
        self.p = state[0]
        self.vec = state[1] #line 223
        self.mat = state[2]
        if self.__class__.__name__ == "PyFuncDistance":
            self.func = state[3]
            self.kwargs = state[4]
        self.size = self.vec.shape[0]
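
The traceback points at a memoryview being constructed from state[1] during unpickling, and the call pickle.loads(in_band, buffers=buffers) in ray's serializer shows the state arriving as out-of-band pickle protocol 5 buffers. The following is a minimal standard-library sketch of that mechanism, not scikit-learn or ray code. It assumes Python 3.8 or later, and bind_writable is a hypothetical stand-in for a view that only accepts writable buffers.

```python
import pickle


def bind_writable(buf):
    """Hypothetical stand-in for a view that only accepts writable buffers."""
    view = memoryview(buf)
    if view.readonly:
        raise ValueError("buffer source array is read-only")
    return view


# Sender side: a writable buffer, shipped out-of-band with protocol 5.
payload = bytearray(8)
out_of_band = []
data = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=out_of_band.append,
)

# Same-process round trip: the original writable buffer is handed back.
local = pickle.loads(data, buffers=out_of_band)
bind_writable(local)  # succeeds

# Assumption: a remote store hands out read-only copies of the buffers.
frozen = [memoryview(b).tobytes() for b in out_of_band]
remote = pickle.loads(data, buffers=frozen)

try:
    bind_writable(remote)
    error = None
except ValueError as exc:
    error = str(exc)

print("local read-only:", memoryview(local).readonly)
print("remote read-only:", memoryview(remote).readonly)
print("error:", error)
```

The pickled stream is identical in both cases. Only the writability of the buffers supplied at load time differs, which would explain why the same object can unpickle locally yet fail on a worker.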

Steps/Code to Reproduce

I do not have a reduced case. If I have time, I will try to create one. The failure occurs on the branch https://github.com/joshua-cogliati-inl/raven/tree/cogljj/update_libraries at commit https://github.com/joshua-cogliati-inl/raven/commit/8de1e24aaea6b3cc1fb7367e3ce342170f858216 when running tests/framework/InternalParallelTests/ROMscikit, which is designed to test this class: https://github.com/joshua-cogliati-inl/raven/blob/cogljj/update_libraries/framework/SupervisedLearning/ScikitLearn/Neighbors/KNeighborsRegressor.py

Basically, we are using ray to distribute a sklearn.neighbors.KNeighborsRegressor and we get the above error.

Expected Results

A scikit-learn DistanceMetric can be distributed with ray and deserialized on the remote worker without error.

Actual Results

  File "sklearn/neighbors/_dist_metrics.pyx", line 223, in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

Versions

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 20:33:18)  [Clang 11.1.0 ]
executable: /Users/fred/miniconda3/envs/raven_libraries_tf26set/bin/python
   machine: macOS-10.14.6-x86_64-i386-64bit

Python dependencies:
          pip: 21.3.1
   setuptools: 58.5.3
      sklearn: 1.0.1
        numpy: 1.19.5
        scipy: 1.7.1
       Cython: None
       pandas: 1.3.4
   matplotlib: 3.4.3
       joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

thomasjpfan commented, Dec 1, 2021 (2 reactions)

@joshua-cogliati-inl Thanks for the link to the issue! I was able to reproduce it and opened #21845 to fix the bug.

joshua-cogliati-inl commented, Nov 29, 2021 (1 reaction)

Possibly dd7b7e5ef950ac026ac44d758af9167eafcc9ee2 might have caused changes with serialization with binary_tree?
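
One general way to tolerate read-only buffers when restoring state is to copy them into writable storage before binding a view. The sketch below illustrates that idea with the standard library only. It is not necessarily what #21845 does, and as_writable is a hypothetical helper name.

```python
def as_writable(buf):
    """Return buf unchanged if it is writable, otherwise a writable copy."""
    view = memoryview(buf)
    if view.readonly:
        # Copying costs memory but yields a buffer the receiver owns.
        return bytearray(view)
    return buf


frozen = bytes(8)             # read-only, as a remote store might provide
thawed = as_writable(frozen)  # writable copy
thawed[0] = 1                 # mutation now succeeds

already_writable = bytearray(8)
same = as_writable(already_writable)  # no copy is made
```

The copy trades zero-copy deserialization for safety, so it is best applied only to small state such as metric parameters, not to large data arrays.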

