ValueError: buffer source array is read-only in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__ with ray
Describe the bug
When we tried upgrading from scikit-learn 0.24.2 to 1.0.1, we got this error when using scikit-learn with ray:
(pid=18644) ValueError: buffer source array is read-only
Traceback (most recent call last):
File "/Users/fred/raven/opensource/raven/framework/Driver.py", line 313, in <module>
raven()
File "/Users/fred/raven/opensource/raven/framework/Driver.py", line 266, in raven
simulation.run()
File "/Users/fred/raven/opensource/raven/framework/Simulation.py", line 764, in run
stepInstance.takeAstep(stepInputDict)
File "/Users/fred/raven/opensource/raven/framework/Steps/Step.py", line 326, in takeAstep
self._localTakeAstepRun(inDictionary)
File "/Users/fred/raven/opensource/raven/framework/Steps/MultiRun.py", line 179, in _localTakeAstepRun
myLambda([finishedJob,outputs[outIndex]])
File "/Users/fred/raven/opensource/raven/framework/Steps/MultiRun.py", line 109, in <lambda>
self._outputCollectionLambda.append( (lambda x: inDictionary['Model'].collectOutput(x[0],x[1]), outIndex) )
File "/Users/fred/raven/opensource/raven/framework/Models/Dummy.py", line 219, in collectOutput
result = finishedJob.getEvaluation()
File "/Users/fred/raven/opensource/raven/framework/Runners/InternalRunner.py", line 97, in getEvaluation
self._collectRunnerResponse()
File "/Users/fred/raven/opensource/raven/framework/Runners/DistributedMemoryRunner.py", line 83, in _collectRunnerResponse
self.runReturn = ray.get(self.thread) if im.isLibAvail("ray") else self.thread()
File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
return func(*args, **kwargs)
File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/worker.py", line 1564, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::evaluateSample() (pid=18644, ip=192.168.0.102)
File "python/ray/_raylet.pyx", line 493, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 514, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 384, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: buffer source array is read-only
traceback: Traceback (most recent call last):
File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 251, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 189, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 167, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 155, in _deserialize_pickle5_data
obj = pickle.loads(in_band, buffers=buffers)
File "sklearn/neighbors/_dist_metrics.pyx", line 223, in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__
File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
The relevant code in scikit-learn is:
def __setstate__(self, state):
    """
    set state for pickling
    """
    self.p = state[0]
    self.vec = state[1]  # line 223
    self.mat = state[2]
    if self.__class__.__name__ == "PyFuncDistance":
        self.func = state[3]
        self.kwargs = state[4]
    self.size = self.vec.shape[0]
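For illustration only, here is a minimal sketch of how a read-only array can reach __setstate__. It uses only pickle and NumPy, not ray or scikit-learn. The traceback shows ray calling pickle.loads(in_band, buffers=buffers); the sketch assumes those buffers are immutable and imitates that with bytes objects. The names vec, restored and writable are made up for the example. In Cython, a typed memoryview that is not declared const refuses a read-only buffer with exactly this ValueError, which is consistent with the failure at line 223.

```python
import pickle

import numpy as np

# Stand-in for the array a DistanceMetric keeps in its pickled state.
vec = np.arange(4, dtype=np.float64)

# Protocol 5 lets NumPy hand its data over as a separate (out-of-band) buffer.
buffers = []
payload = pickle.dumps(vec, protocol=5, buffer_callback=buffers.append)

# Assumption: the receiving side supplies immutable buffers,
# imitated here by converting each buffer to bytes.
readonly_buffers = [bytes(buf.raw()) for buf in buffers]
restored = pickle.loads(payload, buffers=readonly_buffers)

print(restored.flags.writeable)  # False

# Writing to the restored array fails because its memory is read-only.
try:
    restored[0] = 1.0
    write_failed = False
except ValueError:
    write_failed = True

# A copy owns its own memory and is writable again.
writable = restored.copy()
writable[0] = 1.0
```

Copying the array before handing it to code that needs a writable buffer is one possible user-side workaround. It is not necessarily what the scikit-learn fix does.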
Steps/Code to Reproduce
I do not have a reduced case. If I have time, I will try to create one. The failure occurs on the branch https://github.com/joshua-cogliati-inl/raven/tree/cogljj/update_libraries at commit https://github.com/joshua-cogliati-inl/raven/commit/8de1e24aaea6b3cc1fb7367e3ce342170f858216 when running tests/framework/InternalParallelTests/ROMscikit, which is designed to test the class: https://github.com/joshua-cogliati-inl/raven/blob/cogljj/update_libraries/framework/SupervisedLearning/ScikitLearn/Neighbors/KNeighborsRegressor.py
Basically, we are using ray to distribute a sklearn.neighbors.KNeighborsRegressor and we get the above error.
Expected Results
scikit-learn's DistanceMetric can be distributed remotely with ray.
Actual Results
File "sklearn/neighbors/_dist_metrics.pyx", line 223, in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__
File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
Versions
>>> import sklearn; sklearn.show_versions()
System:
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 20:33:18) [Clang 11.1.0 ]
executable: /Users/fred/miniconda3/envs/raven_libraries_tf26set/bin/python
machine: macOS-10.14.6-x86_64-i386-64bit
Python dependencies:
pip: 21.3.1
setuptools: 58.5.3
sklearn: 1.0.1
numpy: 1.19.5
scipy: 1.7.1
Cython: None
pandas: 1.3.4
matplotlib: 3.4.3
joblib: 1.1.0
threadpoolctl: 3.0.0
Built with OpenMP: True
Top GitHub Comments
@joshua-cogliati-inl Thanks for the link to the issue! I was able to reproduce it and opened #21845 to fix the bug.
Possibly commit dd7b7e5ef950ac026ac44d758af9167eafcc9ee2 changed how binary_tree is serialized?