Ability to preload modules when workers start
Describe your feature request
In any reasonably large project you would face an issue of circular imports. This is often handled by having a pre-defined entry point - a module that always loads first, bootstrapping the rest of the application. For instance, in web applications that would be the module holding WSGI / ASGI app. You can easily get a circular import error if the first module you attempt to import is some other random module, and not the intended one.
This is exactly what happens within Ray workers. E.g. when I start an Actor worker and it tries to unpickle the actor class, the first module it attempts to load is the module where actor is defined - leading to circular import errors as described above.
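A minimal sketch of this failure mode, outside of Ray. The module names `demo_app` (standing in for the intended entry point) and `demo_models` (standing in for the module where the actor is defined) are hypothetical and exist only for this illustration. Each import is tried in a fresh interpreter, so the only thing that differs between the two runs is which module is loaded first.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical entry point: defines shared state, then imports the rest of the app.
ENTRY_SOURCE = "registry = {}\nfrom demo_models import Actor\n"
# Hypothetical module holding the actor class; it depends on the entry point.
MODELS_SOURCE = "from demo_app import registry\n\nclass Actor:\n    pass\n"


def try_import_first(module_name: str, project_dir: Path) -> subprocess.CompletedProcess:
    """Start a fresh interpreter and make module_name the first project import."""
    env = {**os.environ, "PYTHONPATH": str(project_dir)}
    return subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env,
        capture_output=True,
        text=True,
    )


def run_demo():
    """Write both modules to a temp dir and try each import order."""
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp)
        (project_dir / "demo_app.py").write_text(ENTRY_SOURCE)
        (project_dir / "demo_models.py").write_text(MODELS_SOURCE)
        entry_first = try_import_first("demo_app", project_dir)
        actor_first = try_import_first("demo_models", project_dir)
    return entry_first, actor_first


if __name__ == "__main__":
    entry_first, actor_first = run_demo()
    print("entry point first, exit code:", entry_first.returncode)
    print("actor module first, exit code:", actor_first.returncode)
    print(actor_first.stderr.strip().splitlines()[-1])
```

Importing `demo_app` first succeeds because `registry` already exists by the time `demo_models` asks for it. Importing `demo_models` first fails because `demo_app` then tries to pull `Actor` out of a module that has not finished initializing.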
Just in case, I’m aware of the _code_search_path parameter of ray.init, but that doesn’t help. All my modules are already on PATH, the only thing that’s wrong is the order in which they get imported in a Ray worker.
There should be an option to provide a list of modules that would be imported, in the given order, in each worker before any other user code is run. That would allow users to ensure the correct module loading order - e.g. in my case I would specify the WSGI entry point as the module to preload.
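Roughly, the behaviour I have in mind could be sketched like this. `preload_modules` is a hypothetical helper, not an existing Ray API, and how it would be hooked into worker startup is left open here. The only assumption is that it runs in each worker before anything is unpickled.

```python
import importlib
from types import ModuleType
from typing import Iterable, List


def preload_modules(module_names: Iterable[str]) -> List[ModuleType]:
    """Import each named module in the given order and return the modules.

    Hypothetical helper: it would have to run in every worker before any
    other user code is loaded.
    """
    loaded = []
    for name in module_names:
        # import_module raises ImportError right away if a module is missing,
        # so a bad preload list fails at worker start, not later.
        loaded.append(importlib.import_module(name))
    return loaded


if __name__ == "__main__":
    # Standard-library modules stand in for a real entry point such as a
    # WSGI module.
    for module in preload_modules(["json", "collections.abc"]):
        print(module.__name__)
```

In my case the list would contain just the WSGI entry point, so it is always the first project module a worker imports.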
For reference, the exact same functionality exists (and is widely used) in other distributed computing libraries, e.g.:
- Dask has a preload option: "A module or Python file passed as a --preload value is guaranteed to be imported before establishing any connection."
- Celery has an imports option: "A sequence of modules to import when the worker starts. The modules will be imported in the original order."
I’m a bit surprised I can’t seem to find any reference to this functionality in Ray - not even within issues/discussions. I would expect the absence of that option to cause a fair amount of headache for many reasonably large projects.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:12 (6 by maintainers)
Any updates here?
Fair point. Parameters are involved, though realistically that might just be bad code hygiene.