Upload file to new workers only
Currently, upload_file only sends the file to the workers that are connected to the scheduler at the moment of the call (instant t).
If a worker connects to the scheduler afterwards, it does not have the files that were sent with upload_file beforehand.
This is a problem because we can then no longer assume that all workers have the file available.
- One simple workaround is to store the identities of the workers connected just before calling upload_file, and to send jobs that require this file only to those workers in the future. However, this limits scalability, because workers that join later can never run those jobs.
- Another possibility would be to register a SchedulerPlugin and use its add_worker event function to resend the file with upload_file to all workers whenever a new worker joins. If this solution is preferred, it would be good for upload_file to provide a worker parameter, so that we can target the newly connected worker and avoid re-uploading the file to workers that already have it.
- As an extension of the previous solution, add a future_workers bool parameter to upload_file and extend upload_file to automatically register the SchedulerPlugin described above.
- Other ideas?
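The first workaround above can be sketched roughly as follows. The helper names upload_and_pin and submit_pinned are hypothetical. The sketch assumes a distributed Client whose nthreads() returns a mapping keyed by worker address, and it relies on the workers= keyword of Client.submit to restrict where a task may run.

```python
def upload_and_pin(client, path):
    """Upload `path` and return the addresses of the workers that received it.

    Hypothetical helper: `client` is assumed to be a distributed.Client.
    """
    # Snapshot the connected workers *before* uploading. A worker that joins
    # between the two calls may receive the file but is simply not pinned.
    workers = list(client.nthreads())
    client.upload_file(path)
    return workers


def submit_pinned(client, workers, func, *args, **kwargs):
    """Submit `func` so that it may only run on the given workers."""
    return client.submit(func, *args, workers=workers, **kwargs)
```

With a live cluster this would be used as `workers = upload_and_pin(client, "helper.py")` followed by `submit_pinned(client, workers, some_function, arg)`. Workers that join later are never in the list, which is exactly the scalability limit described in the first item.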
Issue Analytics
- Created 4 years ago
- Reactions: 1
- Comments: 6 (3 by maintainers)
Top GitHub Comments
@H4dr1en FWIW I’ve used the following WorkerPlugin in the past:
It may, or may not, serve as a good starting point for your use case.
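The plugin from the comment above is not shown here. Purely as an illustration of the general shape such a plugin might take, and not the commenter's code, here is a minimal sketch. The class name FileOnEveryWorker is hypothetical. It assumes the WorkerPlugin base class from distributed.diagnostics.plugin, its setup(worker) hook, and the worker's local_directory attribute.

```python
import os

try:
    from distributed.diagnostics.plugin import WorkerPlugin
except ImportError:  # fallback so the sketch can be read and tried without dask
    WorkerPlugin = object


class FileOnEveryWorker(WorkerPlugin):
    """Hypothetical plugin: writes one file into each worker's local directory."""

    def __init__(self, path):
        # Read the file once on the client side; the bytes are stored on the
        # plugin instance and travel with it when it is sent to the workers.
        self.filename = os.path.basename(path)
        with open(path, "rb") as f:
            self.data = f.read()

    def setup(self, worker):
        # Called once for each worker the plugin is installed on.
        target = os.path.join(worker.local_directory, self.filename)
        with open(target, "wb") as f:
            f.write(self.data)
```

Such a plugin would be registered from the client, for example with `client.register_worker_plugin(FileOnEveryWorker("helper.py"))` (more recent releases of distributed offer `client.register_plugin` for the same purpose). Registered worker plugins are documented as applying to current and future workers, which is the behaviour this issue asks for.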
@mrocklin It’s a bit above my skill level as it currently stands, having just transitioned from mechanical engineering to SE. Interested yes, capable not yet. Sorry to be no help yet.