How can I create and set a custom result handler?
Archived from the Prefect Public Slack Community
ryan.abernathey: Thanks <@UKNSNMUE6> for stopping by the Pangeo ML Working Group meeting today.
I've got a couple of follow-up questions. Let me know if any of these should be escalated to GitHub issues.
The main question is about ResultHandler objects. In Pangeo, our I/O stack is something like Google Cloud Storage <- GCSFS <- Zarr <- Xarray.
I would like a Prefect task to write data to GCS. The normal way I would do this (without Prefect) is:
ds = ...  # create xarray Dataset
gcfs_w_token = gcsfs.GCSFileSystem(project='pangeo-181919', token=token)
gcsmap = gcsfs.GCSMap(path, gcs=gcfs_w_token)
ds.to_zarr(gcsmap)
Obviously I can do that from within a Prefect task, but it kind of seems like I should be using a ResultHandler. Can you point me to any examples of custom handlers? (Bonus points if they show how to use secure credentials.)
Thanks again for an awesome tool.
chris: Hey <@UN5UWFR9T>! Good question; at the end of the day, a result handler is simply an object with read / write methods that are inverses of each other (and it needs to be cloudpickle-able for running on dask). For example, here is our internal implementation of a GCS result handler: https://github.com/PrefectHQ/prefect/blob/master/src/prefect/engine/result_handlers/gcs_result_handler.py
This implementation won't be nearly as performant as using gcsfs, but should convey the idea. This handler also uses "Prefect Secrets": when running locally, secrets are pulled from prefect.context, and can be set via environment variable (e.g., export PREFECT__CONTEXT__SECRETS="my-secret"). If you need added security, you could use an encryption package for parsing the secret.
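To make the read / write contract described above concrete, here is a minimal sketch that runs without Prefect installed. The class name LocalPickleHandler and its directory argument are hypothetical, it writes to a local temporary directory rather than GCS, and it uses the standard-library pickle as a stand-in for cloudpickle. In Prefect itself the class would presumably subclass the library's ResultHandler base class; that is left out here so the example stays self-contained.

```python
import os
import pickle
import tempfile
import uuid
from typing import Any


class LocalPickleHandler:
    """Hypothetical handler: write() stores a value, read() returns it."""

    def __init__(self, directory: str) -> None:
        # Keep only plain, picklable configuration on the instance.
        self.directory = directory

    def write(self, result: Any) -> str:
        """Serialize `result` to a new file and return its location."""
        os.makedirs(self.directory, exist_ok=True)
        location = os.path.join(self.directory, f"{uuid.uuid4().hex}.pkl")
        with open(location, "wb") as f:
            pickle.dump(result, f)
        return location

    def read(self, location: str) -> Any:
        """Inverse of write(): load the value stored at `location`."""
        with open(location, "rb") as f:
            return pickle.load(f)


handler = LocalPickleHandler(tempfile.mkdtemp())
location = handler.write({"answer": 42})
restored = handler.read(location)

# The handler itself must survive a pickle round trip to be sent to workers.
handler_copy = pickle.loads(pickle.dumps(handler))
```

The key property is that read(write(x)) gives back x, and that the handler instance carries nothing that cannot be pickled.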
ryan.abernathey: This seems very useful. Thanks!
How do I associate a result with a specific handler?
chris: To actually trigger this result handler call, you need to "checkpoint" your Task (Prefect has a bias against storing data unnecessarily, unless users opt-in). Two things are necessary to make checkpointing work:
- tasks need to request checkpointing and set their result handler:
@task(checkpoint=True, result_handler=my_handler())
- the appropriate setting needs to be turned on via env var / config during execution:
export PREFECT__FLOWS__CHECKPOINTING=true
chris: Task result handlers can be specified via the result_handler keyword as above
ryan.abernathey: I'll give this a try and report back. Thanks
chris: yea anytime! I'm super excited to hear the Pangeo group's feedback and possibly work with you all to improve Prefect!
chris: <@ULVA73B9P> archive "How can I create and set a custom result handler?"
And this is necessary because google.cloud objects like client.bucket are not cloudpickleable?
In my class, I will be using gcsfs to interact with GCS. These objects are designed from the beginning to be serializable. So that might not be necessary in my case.
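For readers wondering what working around a non-picklable client can look like, one common pattern is to drop the live client from the pickled state and rebuild it lazily on first use. The sketch below is only an illustration of that pattern, not the code under discussion: LazyClientHandler is a hypothetical name, a threading.Lock stands in for a non-picklable client object, and the standard-library pickle stands in for cloudpickle.

```python
import pickle
import threading
from typing import Any, Dict


class LazyClientHandler:
    """Hypothetical handler that holds a non-picklable client."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket   # plain configuration, safe to pickle
        self._client = None    # created on first use, never pickled

    @property
    def client(self) -> Any:
        if self._client is None:
            # Stand-in for a real client object that cannot be pickled.
            self._client = threading.Lock()
        return self._client

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        state["_client"] = None  # drop the live client before pickling
        return state


handler = LazyClientHandler("my-bucket")
_ = handler.client  # force client creation on the original object
clone = pickle.loads(pickle.dumps(handler))
```

The unpickled clone carries only the configuration and recreates its own client the first time clone.client is accessed. If the client library is already serializable, as described for gcsfs above, this extra step may not be needed.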
btw, you might consider using gcsfs and other filesystem-spec objects within Prefect. They are part of the dask ecosystem so are all designed with serializability in mind. Might allow you to delete some code.
And fsspec as a larger integration is a very interesting idea; this could turn into a hook for persisting data and easily swapping out between local / external filesystems