Proposal PIN: Build Serializer and Environment Separation
Status
Proposed
Context
Currently, environments contain all of their own build and execution logic, which creates friction when customizing them. For example, the current Kubernetes-based environments all implement the same build function, which takes specifications for building a Docker image that is used to store and retrieve the serialized flow. This leads to confusion and lock-in for the environments.
Moving the build step out of the execution environment will make it easier for users to supply their own Docker image.
Decision
Break the build step out of the environment into its own component, which is then attached to the environment.
An example environment definition would now look something like:
from prefect import Flow
from prefect.environments.kubernetes import DaskOnKubernetesEnvironment
from prefect.environments.build_serializers import DockerBuildSerializer  # proposed module

build_serializer = DockerBuildSerializer(registry_url="url", custom_dockerfile="dockerfile_info")
env = DaskOnKubernetesEnvironment(build_serializer=build_serializer)

with Flow("my flow", env=env) as f:
    # flow tasks here
    pass

f.deploy(project_id="id")
The build serializer would contain the information for building, serializing, and storing the flow. When flow.serialize(build=True) is called, the flow calls its environment's build serializer's build function. That function returns the information needed to retrieve the flow, and this information is then serialized and stored in the environment's metadata.
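The delegation described above can be sketched in plain Python. Every class here (BuildSerializer, ToyEnvironment, ToyFlow, LocalPathSerializer) is hypothetical and is not part of Prefect. The sketch only shows the shape of the contract: the environment holds a build serializer, serialize(build=True) calls its build method, and the returned retrieval information ends up in the environment's metadata.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BuildSerializer(ABC):
    """Hypothetical base class: knows how to build and store a flow."""

    @abstractmethod
    def build(self, flow: "ToyFlow") -> Dict[str, Any]:
        """Build/store the flow and return the info needed to retrieve it."""


class ToyEnvironment:
    """Hypothetical environment that delegates building to a BuildSerializer."""

    def __init__(self, build_serializer: BuildSerializer) -> None:
        self.build_serializer = build_serializer
        self.metadata: Dict[str, Any] = {}


class ToyFlow:
    """Hypothetical stand-in for a flow, just enough to show serialize()."""

    def __init__(self, name: str, env: ToyEnvironment) -> None:
        self.name = name
        self.env = env

    def serialize(self, build: bool = False) -> Dict[str, Any]:
        if build:
            # Delegate the build step; keep only the retrieval info it returns.
            self.env.metadata["build"] = self.env.build_serializer.build(self)
        return {"name": self.name, "environment": {"metadata": dict(self.env.metadata)}}


class LocalPathSerializer(BuildSerializer):
    """Toy serializer that pretends to store the flow at a file path."""

    def build(self, flow: ToyFlow) -> Dict[str, Any]:
        return {"path": f"/flows/{flow.name}.pkl"}


flow = ToyFlow("my-flow", ToyEnvironment(LocalPathSerializer()))
print(flow.serialize(build=True))
# {'name': 'my-flow', 'environment': {'metadata': {'build': {'path': '/flows/my-flow.pkl'}}}}
```

Because the environment never inspects how the flow was stored, swapping LocalPathSerializer for a Docker-based serializer would not require changes to ToyEnvironment itself.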
For a Docker build serializer paired with a Kubernetes-based environment, the build serializer would hold values such as image_name, image_tag, registry_url, etc. When the Kubernetes environment reads this metadata during setup, it expects the metadata to contain the fields it needs to create resources on Kubernetes.
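The Kubernetes side of this contract might look something like the sketch below. Only the field names image_name, image_tag, and registry_url come from the proposal. PrebuiltImageSerializer, container_spec, and the way the image reference is assembled are hypothetical illustrations, not Prefect APIs. The environment only reads metadata, so a user who has already pushed their own image could supply the same fields without any build step.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Fields a Kubernetes-based environment would require (names from the proposal).
REQUIRED_DOCKER_FIELDS = ("registry_url", "image_name", "image_tag")


@dataclass
class PrebuiltImageSerializer:
    """Hypothetical serializer for an image the user has already pushed."""

    registry_url: str
    image_name: str
    image_tag: str

    def build(self) -> Dict[str, Any]:
        # Nothing to build: just report where the existing image lives.
        return {
            "registry_url": self.registry_url,
            "image_name": self.image_name,
            "image_tag": self.image_tag,
        }


def container_spec(build_metadata: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical k8s setup step: turn build metadata into a container spec."""
    missing = [f for f in REQUIRED_DOCKER_FIELDS if f not in build_metadata]
    if missing:
        raise ValueError(f"build metadata is missing fields: {missing}")
    registry = build_metadata["registry_url"].rstrip("/")
    image = f"{registry}/{build_metadata['image_name']}:{build_metadata['image_tag']}"
    return {"name": "flow", "image": image}


meta = PrebuiltImageSerializer("registry.example.com/team", "my-flow", "0.1.0").build()
print(container_spec(meta))
# {'name': 'flow', 'image': 'registry.example.com/team/my-flow:0.1.0'}
```

Validating the required fields up front means a mismatched serializer and environment fail with a clear error during setup, rather than failing later when the Kubernetes resources are created.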
Consequences
The only immediate consequence is the need to change some of the setup and execute functionality of the environments.
REQUEST FOR COMMENTS: The name Build Serializer makes sense from a serialization standpoint, but the classes that actually perform the build need a better name.
This is a work in progress. I will update it over the next few hours while I organize my thoughts.
I think in the Context section you could also add a blurb about making the interface easier for users who want to provide their own Docker image. Separating “serializing the flow into a known location” from “defining its execution environment” will make that much simpler.
Yes, but let’s link to this issue in the new PIN as a record of other considerations.