[feature] Allow jobs to be scheduled on AWS Fargate
Feature Area
/area backend
What feature would you like to see?
I am trying to run a large number of Kubeflow Pipelines jobs on AWS Fargate.
The Kubeflow Pipelines components are deployed on AWS EKS. Although the EKS cluster has a Fargate profile that allows pods to be scheduled onto virtual nodes, Kubeflow Pipelines jobs contain privileged containers, which prevents them from using Fargate resources (https://docs.aws.amazon.com/eks/latest/userguide/fargate.html).
What is the use case or pain point?
This feature enables more cost-efficient job scheduling. Many jobs (e.g., hyperparameter tuning, scenario analysis …) are ephemeral, so it makes more sense to schedule them on a serverless machine pool such as the one Fargate provides. This avoids reserving a pool of nodes upfront while still supporting bursty workloads.
However, Kubeflow Pipelines jobs use privileged containers, which Fargate does not support. For example, the wait container needs further configuration under securityContext:
containers:
  - name: wait
    image: 'gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance'
    command:
      - argoexec
      - wait
    env:
      - name: ARGO_POD_NAME
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.name
      - name: ARGO_CONTAINER_RUNTIME_EXECUTOR
        value: pns
    resources: {}
    volumeMounts:
      - name: podmetadata
        mountPath: /argo/podmetadata
      - name: mlpipeline-minio-artifact
        readOnly: true
        mountPath: /argo/secret/mlpipeline-minio-artifact
      - name: input-artifacts
        mountPath: /mainctrfs/tmp/inputs/config/data
        subPath: config
      - name: input-artifacts
        mountPath: /mainctrfs/tmp/inputs/data/data
        subPath: convect-prepare-data-out_path
      - name: pipeline-runner-token-j2fm7
        readOnly: true
        mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    imagePullPolicy: IfNotPresent
    securityContext:
      capabilities:
        add:
          - SYS_PTRACE
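To see which containers in a pod spec ask for elevated privileges, a small check like the following may help. It is only a sketch: the function name `find_elevated_containers` and the `main` container entry are hypothetical, and it assumes, as this issue describes, that `privileged: true` or added capabilities are what block scheduling on Fargate.

```python
from typing import Any, Dict, List, Tuple


def find_elevated_containers(pod_spec: Dict[str, Any]) -> List[Tuple[str, List[str]]]:
    """Return (container name, reasons) for containers requesting elevated privileges."""
    flagged = []
    for container in pod_spec.get("containers", []):
        security = container.get("securityContext") or {}
        reasons = []
        if security.get("privileged"):
            reasons.append("privileged")
        # Capabilities added on top of the runtime defaults, e.g. SYS_PTRACE.
        added = (security.get("capabilities") or {}).get("add") or []
        reasons.extend(f"capability:{cap}" for cap in added)
        if reasons:
            flagged.append((container["name"], reasons))
    return flagged


# Trimmed-down version of the pod spec quoted above, as a plain dict.
pod_spec = {
    "containers": [
        # Hypothetical user container, included only for contrast.
        {"name": "main", "image": "example/image:tag"},
        {
            "name": "wait",
            "image": "gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance",
            "securityContext": {"capabilities": {"add": ["SYS_PTRACE"]}},
        },
    ]
}

flagged = find_elevated_containers(pod_spec)
print(flagged)
```

Here only the wait container is flagged, because it adds SYS_PTRACE for the pns executor.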
I am wondering if there are any workarounds or better solutions to make the jobs schedulable on serverless resource pools such as Fargate.
Is there a workaround currently?
I do not see any solutions so far.
Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.
Issue Analytics
- Created 2 years ago
- Reactions: 7
- Comments: 8 (4 by maintainers)
Top GitHub Comments
I found a workaround under version 1.2 that allows jobs to be scheduled onto Fargate nodes. Here is what I did:
- Switch the executor to k8sapi: change containerRuntimeExecutor from pns to k8sapi.
- Use an emptyDir as the output location. For example, I have a helper function for this, and I apply that transformation to every op in the pipeline.
- Mark each op so that it can be scheduled on Fargate (this is specific to your Fargate settings). In my case, I use a rule for this, and the matching setting on each task in the pipeline hints that the task can be scheduled on Fargate.
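The helper function and the Fargate rule from this comment did not survive, so the following is not the commenter's code. It is a rough sketch of the last two steps, assuming the compiled pipeline is an Argo Workflow manifest loaded as a Python dict. The function name `patch_workflow_for_fargate`, the label `fargate-schedulable: "true"`, the volume name `outputs`, and the mount path `/tmp/outputs` are all hypothetical; the label in particular has to match whatever selector your own Fargate profile uses.

```python
import copy
from typing import Any, Dict


def patch_workflow_for_fargate(
    workflow: Dict[str, Any],
    labels: Dict[str, str],
    output_dir: str = "/tmp/outputs",  # hypothetical output location
    volume_name: str = "outputs",      # hypothetical volume name
) -> Dict[str, Any]:
    """Return a copy of the workflow whose container templates carry the given
    pod labels and mount an emptyDir volume at output_dir."""
    patched = copy.deepcopy(workflow)
    for template in patched.get("spec", {}).get("templates", []):
        container = template.get("container")
        if container is None:
            continue  # DAG/steps templates do not run a container themselves
        # Pod labels that a Fargate profile selector could match on.
        template.setdefault("metadata", {}).setdefault("labels", {}).update(labels)
        # emptyDir volume so that outputs are written to a volume.
        volumes = template.setdefault("volumes", [])
        if not any(v.get("name") == volume_name for v in volumes):
            volumes.append({"name": volume_name, "emptyDir": {}})
        mounts = container.setdefault("volumeMounts", [])
        if not any(m.get("name") == volume_name for m in mounts):
            mounts.append({"name": volume_name, "mountPath": output_dir})
    return patched


# Minimal stand-in for a compiled pipeline; real manifests have more fields.
workflow = {
    "spec": {
        "templates": [
            {"name": "pipeline", "dag": {"tasks": []}},
            {"name": "train", "container": {"image": "example/train:latest"}},
        ]
    }
}

patched = patch_workflow_for_fargate(workflow, labels={"fargate-schedulable": "true"})
print(patched["spec"]["templates"][1])
```

Working on the compiled manifest keeps the sketch free of SDK dependencies. The same idea could presumably be applied per op in the pipeline code instead, as the comment describes.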
@yuhuishi-convect we might want to switch to the Argo v3 emissary executor, which doesn't require privileged permissions: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary