[FR] Default resource requirement/limits for the KFP UI and system services

UPDATE: in the end, we decided to add only resource requests (not limits); see the discussion in https://github.com/kubeflow/pipelines/issues/5236#issuecomment-790301148

It would be desirable to provide default resource requests and limits for the KFP UI and system services, so that their QoS class is Guaranteed by default.

See https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/ for how QoS classes are assigned.

I’m not exactly sure what values are reasonable: if they are set too low, the services may stop operating when a workload reaches a limit. But making the QoS Guaranteed is also important. Without any requests or limits, a Pod's QoS class is BestEffort, and BestEffort Pods are the first to be evicted by Kubernetes when a node runs out of resources, so when there are many other workloads, the KFP UI and API services may be evicted.
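
To make the QoS rules from the Kubernetes page linked above concrete, here is a minimal Python sketch of how a QoS class follows from a Pod's container resources. The function name `qos_class` and the dict layout are illustrative assumptions, not part of any Kubernetes client library, and quantities are compared as plain strings (so '1' and '1000m' would be treated as different).

```python
from typing import Dict, List

# One container's resources, e.g. {"requests": {"cpu": "10m"}, "limits": {...}}
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Simplified version of the Kubernetes QoS classification rules."""
    # BestEffort: no container sets any request or limit.
    if not any(c.get("requests") or c.get("limits") for c in containers):
        return "BestEffort"

    for c in containers:
        limits = c.get("limits", {})
        # Kubernetes defaults a missing request to the matching limit.
        requests = {**limits, **c.get("requests", {})}
        for res in ("cpu", "memory"):
            # Guaranteed needs a limit for both resources, equal to the request.
            if res not in limits or requests[res] != limits[res]:
                return "Burstable"
    return "Guaranteed"


only_requests = [{"requests": {"cpu": "10m", "memory": "70Mi"}}]
print(qos_class([{}]))           # BestEffort
print(qos_class(only_requests))  # Burstable
```

Under these rules, a Pod that sets requests but no limits is classified as Burstable rather than Guaranteed, but it is no longer BestEffort.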

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

NikeNano commented, Feb 20, 2021 (1 reaction)

According to the Argo documentation, Argo's memory and CPU usage scale linearly with the number of workflows. So users will probably have to adjust these values accordingly if they are running heavier workloads or would like to reduce costs.

I would be happy to update this!

/assign

Bobgy commented, Feb 19, 2021 (1 reaction)

Got some help from Sid Palas:

A couple of example request settings:
ml-pipeline (api server)
        requests:
          cpu: '2'
          memory: 4Gi
ml-pipeline-ui
        requests:
          cpu: 10m
          memory: 70Mi
workflow-controller (argo)
        requests:
          cpu: 200m
          memory: 3Gi
minio
        requests:
          cpu: 20m
          memory: 25Mi
persistent-agent
        requests:
          cpu: 120m
          memory: 2Gi

see thread https://kubeflow.slack.com/archives/CE10KS9M4/p1613655024114300
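
To gauge the aggregate footprint of the example requests above, a small Python sketch can sum them. The helper names are hypothetical, and the parsers handle only the unit suffixes that appear in this example (`m` for CPU, `Mi` and `Gi` for memory), not the full Kubernetes quantity syntax.

```python
# Example requests copied from the settings above.
EXAMPLE_REQUESTS = {
    "ml-pipeline": {"cpu": "2", "memory": "4Gi"},
    "ml-pipeline-ui": {"cpu": "10m", "memory": "70Mi"},
    "workflow-controller": {"cpu": "200m", "memory": "3Gi"},
    "minio": {"cpu": "20m", "memory": "25Mi"},
    "persistent-agent": {"cpu": "120m", "memory": "2Gi"},
}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '2' or '200m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity: str) -> int:
    """Convert a memory quantity such as '70Mi' or '4Gi' to MiB."""
    factors = {"Mi": 1, "Gi": 1024}
    return int(quantity[:-2]) * factors[quantity[-2:]]


total_cpu = sum(cpu_millicores(r["cpu"]) for r in EXAMPLE_REQUESTS.values())
total_mem = sum(memory_mib(r["memory"]) for r in EXAMPLE_REQUESTS.values())

print(f"CPU: {total_cpu}m, memory: {total_mem}Mi")  # CPU: 2350m, memory: 9311Mi
```

With these example values, the five services together request roughly 2.35 CPU cores and about 9 GiB of memory, which gives a rough lower bound on the cluster capacity they reserve.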
