[bug] Set a gpu limit on a ContainerOp using a pipeline parameter input
What steps did you take:
I’m using KFP SDK 1.6.4 to define a pipeline. I would like to set a GPU limit on a ContainerOp using a pipeline parameter input. I have the following pipeline definition:
```python
@dsl.pipeline(name="darknet-train-pipeline", description="Trains a darknet network")
def darknet_train_pipeline(...
                           num_gpus: int, ...) -> None:
    ...
    training_op = kfp.components.load_component_from_file(
        os.path.join(os.path.dirname(__file__), 'components/darknet_framework/training/component.yaml'))
    training_task = training_op(another_task.output)
    training_task.set_gpu_limit(num_gpus)  # here a TypeError exception is raised
```
What happened:
When I call `set_gpu_limit` on my ContainerOp, the following TypeError is raised:

```
File "/Users/jean-francois.dalbos/work/git/darknet_models/venv/lib/python3.9/site-packages/kfp/dsl/_container_op.py", line 380, in set_gpu_limit
    self._container_spec.resources.accelerator.count = int(gpu)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'PipelineParam'
```
What did you expect to happen:
I want KFP to create a pipeline definition where the GPU limit depends on the pipeline input parameter named `num_gpus`. In the Argo workflow, I would expect something like (not quite sure of the syntax though):

```yaml
limits: {memory: 4G, nvidia.com/gpu: "{{inputs.parameters.num_gpus}}"}
```
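Spelled out in the rendered container spec, that would correspond to something like the following (a hypothetical sketch; the field layout follows the standard Kubernetes container `resources` schema, with the parameter substituted into the GPU limit):

```yaml
# Hypothetical rendered container spec with the pipeline
# parameter substituted into the GPU limit.
containers:
  - name: main
    resources:
      limits:
        memory: 4G
        nvidia.com/gpu: "{{inputs.parameters.num_gpus}}"
```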
Environment:
- How do you deploy Kubeflow Pipelines (KFP)? In the Kubeflow 1.3 UI, uploading the pipeline as an Argo workflow YAML file.
- KFP version: 1.3
- KFP SDK version: 1.6.4
Anything else you would like to add:
Labels
/area sdk
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
Issue Analytics
- Created 2 years ago
- Reactions: 4
- Comments: 13 (1 by maintainers)
Top GitHub Comments
Thanks @zijianjoy! Using kfp 1.8.9 and the add_resource_request and add_resource_limit methods instead of set_gpu_limit, I was able to specify a GPU limit from a pipeline input parameter. The updated code that worked for me is below. Thanks again! 🙂
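The updated code itself was not captured in this snapshot. As a self-contained illustration of why the two APIs behave differently (an assumption based on the traceback above: `set_gpu_limit` casts its argument with `int()`, whereas `add_resource_limit` stringifies it, letting a parameter placeholder survive into the workflow spec; the `PipelineParam` stand-in below is hypothetical, not the real kfp class):

```python
class PipelineParam:
    """Hypothetical stand-in for kfp.dsl.PipelineParam:
    renders as an Argo-style placeholder when stringified."""
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return "{{inputs.parameters.%s}}" % self.name

def set_gpu_limit(gpu):
    # Mimics the failing kfp code path: int() on a PipelineParam
    # raises TypeError because the class defines no __int__.
    return int(gpu)

def add_resource_limit(resource_name, value):
    # Mimics the working path: the value is stored as a string,
    # so the placeholder is preserved for Argo to resolve later.
    return {resource_name: str(value)}

num_gpus = PipelineParam("num_gpus")
print(add_resource_limit("nvidia.com/gpu", num_gpus))
# {'nvidia.com/gpu': '{{inputs.parameters.num_gpus}}'}
```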
@zijianjoy Thank you for checking this! I think the Argo example probably didn’t work because you’re using an older version. I remember that I had to use pod spec patches before, but not anymore. Regardless of how it’s implemented, it’ll be great to be able to dynamically control GPU resources 😃
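For reference, the pod-spec-patch workaround mentioned above can be sketched at the Argo Workflows level, where a template’s `podSpecPatch` field may embed input parameters in places that typed fields cannot (a rough sketch, not a tested configuration):

```yaml
# Hypothetical Argo Workflows template using podSpecPatch to
# inject a parameterized GPU limit into the main container.
- name: training
  inputs:
    parameters:
      - name: num_gpus
  podSpecPatch: |
    containers:
      - name: main
        resources:
          limits:
            nvidia.com/gpu: "{{inputs.parameters.num_gpus}}"
```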