
[bug] Set a gpu limit on a ContainerOp using a pipeline parameter input


What steps did you take

I’m using KFP SDK 1.6.4 to define a pipeline. I would like to set a GPU limit on a ContainerOp using a pipeline parameter input. I have the following pipeline definition:

@dsl.pipeline(name="darknet-train-pipeline", description="Trains a darknet network")
def darknet_train_pipeline(...
                           num_gpus: int, ...) -> None:
    ...
    training_op = kfp.components.load_component_from_file(
        os.path.join(os.path.dirname(__file__), 'components/darknet_framework/training/component.yaml'))
    training_task = training_op(another_task.output)
    training_task.set_gpu_limit(num_gpus) # here a TypeError exception is raised

What happened:

When I call set_gpu_limit on my ContainerOp, the following TypeError is raised:

File "/Users/jean-francois.dalbos/work/git/darknet_models/venv/lib/python3.9/site-packages/kfp/dsl/_container_op.py", line 380, in set_gpu_limit
    self._container_spec.resources.accelerator.count = int(gpu)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'PipelineParam'
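
The cast happens at compile time: when the pipeline function runs during compilation, num_gpus is a PipelineParam placeholder rather than a number, so int() cannot convert it. A minimal sketch, assuming the kfp v1 SDK, that reproduces the same failure outside a pipeline:

from kfp.dsl import PipelineParam

# What a pipeline function actually receives for num_gpus at compile time.
num_gpus = PipelineParam('num_gpus')

print(str(num_gpus))  # a placeholder string such as '{{pipelineparam:op=;name=num_gpus}}'

try:
    int(num_gpus)     # what set_gpu_limit does internally (see the traceback above)
except TypeError as e:
    print(e)          # int() argument must be ... not 'PipelineParam'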

What did you expect to happen:

I want KFP to create a pipeline definition in which the GPU limit depends on the pipeline input parameter named num_gpus. In the Argo workflow, I would expect something like this (not quite sure of the syntax, though):

limits: {memory: 4G, nvidia.com/gpu: {{inputs.parameters.num_gpus}} }

Environment:

  • How do you deploy Kubeflow Pipelines (KFP)? In Kubeflow 1.3 UI, uploading the pipeline as an Argo workflow YAML file.

  • KFP version: 1.3

  • KFP SDK version: 1.6.4

Anything else you would like to add:

Labels

/area sdk


Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 4
  • Comments: 13 (1 by maintainers)

Top GitHub Comments

6 reactions
jdalbosc-cisco commented, Jan 7, 2022

Thanks @zijianjoy! Using kfp 1.8.9 and the add_resource_request and add_resource_limit methods instead of set_gpu_limit, I was able to specify a GPU limit using a pipeline input parameter. The updated code that worked for me is below. Thanks again! 🙂

import functools
import kfp
import kfp.components


def set_resources(cpu: str = '200m', memory: str = '100M', gpu: int = None):
    """
    Decorator factory that applies CPU, memory and (optionally) GPU settings
    to the decorated component (pod).

    For security reasons, the following rules must be adhered to in order to
    pass pod validations on the cluster:
      - CPU limits must not be set
      - CPU requests must be set
      - Memory limits and requests must be set and equal
    """
    def decorator(component_op):
        @functools.wraps(component_op)
        def wrapper(*args, **kwargs):
            component = component_op(*args, **kwargs)
            component.set_cpu_request(cpu)
            component.set_memory_request(memory)
            component.set_memory_limit(memory)
            if gpu is not None:
                component.set_gpu_limit(gpu)
            return component
        return wrapper
    return decorator

def gpu_sum_pipeline(num_gpus: int, a: float = 1., b: float = 2.):
    @set_resources(gpu=None)
    def gpu_sum_op(a: float, b: float) -> None:
        def gpu_sum(a: float, b: float) -> None:
            print(a + b)
        gpu_sum_task_factory = kfp.components.create_component_from_func(
            gpu_sum, base_image='python:3.9')
        return gpu_sum_task_factory(a, b)

    gpu_sum_task = gpu_sum_op(a, b)
    # add_resource_request/add_resource_limit accept a PipelineParam, so the
    # GPU count can be driven by the num_gpus pipeline input.
    gpu_sum_task.add_resource_request('nvidia.com/gpu', num_gpus)
    gpu_sum_task.add_resource_limit('nvidia.com/gpu', num_gpus)

kfp.compiler.Compiler().compile(
    pipeline_func=gpu_sum_pipeline,
    package_path='gpu_sum.yaml')
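
Applied back to the darknet pipeline from the original report, the same pattern would look roughly like the sketch below. This is illustrative only: the component path and parameter names come from the report, the other pipeline parameters and the upstream task are omitted, and the snippet has not been run.

import os

import kfp
import kfp.components
from kfp import dsl


@dsl.pipeline(name="darknet-train-pipeline", description="Trains a darknet network")
def darknet_train_pipeline(num_gpus: int):
    # Other parameters and the upstream task from the report are omitted here.
    training_op = kfp.components.load_component_from_file(
        os.path.join(os.path.dirname(__file__),
                     'components/darknet_framework/training/component.yaml'))
    training_task = training_op()  # pass the component's real inputs here
    # Unlike set_gpu_limit, these calls accept a PipelineParam, so the GPU
    # count is resolved from the num_gpus pipeline input at run time.
    training_task.add_resource_request('nvidia.com/gpu', num_gpus)
    training_task.add_resource_limit('nvidia.com/gpu', num_gpus)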

1 reaction
ashrafgt commented, Jan 4, 2022

@zijianjoy Thank you for checking this! I think the Argo example probably didn’t work because you’re using an older version. I remember that I had to use pod spec patches before, but not anymore. Regardless of how it’s implemented, it’ll be great to be able to dynamically control GPU resources 😃
