[bug] Set a gpu limit on a ContainerOp using a pipeline parameter input
What steps did you take:
I’m using KFP SDK 1.6.4 to define a pipeline. I would like to set a GPU limit on a ContainerOp using a pipeline parameter input. I have the following pipeline definition:
```python
@dsl.pipeline(name="darknet-train-pipeline", description="Trains a darknet network")
def darknet_train_pipeline(...
                           num_gpus: int, ...) -> None:
    ...
    training_op = kfp.components.load_component_from_file(
        os.path.join(os.path.dirname(__file__), 'components/darknet_framework/training/component.yaml'))
    training_task = training_op(another_task.output)
    training_task.set_gpu_limit(num_gpus)  # here a TypeError exception is raised
```
What happened:
When I call `set_gpu_limit` on my ContainerOp, the following TypeError is raised:

```
File "/Users/jean-francois.dalbos/work/git/darknet_models/venv/lib/python3.9/site-packages/kfp/dsl/_container_op.py", line 380, in set_gpu_limit
    self._container_spec.resources.accelerator.count = int(gpu)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'PipelineParam'
```
What did you expect to happen:
I want KFP to create a pipeline definition where the GPU limit depends on the pipeline input parameter named `num_gpus`. In the Argo workflow, I would expect something like (not quite sure of the syntax though):

```yaml
limits: {memory: 4G, nvidia.com/gpu: "{{inputs.parameters.num_gpus}}"}
```
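Spelled out in the rendered container spec, that would correspond to something like the following (a hypothetical sketch; the field layout follows the standard Kubernetes container `resources` schema, with the parameter substituted into the GPU limit):

```yaml
# Hypothetical rendered container spec with the pipeline
# parameter substituted into the GPU limit.
containers:
  - name: main
    resources:
      limits:
        memory: 4G
        nvidia.com/gpu: "{{inputs.parameters.num_gpus}}"
```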
Environment:
- How do you deploy Kubeflow Pipelines (KFP)? In the Kubeflow 1.3 UI, uploading the pipeline as an Argo workflow YAML file.
- KFP version: 1.3
- KFP SDK version: 1.6.4
Anything else you would like to add:
Labels
/area sdk
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
Issue Analytics
- Created 2 years ago
- Reactions: 4
- Comments: 13 (1 by maintainers)
Top GitHub Comments
Thanks @zijianjoy! Using kfp 1.8.9 and the add_resource_request and add_resource_limit methods instead of set_gpu_limit, I was able to specify a GPU limit from a pipeline input parameter. The updated code that worked for me is below. Thanks again! 🙂
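The updated code itself was not captured in this snapshot. As a self-contained illustration of why the two APIs behave differently (an assumption based on the traceback above: `set_gpu_limit` casts its argument with `int()`, whereas `add_resource_limit` stringifies it, letting a parameter placeholder survive into the workflow spec; the `PipelineParam` stand-in below is hypothetical, not the real kfp class):

```python
class PipelineParam:
    """Hypothetical stand-in for kfp.dsl.PipelineParam:
    renders as an Argo-style placeholder when stringified."""
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return "{{inputs.parameters.%s}}" % self.name

def set_gpu_limit(gpu):
    # Mimics the failing kfp code path: int() on a PipelineParam
    # raises TypeError because the class defines no __int__.
    return int(gpu)

def add_resource_limit(resource_name, value):
    # Mimics the working path: the value is stored as a string,
    # so the placeholder is preserved for Argo to resolve later.
    return {resource_name: str(value)}

num_gpus = PipelineParam("num_gpus")
print(add_resource_limit("nvidia.com/gpu", num_gpus))
# {'nvidia.com/gpu': '{{inputs.parameters.num_gpus}}'}
```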
@zijianjoy Thank you for checking this! I think the Argo example probably didn’t work because you’re using an older version. I remember that I had to use pod spec patches before, but not anymore. Regardless of how it’s implemented, it’ll be great to be able to dynamically control GPU resources 😃
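For reference, the pod-spec-patch workaround mentioned above can be sketched at the Argo Workflows level, where a template’s `podSpecPatch` field may embed input parameters in places that typed fields cannot (a rough sketch, not a tested configuration):

```yaml
# Hypothetical Argo Workflows template using podSpecPatch to
# inject a parameterized GPU limit into the main container.
- name: training
  inputs:
    parameters:
      - name: num_gpus
  podSpecPatch: |
    containers:
      - name: main
        resources:
          limits:
            nvidia.com/gpu: "{{inputs.parameters.num_gpus}}"
```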