
convert string to GCSPath object or create one

See original GitHub issue

I want to convert my string "gs://some_bucket/some_dir" into kfp.dsl.types.GCSPath. How do I do it? Or perhaps I need to create a GCSPath object from the above GCS path. Any ideas?
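
For context: in the kfp v1 SDK you normally don't construct a kfp.dsl.types.GCSPath object from the string at all; GCSPath is a type annotation on component inputs and outputs, and the plain "gs://..." string is what gets passed as the argument. A minimal sketch of that pattern, assuming the v1 compiler (the inline component spec, names, and output path below are illustrative, not from this issue):

import kfp
from kfp import components, dsl

# Illustrative component with a GCSPath-typed input (not the asker's component).
gcs_consumer_op = components.load_component_from_text("""
name: Consume GCS path
inputs:
- {name: gcs_path, type: GCSPath}
implementation:
  container:
    image: alpine
    command: [echo, {inputValue: gcs_path}]
""")

@dsl.pipeline(name='gcs-path-demo')
def demo_pipeline(gcs_path: str = 'gs://some_bucket/some_dir'):
    # Inside the pipeline body gcs_path is a String-typed PipelineParam;
    # ignore_type() drops the declared type so the v1 type checker accepts
    # wiring it into the GCSPath-typed input.
    gcs_consumer_op(gcs_path=gcs_path.ignore_type())

kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml')
# Alternatively, keep the typed wiring and disable checking at compile time:
# kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml', type_check=False)

Either the ignore_type() call or type_check=False handles the v1 type check; neither carries over to the v2 compiler, as the comments below show.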

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

16 reactions
TheTravellingSalesman commented, Sep 30, 2021

I have a similar problem with the GCPProjectID type and kfp.v2.compiler.

I am trying to use this GCP Component to perform a BigQuery query. Sadly, setting type_check=False in the compiler doesn't help, and I don't know how exactly I should apply the ignore_type() function.

The Error I get is: TypeError: Passing PipelineParam "project_id" with type "String" (as "Parameter") to component input "project_id" with type "GCPProjectID" (as "Artifact") is incompatible. Please fix the type of the component input.

I understand that this is not only a type problem, but rather a problem of kfp.v2 expecting non-primitive types to be Artifacts. But how can I provide the project_id as an Artifact in my pipeline definition?

This is my pipeline definition:

@kfp.dsl.pipeline(name=GC_DISPLAY_NAME)
def gc_pipeline(
    query: str = QUERY,
    project_id: str = PROJECT_ID,
    dataset_id: str = DATASET_ID,
    table_id: str = GC_TRAINING_TABLE,
    dataset_location: str = REGION,
    job_config: str = ''
):
    training_data_op = bigquery_query_op(
        query=query,
        project_id=project_id,
        dataset_id=dataset_id,
        table_id=table_id,
        dataset_location=dataset_location,
        job_config=job_config
    )

Compiler:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=gc_pipeline, package_path=GC_DISPLAY_NAME + ".json", type_check=False
)

I am having the same problem. It seems the v2 kfp compiler is having trouble with these type conversions.

I am also having this same problem on kfp 1.8.3. I’m using the dataproc components from the component store.

create_dp_cluster = kfp.components.ComponentStore.default_store.load_component('gcp/dataproc/create_cluster')

def train_and_evaluate_pipeline(
    dataproc_img: str,
    service_acc: str = SERVICE_ACC,
    project_id: str = PROJECT_ID,
    region: str = REGION
):
    create_dp_cluster_task = create_dp_cluster(
            project_id=project_id.ignore_type(),
            region=region.ignore_type(),
            cluster={
                'config': {
                    'gceClusterConfig': {
                        'serviceAccount': service_acc
                    },
                    'masterConfig': {
                        'numInstances': 1,
                        'imageUri': dataproc_img,
                        'machineTypeUri': 'n1-highmem-8'
                    },
                    'workerConfig': {
                        'numInstances': 8,
                        'imageUri': dataproc_img,
                        'machineTypeUri': 'n1-highmem-8'
                    }
                }
            }
        )

Compiler:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=train_and_evaluate_pipeline,
    package_path='train_and_evaluate_pipeline.tar.gz')

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_29850/3251206080.py in <module>
      3 compiler.Compiler().compile(
      4     pipeline_func=train_and_evaluate_pipeline,
----> 5     package_path='train_and_evaluate_pipeline.tar.gz')

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
   1181                 pipeline_func=pipeline_func,
   1182                 pipeline_name=pipeline_name,
-> 1183                 pipeline_parameters_override=pipeline_parameters)
   1184             self._write_pipeline(pipeline_job, package_path)
   1185         finally:

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
   1106 
   1107         with dsl.Pipeline(pipeline_name) as dsl_pipeline:
-> 1108             pipeline_func(*args_list)
   1109 
   1110         if not dsl_pipeline.ops:

/tmp/ipykernel_29850/3958915055.py in train_and_evaluate_pipeline(features_path, training_labels_path, model_output_path, model_config_name, test_labels_path, test_predictions_out_path, coral_version, coral_build_str, dataproc_img, service_acc, job_name, project_id, region)
     24         project_id=project_id.ignore_type(),
     25         region=region.ignore_type(),
---> 26         name=job_name,
     27     )
     28 

/opt/conda/lib/python3.7/site-packages/kfp/components/_dynamic.py in dataproc_delete_cluster(project_id, region, name, wait_interval)
     51 
     52     def pass_locals():
---> 53         return dict_func(locals())  # noqa: F821 TODO
     54 
     55     code = pass_locals.__code__

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in create_task_object_from_component_and_pythonic_arguments(pythonic_arguments)
    368             component_spec=component_spec,
    369             arguments=arguments,
--> 370             component_ref=component_ref,
    371         )
    372 

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in _create_task_object_from_component_and_arguments(component_spec, arguments, component_ref, **kwargs)
    306         arguments=arguments,
    307         component_ref=component_ref,
--> 308         **kwargs,
    309     )
    310 

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _create_container_op_from_component_and_arguments(component_spec, arguments, component_ref)
    317             task.execution_options.caching_strategy.max_cache_staleness = 'P0D'
    318 
--> 319     _attach_v2_specs(task, component_spec, original_arguments)
    320 
    321     return task

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _attach_v2_specs(task, component_spec, arguments)
    634 
    635     resolved_cmd = _resolve_commands_and_args_v2(
--> 636         component_spec=component_spec, arguments=arguments)
    637 
    638     task.container_spec = (

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _resolve_commands_and_args_v2(component_spec, arguments)
    473             input_path_generator=_input_artifact_path_placeholder,
    474             output_path_generator=_resolve_output_path_placeholder,
--> 475             placeholder_resolver=_resolve_ir_placeholders_v2,
    476         )
    477         return resolved_cmd

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in _resolve_command_line_and_paths(component_spec, arguments, input_path_generator, output_path_generator, argument_serializer, placeholder_resolver)
    562 
    563     expanded_command = expand_argument_list(container_spec.command)
--> 564     expanded_args = expand_argument_list(container_spec.args)
    565 
    566     return _ResolvedCommandLineAndPaths(

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in expand_argument_list(argument_list)
    553         if argument_list is not None:
    554             for part in argument_list:
--> 555                 expanded_part = expand_command_part(part)
    556                 if expanded_part is not None:
    557                     if isinstance(expanded_part, list):

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in expand_command_part(arg)
    470                 arg=arg,
    471                 component_spec=component_spec,
--> 472                 arguments=arguments,
    473             )
    474             if resolved_arg is not None:

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _resolve_ir_placeholders_v2(arg, component_spec, arguments)
    435                 input_value = arguments.get(input_name, None)
    436                 if input_value is not None:
--> 437                     return _input_parameter_placeholder(input_name)
    438                 else:
    439                     input_spec = inputs_dict[input_name]

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _input_parameter_placeholder(input_key)
    393                     'Input "{}" with type "{}" cannot be paired with '
    394                     'InputValuePlaceholder.'.format(
--> 395                         input_key, inputs_dict[input_key].type))
    396             else:
    397                 return "{{{{$.inputs.parameters['{}']}}}}".format(input_key)

TypeError: Input "project_id" with type "GCPProjectID" cannot be paired with InputValuePlaceholder.

Someone else mentioned that .ignore_type() works, but it appears that’s no longer the case.

This is the fundamental question that needs answering, imo, if folks are to pass parameters to pipelines the way v2 pipelines expect:

I understand that this is not only a type problem, but rather a problem of kfp.v2 expecting non-primitive types to be Artifacts. But how can I provide the project_id as an Artifact in my pipeline definition?

Otherwise, it seems an inconsistency was introduced in the type conversion of strings -> dsl types with the migration to v2. This pipeline code compiles with the v1 compiler.
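
For the narrower question of handing a v2 component an Artifact built from a plain URI string, the kfp.v2 SDK bundled with 1.8.x has an importer node. A minimal sketch, assuming kfp 1.8.x (the artifact class, pipeline name, and the commented-out downstream component are illustrative, and this does not by itself fix the GCPProjectID parameter error above):

from kfp.v2 import dsl
from kfp.v2.dsl import Dataset, importer

@dsl.pipeline(name='importer-demo')
def importer_pipeline(gcs_uri: str = 'gs://some_bucket/some_dir'):
    # importer materializes a plain URI string as an Artifact that downstream
    # v2 components can consume as an input artifact.
    make_artifact = importer(
        artifact_uri=gcs_uri,
        artifact_class=Dataset,
        reimport=False,
    )
    # The imported Artifact is then wired into an artifact input, e.g.
    # (hypothetical downstream component):
    # some_v2_component(dataset=make_artifact.output)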

10 reactions
SputnikTea commented, May 27, 2021

I have a similar problem with the GCPProjectID type and kfp.v2.compiler.

I am trying to use this GCP Component to perform a BigQuery query. Sadly, setting type_check=False in the compiler doesn't help, and I don't know how exactly I should apply the ignore_type() function.

The Error I get is: TypeError: Passing PipelineParam "project_id" with type "String" (as "Parameter") to component input "project_id" with type "GCPProjectID" (as "Artifact") is incompatible. Please fix the type of the component input.

I understand that this is not only a type problem, but rather a problem of kfp.v2 expecting non-primitive types to be Artifacts. But how can I provide the project_id as an Artifact in my pipeline definition?

This is my pipeline definition:

@kfp.dsl.pipeline(name=GC_DISPLAY_NAME)
def gc_pipeline(
    query: str = QUERY,
    project_id: str = PROJECT_ID,
    dataset_id: str = DATASET_ID,
    table_id: str = GC_TRAINING_TABLE,
    dataset_location: str = REGION,
    job_config: str = ''
):
    training_data_op = bigquery_query_op(
        query=query,
        project_id=project_id,
        dataset_id=dataset_id,
        table_id=table_id,
        dataset_location=dataset_location,
        job_config=job_config
    )

Compiler:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=gc_pipeline, package_path=GC_DISPLAY_NAME + ".json", type_check=False
)
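
On the "how exactly do I apply ignore_type()" part: it is called on the PipelineParam inside the pipeline body, right where the value is wired into the typed input. A sketch against the pipeline above, reusing the same constants and bigquery_query_op, with only the project_id line changed (this satisfies the v1 type checker, but as the dataproc traceback earlier in the thread shows, the v2 compiler can still reject such inputs):

@kfp.dsl.pipeline(name=GC_DISPLAY_NAME)
def gc_pipeline(
    query: str = QUERY,
    project_id: str = PROJECT_ID,
    dataset_id: str = DATASET_ID,
    table_id: str = GC_TRAINING_TABLE,
    dataset_location: str = REGION,
    job_config: str = ''
):
    training_data_op = bigquery_query_op(
        query=query,
        # ignore_type() strips the declared "String" type from the PipelineParam
        # so it can be wired into the GCPProjectID-typed component input.
        project_id=project_id.ignore_type(),
        dataset_id=dataset_id,
        table_id=table_id,
        dataset_location=dataset_location,
        job_config=job_config
    )
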
Read more comments on GitHub >

Top Results From Across the Web

  • String to object in JS - javascript - Stack Overflow
    Actually, the best solution is using JSON: Documentation. JSON.parse(text[, reviver]);. Examples: 1) var myobj = JSON.parse('{ "hello":"world" } ...
  • GcsPath (Apache Beam 2.13.0)
    Creates a GcsPath from bucket and object components. ... Returns the object name associated with this GCS path, or an empty string if...
  • Detect multiple objects | Cloud Vision API - Google Cloud
    The Vision API can detect and extract multiple objects in an image with Object ... If you're new to Google Cloud, create an...
  • How to convert string of properties into an object...
    You can use the json() function to convert a string into a JSON. But you're going to need some quotes around the Properties....
  • org.apache.beam.sdk.util.gcsfs.GcsPath.getFileName java ...
    Returns the object name associated with this GCS path, or an empty string if no object is specified. getBucket. Returns the bucket name...
