
convert string to GCSPath object or create one

See original GitHub issue

I want to convert my string "gs://some_bucket/some_dir" into kfp.dsl.types.GCSPath. How do I do it? Or perhaps I need to create a GCSPath object from the above GCS path. Any ideas?
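
For context: in the kfp v1 SDK you normally don't construct a kfp.dsl.types.GCSPath object from the string at all; GCSPath is a type annotation on component inputs and outputs, and the plain "gs://..." string is what gets passed as the argument. A minimal sketch of that pattern, assuming the v1 compiler (the inline component spec, names, and output path below are illustrative, not from this issue):

import kfp
from kfp import components, dsl

# Illustrative component with a GCSPath-typed input (not the asker's component).
gcs_consumer_op = components.load_component_from_text("""
name: Consume GCS path
inputs:
- {name: gcs_path, type: GCSPath}
implementation:
  container:
    image: alpine
    command: [echo, {inputValue: gcs_path}]
""")

@dsl.pipeline(name='gcs-path-demo')
def demo_pipeline(gcs_path: str = 'gs://some_bucket/some_dir'):
    # Inside the pipeline body gcs_path is a String-typed PipelineParam;
    # ignore_type() drops the declared type so the v1 type checker accepts
    # wiring it into the GCSPath-typed input.
    gcs_consumer_op(gcs_path=gcs_path.ignore_type())

kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml')
# Alternatively, keep the typed wiring and disable checking at compile time:
# kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.yaml', type_check=False)

Either the ignore_type() call or type_check=False handles the v1 type check; neither carries over to the v2 compiler, as the comments below show.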

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

16 reactions
TheTravellingSalesman commented, Sep 30, 2021

I have a similar problem with the GCPProjectID type and kfp.v2.compiler.

I am trying to use this GCP Component to perform a BigQuery query. Sadly, setting type_check=False in the compiler doesn't help, and I don't know how exactly I should apply the ignore_type() function.

The Error I get is: TypeError: Passing PipelineParam "project_id" with type "String" (as "Parameter") to component input "project_id" with type "GCPProjectID" (as "Artifact") is incompatible. Please fix the type of the component input.

I understand that this is not only a type problem, but rather a problem of kfp.v2 expecting non-primitive types to be Artifacts. But how can I provide the project_id as an Artifact in my pipeline definition?

This is my pipeline definition:

@kfp.dsl.pipeline(name=GC_DISPLAY_NAME)
def gc_pipeline(
    query: str = QUERY,
    project_id: str = PROJECT_ID,
    dataset_id: str = DATASET_ID,
    table_id: str = GC_TRAINING_TABLE,
    dataset_location: str = REGION,
    job_config: str = ''
):
    training_data_op = bigquery_query_op(
        query=query,
        project_id=project_id,
        dataset_id=dataset_id,
        table_id=table_id,
        dataset_location=dataset_location,
        job_config=job_config
    )

Compiler:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=gc_pipeline, package_path=GC_DISPLAY_NAME + ".json", type_check=False
)

I am having the same problem. It seems the v2 kfp compiler is having trouble with these type conversions.

I am also having this same problem on kfp 1.8.3. I’m using the dataproc components from the component store.

create_dp_cluster = kfp.components.ComponentStore.default_store.load_component('gcp/dataproc/create_cluster')

def train_and_evaluate_pipeline(
    dataproc_img: str,
    service_acc: str = SERVICE_ACC,
    project_id: str = PROJECT_ID,
    region: str = REGION
):
    create_dp_cluster_task = create_dp_cluster(
            project_id=project_id.ignore_type(),
            region=region.ignore_type(),
            cluster={
                'config': {
                    'gceClusterConfig': {
                        'serviceAccount': service_acc
                    },
                    'masterConfig': {
                        'numInstances': 1,
                        'imageUri': dataproc_img,
                        'machineTypeUri': 'n1-highmem-8'
                    },
                    'workerConfig': {
                        'numInstances': 8,
                        'imageUri': dataproc_img,
                        'machineTypeUri': 'n1-highmem-8'
                    }
                }
            }
        )

Compiler:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=train_and_evaluate_pipeline,
    package_path='train_and_evaluate_pipeline.tar.gz')

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_29850/3251206080.py in <module>
      3 compiler.Compiler().compile(
      4     pipeline_func=train_and_evaluate_pipeline,
----> 5     package_path='train_and_evaluate_pipeline.tar.gz')

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in compile(self, pipeline_func, package_path, pipeline_name, pipeline_parameters, type_check)
   1181                 pipeline_func=pipeline_func,
   1182                 pipeline_name=pipeline_name,
-> 1183                 pipeline_parameters_override=pipeline_parameters)
   1184             self._write_pipeline(pipeline_job, package_path)
   1185         finally:

/opt/conda/lib/python3.7/site-packages/kfp/v2/compiler/compiler.py in _create_pipeline_v2(self, pipeline_func, pipeline_name, pipeline_parameters_override)
   1106 
   1107         with dsl.Pipeline(pipeline_name) as dsl_pipeline:
-> 1108             pipeline_func(*args_list)
   1109 
   1110         if not dsl_pipeline.ops:

/tmp/ipykernel_29850/3958915055.py in train_and_evaluate_pipeline(features_path, training_labels_path, model_output_path, model_config_name, test_labels_path, test_predictions_out_path, coral_version, coral_build_str, dataproc_img, service_acc, job_name, project_id, region)
     24         project_id=project_id.ignore_type(),
     25         region=region.ignore_type(),
---> 26         name=job_name,
     27     )
     28 

/opt/conda/lib/python3.7/site-packages/kfp/components/_dynamic.py in dataproc_delete_cluster(project_id, region, name, wait_interval)
     51 
     52     def pass_locals():
---> 53         return dict_func(locals())  # noqa: F821 TODO
     54 
     55     code = pass_locals.__code__

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in create_task_object_from_component_and_pythonic_arguments(pythonic_arguments)
    368             component_spec=component_spec,
    369             arguments=arguments,
--> 370             component_ref=component_ref,
    371         )
    372 

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in _create_task_object_from_component_and_arguments(component_spec, arguments, component_ref, **kwargs)
    306         arguments=arguments,
    307         component_ref=component_ref,
--> 308         **kwargs,
    309     )
    310 

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _create_container_op_from_component_and_arguments(component_spec, arguments, component_ref)
    317             task.execution_options.caching_strategy.max_cache_staleness = 'P0D'
    318 
--> 319     _attach_v2_specs(task, component_spec, original_arguments)
    320 
    321     return task

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _attach_v2_specs(task, component_spec, arguments)
    634 
    635     resolved_cmd = _resolve_commands_and_args_v2(
--> 636         component_spec=component_spec, arguments=arguments)
    637 
    638     task.container_spec = (

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _resolve_commands_and_args_v2(component_spec, arguments)
    473             input_path_generator=_input_artifact_path_placeholder,
    474             output_path_generator=_resolve_output_path_placeholder,
--> 475             placeholder_resolver=_resolve_ir_placeholders_v2,
    476         )
    477         return resolved_cmd

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in _resolve_command_line_and_paths(component_spec, arguments, input_path_generator, output_path_generator, argument_serializer, placeholder_resolver)
    562 
    563     expanded_command = expand_argument_list(container_spec.command)
--> 564     expanded_args = expand_argument_list(container_spec.args)
    565 
    566     return _ResolvedCommandLineAndPaths(

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in expand_argument_list(argument_list)
    553         if argument_list is not None:
    554             for part in argument_list:
--> 555                 expanded_part = expand_command_part(part)
    556                 if expanded_part is not None:
    557                     if isinstance(expanded_part, list):

/opt/conda/lib/python3.7/site-packages/kfp/components/_components.py in expand_command_part(arg)
    470                 arg=arg,
    471                 component_spec=component_spec,
--> 472                 arguments=arguments,
    473             )
    474             if resolved_arg is not None:

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _resolve_ir_placeholders_v2(arg, component_spec, arguments)
    435                 input_value = arguments.get(input_name, None)
    436                 if input_value is not None:
--> 437                     return _input_parameter_placeholder(input_name)
    438                 else:
    439                     input_spec = inputs_dict[input_name]

/opt/conda/lib/python3.7/site-packages/kfp/dsl/_component_bridge.py in _input_parameter_placeholder(input_key)
    393                     'Input "{}" with type "{}" cannot be paired with '
    394                     'InputValuePlaceholder.'.format(
--> 395                         input_key, inputs_dict[input_key].type))
    396             else:
    397                 return "{{{{$.inputs.parameters['{}']}}}}".format(input_key)

TypeError: Input "project_id" with type "GCPProjectID" cannot be paired with InputValuePlaceholder.

Someone else mentioned that .ignore_type() works, but it appears that’s no longer the case.

This is the fundamental question that needs answering, imo, if folks are to pass parameters to pipelines the way v2 pipelines expect:

I understand that this is not only a type problem, but rather a problem of kfp.v2 expecting non-primitive types to be Artifacts. But how can I provide the project_id as an Artifact in my pipeline definition?

Otherwise, it seems an inconsistency was introduced in the type conversion of strings -> dsl types with the migration to v2. This pipeline code compiles with the v1 compiler.
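
For the narrower question of handing a v2 component an Artifact built from a plain URI string, the kfp.v2 SDK bundled with 1.8.x has an importer node. A minimal sketch, assuming kfp 1.8.x (the artifact class, pipeline name, and the commented-out downstream component are illustrative, and this does not by itself fix the GCPProjectID parameter error above):

from kfp.v2 import dsl
from kfp.v2.dsl import Dataset, importer

@dsl.pipeline(name='importer-demo')
def importer_pipeline(gcs_uri: str = 'gs://some_bucket/some_dir'):
    # importer materializes a plain URI string as an Artifact that downstream
    # v2 components can consume as an input artifact.
    make_artifact = importer(
        artifact_uri=gcs_uri,
        artifact_class=Dataset,
        reimport=False,
    )
    # The imported Artifact is then wired into an artifact input, e.g.
    # (hypothetical downstream component):
    # some_v2_component(dataset=make_artifact.output)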

10 reactions
SputnikTea commented, May 27, 2021

I have a similar problem with the GCPProjectID type and kfp.v2.compiler.

I am trying to use this GCP Component to perform a BigQuery query. Sadly, setting type_check=False in the compiler doesn't help, and I don't know how exactly I should apply the ignore_type() function.

The Error I get is: TypeError: Passing PipelineParam "project_id" with type "String" (as "Parameter") to component input "project_id" with type "GCPProjectID" (as "Artifact") is incompatible. Please fix the type of the component input.

I understand that this is not only a type problem, but rather a problem of kfp.v2 expecting non-primitive types to be Artifacts. But how can I provide the project_id as an Artifact in my pipeline definition?

This is my pipeline definition:

@kfp.dsl.pipeline(name=GC_DISPLAY_NAME)
def gc_pipeline(
    query: str = QUERY,
    project_id: str = PROJECT_ID,
    dataset_id: str = DATASET_ID,
    table_id: str = GC_TRAINING_TABLE,
    dataset_location: str = REGION,
    job_config: str = ''
):
    training_data_op = bigquery_query_op(
        query=query,
        project_id=project_id,
        dataset_id=dataset_id,
        table_id=table_id,
        dataset_location=dataset_location,
        job_config=job_config
    )

Compiler:

from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=gc_pipeline, package_path=GC_DISPLAY_NAME + ".json", type_check=False
)
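
On the "how exactly do I apply ignore_type()" part: it is called on the PipelineParam inside the pipeline body, right where the value is wired into the typed input. A sketch against the pipeline above, reusing the same constants and bigquery_query_op, with only the project_id line changed (this satisfies the v1 type checker, but as the dataproc traceback earlier in the thread shows, the v2 compiler can still reject such inputs):

@kfp.dsl.pipeline(name=GC_DISPLAY_NAME)
def gc_pipeline(
    query: str = QUERY,
    project_id: str = PROJECT_ID,
    dataset_id: str = DATASET_ID,
    table_id: str = GC_TRAINING_TABLE,
    dataset_location: str = REGION,
    job_config: str = ''
):
    training_data_op = bigquery_query_op(
        query=query,
        # ignore_type() strips the declared "String" type from the PipelineParam
        # so it can be wired into the GCPProjectID-typed component input.
        project_id=project_id.ignore_type(),
        dataset_id=dataset_id,
        table_id=table_id,
        dataset_location=dataset_location,
        job_config=job_config
    )
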
Read more comments on GitHub >

Top Results From Across the Web

  • String to object in JS - javascript - Stack Overflow
    Actually, the best solution is using JSON: Documentation. JSON.parse(text[, reviver]);. Examples: 1) var myobj = JSON.parse('{ "hello":"world" } ...
  • GcsPath (Apache Beam 2.13.0)
    Creates a GcsPath from bucket and object components. ... Returns the object name associated with this GCS path, or an empty string if...
  • Detect multiple objects | Cloud Vision API - Google Cloud
    The Vision API can detect and extract multiple objects in an image with Object ... If you're new to Google Cloud, create an...
  • How to convert string of properties into an object...
    You can use the json() function to convert a string into a JSON. But you're going to need some quotes around the Properties....
  • org.apache.beam.sdk.util.gcsfs.GcsPath.getFileName java ...
    Returns the object name associated with this GCS path, or an empty string if no object is specified. getBucket. Returns the bucket name...
