Error in passing metadata to DataprocClusterCreateOperator
Hi, I am facing some issues while installing pip packages in a Dataproc cluster using an initialization script. I am trying to upgrade to Airflow 2.0 from 1.10.12 (where this code works fine):
[2021-07-09 11:35:37,587] {taskinstance.py:1454} ERROR - metadata was invalid: [('PIP_PACKAGES', 'pyyaml requests pandas openpyxl'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.35.0 gax/1.26.0 gccl/airflow_v2.0.0+astro.3')
path = f"gs://goog-dataproc-initialization-actions-{self.cfg.get('region')}/python/pip-install.sh"
return DataprocClusterCreateOperator(
    ........
    init_actions_uris=[path],
    metadata=[('PIP_PACKAGES', 'pyyaml requests pandas openpyxl')],
    ............
)
Apache Airflow version: airflow_v2.0.0
What happened: I am trying to migrate our codebase from Airflow v1.10.12. On deeper analysis, I found that as part of the refactoring in PR #6371 below, we can no longer pass metadata to DataprocClusterCreateOperator(), because it is not forwarded to the ClusterGenerator() method.
What you expected to happen: The operator should work as it did before.
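For context, here is a minimal sketch (plain Python, no Airflow required; the helper name is my own, not part of any library) of the gRPC rule that appears to produce the "metadata was invalid" error above: in 2.0 the operator's metadata argument ends up being sent as gRPC request headers rather than as cluster metadata, and gRPC header keys must be lowercase ASCII.

```python
import re

# Hypothetical helper illustrating the gRPC metadata-key rule: keys must be
# lowercase ASCII letters, digits, or the characters '-', '_', '.'.
_GRPC_KEY_RE = re.compile(r"^[a-z0-9_.\-]+$")

def is_valid_grpc_metadata_key(key: str) -> bool:
    """Return True if the key is acceptable as a gRPC metadata (header) key."""
    return bool(_GRPC_KEY_RE.match(key))

# 'PIP_PACKAGES' is meant as cluster (GCE instance) metadata, not a header,
# so it is rejected when forwarded to the gRPC call:
print(is_valid_grpc_metadata_key("PIP_PACKAGES"))       # False (uppercase)
print(is_valid_grpc_metadata_key("x-goog-api-client"))  # True
```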
Issue Analytics
- Created: 2 years ago
- Comments: 11 (7 by maintainers)
Top GitHub Comments
@nicolas-settembrini No problem,
you just have to generate the config from all of these arguments and then pass it to the DataprocClusterCreateOperator. You can find more details in the pull request; I have attached a snapshot of the documentation that will be coming in the next updates. #19446
Hi, sorry to write here, but I didn't find another place discussing this.
I am using Version: 2.1.4+composer, and I have a DAG where I defined the DataprocClusterCreateOperator like this:
I passed the metadata as a sequence of tuples, as I read here; using a dict does not work.
Also, the metadata is not being rendered in the cluster_config.
@pateash, could you please explain your workaround in more detail? In which part of the DAG should I use it?
Thanks in advance