Missing example DAGs/system tests for Google services

Description

Hello,

We have a rule that every GCP operator should have an example DAG and a system test. This is true in most cases, but there are some exceptions: https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L155-L162

  • airflow/providers/google/ads/operators/ads_to_gcs.py
  • airflow/providers/google/cloud/operators/text_to_speech.py
  • airflow/providers/google/cloud/operators/gcs_to_bigquery.py
  • airflow/providers/google/cloud/operators/adls_to_gcs.py
  • airflow/providers/google/cloud/operators/sql_to_gcs.py
  • airflow/providers/google/cloud/operators/s3_to_gcs.py
  • airflow/providers/google/cloud/operators/translate_speech.py
  • airflow/providers/google/cloud/operators/bigquery_to_mysql.py
  • airflow/providers/google/cloud/operators/speech_to_text.py
  • airflow/providers/google/cloud/operators/cassandra_to_gcs.py
  • airflow/providers/google/cloud/operators/bigquery_to_bigquery.py
  • airflow/providers/google/cloud/operators/mysql_to_gcs.py
  • airflow/providers/google/cloud/operators/mssql_to_gcs.py
  • airflow/providers/google/cloud/operators/bigquery_to_gcs.py
  • airflow/providers/google/cloud/operators/local_to_gcs.py
  • airflow/providers/google/cloud/operators/sheets_to_gcs.py
  • airflow/providers/google/suite/operators/gcs_to_sheets.py
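
To illustrate the kind of check behind the list above, here is a minimal sketch in plain Python. It is not the code from test_project_structure.py. It assumes a naming convention in which an operator module `.../operators/foo.py` is paired with an example DAG at `.../example_dags/example_foo.py`; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def expected_example_path(operator_path: str) -> str:
    """Map an operator module path to the example DAG path assumed for it.

    Assumed convention (hypothetical for this sketch):
    .../<service>/operators/foo.py -> .../<service>/example_dags/example_foo.py
    """
    path = PurePosixPath(operator_path)
    return str(path.parent.parent / "example_dags" / f"example_{path.name}")


def modules_missing_examples(operator_paths, example_paths):
    """Return the operator modules whose assumed example DAG file is absent."""
    existing = set(example_paths)
    return sorted(
        p for p in operator_paths if expected_example_path(p) not in existing
    )


# Two operator modules from the list above; the example file is made up
# for this demonstration and does not reflect the real repository state.
operators = [
    "airflow/providers/google/cloud/operators/text_to_speech.py",
    "airflow/providers/google/cloud/operators/local_to_gcs.py",
]
examples = ["airflow/providers/google/cloud/example_dags/example_local_to_gcs.py"]

missing = modules_missing_examples(operators, examples)
print(missing)
```

A module drops off the "missing" list as soon as a file matching the assumed name exists, which is why a PR covering a single operator module is already useful.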

We also lack examples for individual operators. https://github.com/apache/airflow/blob/master/tests/always/test_project_structure.py#L164-L235

  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueDeleteOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueResumeOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePauseOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueuePurgeOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksTaskGetOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksTasksListOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksTaskDeleteOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueGetOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueueUpdateOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.tasks.CloudTasksQueuesListOperator (https://github.com/apache/airflow/pull/13235)
  • airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateInlineWorkflowTemplateOperator
  • airflow.providers.google.cloud.operators.dataproc.DataprocInstantiateWorkflowTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetStoredInfoTypeOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPReidentifyContentOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPCreateDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPDeidentifyContentOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobTriggerOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListDeidentifyTemplatesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListInspectTemplatesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListStoredInfoTypesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPUpdateInspectTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListJobTriggersOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPCancelDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetDLPJobOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPGetInspectTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListInfoTypesOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPDeleteDeidentifyTemplateOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPListDLPJobsOperator
  • airflow.providers.google.cloud.operators.dlp.CloudDLPRedactImageOperator
  • airflow.providers.google.cloud.operators.datastore.CloudDatastoreDeleteOperationOperator
  • airflow.providers.google.cloud.operators.datastore.CloudDatastoreGetOperationOperator
  • airflow.providers.google.cloud.sensors.gcs.GCSObjectExistenceSensor
  • airflow.providers.google.cloud.sensors.gcs.GCSObjectUpdateSensor
  • airflow.providers.google.cloud.sensors.gcs.GCSObjectsWtihPrefixExistenceSensor
  • airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor

If you decide to work on this ticket, you don’t have to do all the work yourself. A PR can cover just a single operator, and that’s OK.

These example DAGs are key to ensuring high-quality integration.

  • If used in system tests, they prevent regression and facilitate testing.
  • If used in the documentation, they allow us to learn about operators in a real example. Users can easily do CTRL + C, CTRL + V, which makes it easier to write new DAGs.

If you haven’t used GCP yet, you will get $300 in credit after creating an account, which will allow you to get to know these services better.

Working on this task is a good way to better understand GCP services and to learn the testing methods that the community requires. If anyone is interested in this task, I am willing to provide all the necessary tips and information.

Are you wondering how to start contributing to this project? Start by reading our contributor guide

Related Issues

N/A

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 23 (19 by maintainers)

Top GitHub Comments

1 reaction
potiuk commented, Nov 24, 2021

In Breeze you can put the files in the “files” dir, and they will be visible inside the container as “/files/*”; then, in the connection, you should specify the path to that file 😃. I think you can specify either the JSON or “Keyfile + Secret” - you do not have to specify all three. I think this page has a good explanation of what is in the key. You can also, as an exercise, look at the unit tests of GcpBaseHook - they should cover all the different authentication options and show you which combinations are valid.
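
A minimal sketch of the setup described in this comment, using only the Python standard library. It builds a connection URI that points at a key file under “/files”. The extra field name `extra__google_cloud_platform__key_path`, the `google-cloud-platform://` scheme, and the file name `key.json` are assumptions based on how the Google provider documented connection URIs around that time; verify them against the provider version you use. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def gcp_connection_uri(key_path=None, keyfile_json=None):
    """Build a Google Cloud connection URI from ONE credential source.

    Assumption: the extras are passed as query parameters with the
    'extra__google_cloud_platform__' prefix. Check your provider version.
    """
    if key_path is not None and keyfile_json is not None:
        # This sketch treats the two credential sources as mutually exclusive.
        raise ValueError("Specify key_path or keyfile_json, not both")
    extras = {}
    if key_path is not None:
        extras["extra__google_cloud_platform__key_path"] = key_path
    if keyfile_json is not None:
        extras["extra__google_cloud_platform__keyfile_dict"] = keyfile_json
    # urlencode percent-encodes '/' so the path survives inside the query string.
    return "google-cloud-platform://?" + urlencode(extras)


# Hypothetical key file placed in the Breeze "files" dir.
uri = gcp_connection_uri(key_path="/files/key.json")
print(uri)
```

The resulting string could then be exported as an `AIRFLOW_CONN_<CONN_ID>` environment variable, assuming your Airflow version reads connections that way.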

1 reaction
mik-laj commented, Jul 6, 2020

@irvifa Some examples are still missing. I updated the first post. https://github.com/apache/airflow/blob/master/tests/test_project_structure.py#L125
