RedshiftMetadataExtractor example sample DAG needed - changing postgres sample DAG not loading views as expected


Requesting a databuilder example DAG for Redshift. The RedshiftMetadataExtractor is present in the list of extractors, so an example DAG that loads sample metadata from Redshift would be helpful. The Postgres example databuilder DAG does not seem to bring in views. Using RedshiftMetadataExtractor in place of PostgresMetadataExtractor in the sample Postgres DAG does not solve the problem: the DAG runs fine, but the views are still not loaded.

The following is the trace of a successful DAG run in Airflow that still does not load the views into Amundsen.

        SELECT
            *
        FROM (
            SELECT
              CURRENT_DATABASE() as cluster,
              c.table_schema as schema,
              c.table_name as name,
              pgtd.description as description,
              c.column_name as col_name,
              c.data_type as col_type,
              pgcd.description as col_description,
              ordinal_position as col_sort_order
            FROM INFORMATION_SCHEMA.COLUMNS c
            INNER JOIN
              pg_catalog.pg_statio_all_tables as st on c.table_schema=st.schemaname and c.table_name=st.relname
            LEFT JOIN
              pg_catalog.pg_description pgcd on pgcd.objoid=st.relid and pgcd.objsubid=c.ordinal_position
            LEFT JOIN
              pg_catalog.pg_description pgtd on pgtd.objoid=st.relid and pgtd.objsubid=0

            UNION

            SELECT
              CURRENT_DATABASE() as cluster,
              view_schema as schema,
              view_name as name,
              NULL as description,
              column_name as col_name,
              data_type as col_type,
              NULL as col_description,
              ordinal_position as col_sort_order
            FROM
                PG_GET_LATE_BINDING_VIEW_COLS()
                    COLS(view_schema NAME, view_name NAME, column_name NAME, data_type VARCHAR, ordinal_position INT)

            UNION

            SELECT
              CURRENT_DATABASE() AS cluster,
              schemaname AS schema,
              tablename AS name,
              NULL AS description,
              columnname AS col_name,
              external_type AS col_type,
              NULL AS col_description,
              columnnum AS col_sort_order
            FROM svv_external_columns
        )

        
        ORDER by cluster, schema, name, col_sort_order ;
        
[2021-08-27 13:53:50,582] {task.py:53} INFO - Running a task
[2021-08-27 13:53:50,584] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 0)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 1)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 2)
[2021-08-27 13:53:50,588] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 'Table', 'TABLE', 3)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Column', 'COLUMN', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 'Cluster', 'CLUSTER', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 'Schema', 'SCHEMA', 3)
[2021-08-27 13:53:51,338] {task.py:72} INFO - Extracted 500 records so far
[2021-08-27 13:53:51,707] {task.py:72} INFO - Extracted 1000 records so far
[2021-08-27 13:53:51,971] {task.py:72} INFO - Extracted 1500 records so far
[2021-08-27 13:53:52,456] {task.py:72} INFO - Extracted 2000 records so far
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Description', 4)
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,745] {task.py:72} INFO - Extracted 2500 records so far
[2021-08-27 13:53:52,917] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,981] {task.py:72} INFO - Extracted 3000 records so far
[2021-08-27 13:53:53,070] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Column_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Description_4.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Cluster_Schema_SCHEMA.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Database_Cluster_CLUSTER.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Column_COLUMN.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Schema_Table_TABLE.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Schema_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Cluster_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Database_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Column_1.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Table_0.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:159} INFO - Publishing Node csv files ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv'], and Relation CSV files ['/var/tmp/amundsen/table_metadata/relationships/Table_Column_COLUMN.csv', '/var/tmp/amundsen/table_metadata/relationships/Column_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Database_Cluster_CLUSTER.csv', '/var/tmp/amundsen/table_metadata/relationships/Cluster_Schema_SCHEMA.csv', '/var/tmp/amundsen/table_metadata/relationships/Table_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Schema_Table_TABLE.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:182} INFO - Creating indices using Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,137] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Schema if not exist: 
            CREATE CONSTRAINT ON (node:Schema) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,221] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Table if not exist: 
            CREATE CONSTRAINT ON (node:Table) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,226] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Cluster if not exist: 
            CREATE CONSTRAINT ON (node:Cluster) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,767] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Column if not exist: 
            CREATE CONSTRAINT ON (node:Column) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,791] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Database if not exist: 
            CREATE CONSTRAINT ON (node:Database) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,795] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Description if not exist: 
            CREATE CONSTRAINT ON (node:Description) ASSERT node.key IS UNIQUE
        
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:186} INFO - Publishing Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']
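
One way to narrow this down might be to check whether a view ever reaches the node CSV files named in the log above. The following is a minimal sketch, not part of databuilder: `rows_mentioning` is a hypothetical helper, it makes no assumption about the CSV column names, and it only works if the files are still on disk after the job (the loader may be configured to delete them once publishing finishes).

```python
import csv
from pathlib import Path
from typing import Dict, List


def rows_mentioning(csv_path: str, needle: str) -> List[Dict[str, str]]:
    """Return the CSV rows in which any field contains `needle` (case-insensitive)."""
    needle = needle.lower()
    matches = []
    with Path(csv_path).open(newline="", encoding="utf8") as handle:
        for row in csv.DictReader(handle):
            # Scan every field, since the exact header names are not assumed here.
            if any(needle in str(value).lower()
                   for value in row.values() if value is not None):
                matches.append(row)
    return matches


# Example call (the path is taken from the log; the view name is a placeholder):
# rows_mentioning("/var/tmp/amundsen/table_metadata/nodes/Table_0.csv", "my_view")
```

If the view is missing from `Table_0.csv`, the extractor query never returned it. If it is present, the problem is more likely in publishing or search indexing.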

EDIT: Is this because only late-binding views are being added, and not normal views?
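
A possible explanation, offered as a guess and not a confirmed diagnosis: the first branch of the query inner-joins INFORMATION_SCHEMA.COLUMNS to pg_catalog.pg_statio_all_tables. In PostgreSQL that catalog view lists tables but not views, and if Redshift behaves the same way, the columns of ordinary views would be dropped by the join. The sketch below reproduces the effect with Python's built-in sqlite3 and two hypothetical stand-in tables (`info_columns`, `statio_all_tables`). It does not talk to Redshift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the catalog relations named in the query.
    CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT);
    CREATE TABLE statio_all_tables (schemaname TEXT, relname TEXT);

    -- The column catalog knows about both the table and the view...
    INSERT INTO info_columns VALUES
        ('public', 'orders', 'id'),
        ('public', 'orders_view', 'id');
    -- ...but the tables-only catalog lists just the base table.
    INSERT INTO statio_all_tables VALUES ('public', 'orders');
""")

JOIN_TEMPLATE = """
    SELECT c.table_name
    FROM info_columns c
    {join_type} JOIN statio_all_tables st
      ON c.table_schema = st.schemaname AND c.table_name = st.relname
    ORDER BY c.table_name
"""

# INNER JOIN keeps only relations present in both catalogs.
inner_rows = conn.execute(JOIN_TEMPLATE.format(join_type="INNER")).fetchall()
# LEFT JOIN keeps every relation from the column catalog.
left_rows = conn.execute(JOIN_TEMPLATE.format(join_type="LEFT")).fetchall()

print("INNER JOIN:", inner_rows)  # [('orders',)]
print("LEFT JOIN: ", left_rows)   # [('orders',), ('orders_view',)]
```

Running just the first SELECT of the extractor query against the cluster and looking for a known view name would confirm or rule this out.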

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
delwaterman commented, Aug 31, 2021

@raajpackt I think I found the bug with the script. See https://github.com/amundsen-io/amundsen/pull/1466

0 reactions
stale[bot] commented, Oct 6, 2021

This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.

