RedshiftMetadataExtractor example DAG needed: Postgres sample DAG adapted for Redshift does not load views
Requesting a databuilder example DAG for Redshift.
The RedshiftMetadataExtractor is present in the list of extractors, so an example DAG that loads sample metadata from Redshift would be helpful. The Postgres example databuilder DAG does not seem to bring in views, and using RedshiftMetadataExtractor in place of PostgresMetadataExtractor in the sample Postgres DAG does not solve the problem: the DAG runs fine, but views are still not loaded.
The following is the trace of a DAG run that completes successfully in Airflow but still does not load views into Amundsen. It begins with the extraction SQL, followed by the task log:
```sql
SELECT
  *
FROM (
  SELECT
    CURRENT_DATABASE() as cluster,
    c.table_schema as schema,
    c.table_name as name,
    pgtd.description as description,
    c.column_name as col_name,
    c.data_type as col_type,
    pgcd.description as col_description,
    ordinal_position as col_sort_order
  FROM INFORMATION_SCHEMA.COLUMNS c
  INNER JOIN
    pg_catalog.pg_statio_all_tables as st on c.table_schema=st.schemaname and c.table_name=st.relname
  LEFT JOIN
    pg_catalog.pg_description pgcd on pgcd.objoid=st.relid and pgcd.objsubid=c.ordinal_position
  LEFT JOIN
    pg_catalog.pg_description pgtd on pgtd.objoid=st.relid and pgtd.objsubid=0

  UNION

  SELECT
    CURRENT_DATABASE() as cluster,
    view_schema as schema,
    view_name as name,
    NULL as description,
    column_name as col_name,
    data_type as col_type,
    NULL as col_description,
    ordinal_position as col_sort_order
  FROM
    PG_GET_LATE_BINDING_VIEW_COLS()
    COLS(view_schema NAME, view_name NAME, column_name NAME, data_type VARCHAR, ordinal_position INT)

  UNION

  SELECT
    CURRENT_DATABASE() AS cluster,
    schemaname AS schema,
    tablename AS name,
    NULL AS description,
    columnname AS col_name,
    external_type AS col_type,
    NULL AS col_description,
    columnnum AS col_sort_order
  FROM svv_external_columns
)
ORDER by cluster, schema, name, col_sort_order ;
```
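The query UNIONs three sources (regular tables, late-binding views, external tables) and sorts the result by (cluster, schema, name, col_sort_order). That ordering matters if, as databuilder's Postgres-family extractors appear to do, consecutive rows are grouped into one record per (cluster, schema, name) with `itertools.groupby`. A minimal sketch of that grouping step, assuming dict-shaped rows; `TableRecord` and the sample rows are hypothetical stand-ins, not databuilder's `TableMetadata`:

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass
class TableRecord:
    """Hypothetical stand-in for one table's metadata (not databuilder's TableMetadata)."""
    cluster: str
    schema: str
    name: str
    description: Optional[str]
    columns: List[Tuple[str, str, int]] = field(default_factory=list)


def table_key(row: Dict) -> Tuple[str, str, str]:
    # All rows of one table or view share the same (cluster, schema, name).
    return (row["cluster"], row["schema"], row["name"])


def group_rows(rows: Iterable[Dict]) -> Iterator[TableRecord]:
    # groupby only merges *consecutive* rows with equal keys, so the input must
    # already be sorted by that key, which is what the query's ORDER BY provides.
    for (cluster, schema, name), group in groupby(rows, key=table_key):
        table_rows = list(group)
        yield TableRecord(
            cluster=cluster,
            schema=schema,
            name=name,
            description=table_rows[0]["description"],
            columns=[(r["col_name"], r["col_type"], r["col_sort_order"]) for r in table_rows],
        )


def row(name: str, col: str, col_type: str, pos: int) -> Dict:
    # Hypothetical query output row; "dev" stands in for CURRENT_DATABASE().
    return {"cluster": "dev", "schema": "public", "name": name, "description": None,
            "col_name": col, "col_type": col_type, "col_sort_order": pos}


SAMPLE_ROWS = [
    row("orders", "id", "integer", 1),
    row("orders", "amount", "numeric", 2),
    row("orders_v", "id", "integer", 1),
]

if __name__ == "__main__":
    for table in group_rows(SAMPLE_ROWS):
        print(table.name, [col[0] for col in table.columns])
    # Out of order, "orders" is split into two separate records.
    shuffled = [SAMPLE_ROWS[0], SAMPLE_ROWS[2], SAMPLE_ROWS[1]]
    print([table.name for table in group_rows(shuffled)])
```

In this sketch, a table whose rows arrive out of order is emitted as two separate records, so the ORDER BY is load-bearing rather than cosmetic.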
```
[2021-08-27 13:53:50,582] {task.py:53} INFO - Running a task
[2021-08-27 13:53:50,584] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 0)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 1)
[2021-08-27 13:53:50,587] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 2)
[2021-08-27 13:53:50,588] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 2)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Schema', 'Table', 'TABLE', 3)
[2021-08-27 13:53:50,589] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Column', 'COLUMN', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Database', 'Cluster', 'CLUSTER', 3)
[2021-08-27 13:53:50,590] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Cluster', 'Schema', 'SCHEMA', 3)
[2021-08-27 13:53:51,338] {task.py:72} INFO - Extracted 500 records so far
[2021-08-27 13:53:51,707] {task.py:72} INFO - Extracted 1000 records so far
[2021-08-27 13:53:51,971] {task.py:72} INFO - Extracted 1500 records so far
[2021-08-27 13:53:52,456] {task.py:72} INFO - Extracted 2000 records so far
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Description', 4)
[2021-08-27 13:53:52,689] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Table', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,745] {task.py:72} INFO - Extracted 2500 records so far
[2021-08-27 13:53:52,917] {file_system_neo4j_csv_loader.py:163} INFO - Creating file for ('Column', 'Description', 'DESCRIPTION', 3)
[2021-08-27 13:53:52,981] {task.py:72} INFO - Extracted 3000 records so far
[2021-08-27 13:53:53,070] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Column_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Description_DESCRIPTION.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Description_4.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Cluster_Schema_SCHEMA.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Database_Cluster_CLUSTER.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Table_Column_COLUMN.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/relationships//Schema_Table_TABLE.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Schema_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Cluster_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Database_2.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Column_1.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,071] {file_system_neo4j_csv_loader.py:170} INFO - Closing file IO <_io.TextIOWrapper name='/var/tmp/amundsen/table_metadata/nodes//Table_0.csv' mode='w' encoding='utf8'>
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:159} INFO - Publishing Node csv files ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv'], and Relation CSV files ['/var/tmp/amundsen/table_metadata/relationships/Table_Column_COLUMN.csv', '/var/tmp/amundsen/table_metadata/relationships/Column_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Database_Cluster_CLUSTER.csv', '/var/tmp/amundsen/table_metadata/relationships/Cluster_Schema_SCHEMA.csv', '/var/tmp/amundsen/table_metadata/relationships/Table_Description_DESCRIPTION.csv', '/var/tmp/amundsen/table_metadata/relationships/Schema_Table_TABLE.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:182} INFO - Creating indices using Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']
[2021-08-27 13:53:53,125] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,137] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Schema if not exist:
CREATE CONSTRAINT ON (node:Schema) ASSERT node.key IS UNIQUE
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,196] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,221] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Table if not exist:
CREATE CONSTRAINT ON (node:Table) ASSERT node.key IS UNIQUE
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,223] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,226] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Cluster if not exist:
CREATE CONSTRAINT ON (node:Cluster) ASSERT node.key IS UNIQUE
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,227] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,767] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Column if not exist:
CREATE CONSTRAINT ON (node:Column) ASSERT node.key IS UNIQUE
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,787] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,791] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Database if not exist:
CREATE CONSTRAINT ON (node:Database) ASSERT node.key IS UNIQUE
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,792] {neo4j_csv_publisher.py:224} INFO - Creating indices. (Existing indices will be ignored)
[2021-08-27 13:53:53,795] {neo4j_csv_publisher.py:460} INFO - Trying to create index for label Description if not exist:
CREATE CONSTRAINT ON (node:Description) ASSERT node.key IS UNIQUE
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:233} INFO - Indices have been created.
[2021-08-27 13:53:53,796] {neo4j_csv_publisher.py:186} INFO - Publishing Node files: ['/var/tmp/amundsen/table_metadata/nodes/Schema_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Table_0.csv', '/var/tmp/amundsen/table_metadata/nodes/Cluster_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Column_1.csv', '/var/tmp/amundsen/table_metadata/nodes/Database_2.csv', '/var/tmp/amundsen/table_metadata/nodes/Description_4.csv']
```
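The log shows two separate stages: the loader (file_system_neo4j_csv_loader.py) writes node and relationship CSVs under /var/tmp/amundsen/table_metadata, and the publisher (neo4j_csv_publisher.py) then creates constraints and publishes those files. If Table_0.csv is still on disk after a run, searching it for a known view name shows whether the view was already missing at extraction time rather than lost during publishing. A minimal sketch with the standard csv module; the helper name `find_in_csv`, the script name and `my_view` are hypothetical, and it searches every cell so it does not depend on the loader's column names:

```python
import csv
import sys
from typing import Dict, List


def find_in_csv(csv_path: str, needle: str) -> List[Dict]:
    """Return every row (header -> value) in which any cell contains `needle`."""
    matches = []
    with open(csv_path, newline="", encoding="utf8") as handle:
        for row in csv.DictReader(handle):
            # Skip cells missing from short rows (None); str() also covers overflow
            # cells, which DictReader collects into a list.
            if any(needle in str(value) for value in row.values() if value is not None):
                matches.append(row)
    return matches


if __name__ == "__main__":
    # Example: python find_in_csv.py /var/tmp/amundsen/table_metadata/nodes/Table_0.csv my_view
    if len(sys.argv) != 3:
        print("usage: find_in_csv.py CSV_PATH NAME")
    else:
        path, name = sys.argv[1], sys.argv[2]
        hits = find_in_csv(path, name)
        print(f"{len(hits)} row(s) mention {name!r} in {path}")
        for hit in hits:
            print(hit)
```

Because the match is a substring search, a name like `orders` also matches `orders_v`; search for the most specific name you have.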
EDIT: Is this because only late-binding views are added, and not normal views?
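A plausible answer to the EDIT question, assuming Redshift behaves like PostgreSQL here: INFORMATION_SCHEMA.COLUMNS lists the columns of both tables and ordinary views, but pg_statio_all_tables only has rows for relations with storage, which ordinary views lack. The INNER JOIN in the first SELECT would then discard every ordinary-view column, while the second SELECT covers only late-binding views and the third only external tables. The sketch below reproduces that join behaviour with Python's built-in sqlite3; the tables `info_columns` and `statio_tables` are hypothetical stand-ins for the two catalog views, not the real ones.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical stand-ins: info_columns plays INFORMATION_SCHEMA.COLUMNS (tables AND views),
# statio_tables plays pg_statio_all_tables (rows only for relations with storage).
SETUP = """
CREATE TABLE info_columns (table_schema TEXT, table_name TEXT, column_name TEXT, ordinal_position INT);
CREATE TABLE statio_tables (schemaname TEXT, relname TEXT, relid INT);
INSERT INTO info_columns VALUES
    ('public', 'orders',   'id',     1),
    ('public', 'orders',   'amount', 2),
    ('public', 'orders_v', 'id',     1);  -- an ordinary (not late-binding) view
INSERT INTO statio_tables VALUES ('public', 'orders', 100);  -- no row for the view
"""

QUERY = """
SELECT c.table_name, c.column_name
FROM info_columns c
{join} statio_tables st
  ON c.table_schema = st.schemaname AND c.table_name = st.relname
ORDER BY c.table_name, c.ordinal_position
"""


def run(join: str) -> List[Tuple[str, str]]:
    """Run the column query with the given join type against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY.format(join=join)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("INNER JOIN:", run("INNER JOIN"))
    print("LEFT JOIN: ", run("LEFT JOIN"))
```

Here `run("INNER JOIN")` returns only the two `orders` columns, while `run("LEFT JOIN")` also keeps the `orders_v` column. Note that with a LEFT JOIN the view rows would carry a NULL `st.relid`, so the `pg_description` joins in the real query would find no descriptions for them.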
@raajpackt I think I found the bug in the script. See https://github.com/amundsen-io/amundsen/pull/1466
This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.