Feature: Commonly Joined Against Tables
The high-level objective of this feature is to provide visibility into which tables are commonly joined against each other.
My current mental approach for implementing this (for SQL-based databases at least) is as follows:
- Obtain the SQL query log for the DB in question. Ex: for Redshift we can query STL_QUERY_LOG; Postgres logs can be enabled by setting log_statement to all and then ingested.
- Use a SQL parsing library like moz-sql-parser (https://github.com/mozilla/moz-sql-parser) to parse queries into a more usable form. For example:
Query:
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
Parsed Output:
{"select": [{"value": "Orders.OrderID"}, {"value": "Customers.CustomerName"}, {"value": "Orders.OrderDate"}], "from": ["Orders", {"inner join": "Customers", "on": {"eq": ["Orders.CustomerID", "Customers.CustomerID"]}}]}
- Analyze the parsed output: we can then check the parsed data for joins. Ex: iterate over the “from” clauses, look for any join clauses, and update the metadata.
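The analysis step could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the parsed output has the shape shown in the example above, that join entries are dicts whose key ends in "join", and that aliased tables appear as a dict with a "value" key. The names joined_pairs, update_join_counts and join_counts are hypothetical, and counting every pair of tables in a joining FROM clause is a simplification (it does not inspect the "on" conditions).

```python
from collections import Counter
from itertools import combinations


def _table_name(entry):
    # A table is either a plain string or, when aliased, assumed to be
    # a dict such as {"value": "Orders", "name": "o"}.
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict) and isinstance(entry.get("value"), str):
        return entry["value"]
    return None  # subqueries and other shapes are skipped in this sketch


def joined_pairs(parsed):
    """Return the set of table pairs joined in one parsed query."""
    from_clause = parsed.get("from", [])
    if not isinstance(from_clause, list):
        from_clause = [from_clause]

    tables, saw_join = set(), False
    for entry in from_clause:
        join_key = None
        if isinstance(entry, dict):
            # Join entries carry a key such as "inner join" or "left join".
            join_key = next((k for k in entry if k.endswith("join")), None)
        if join_key is not None:
            saw_join = True
            name = _table_name(entry[join_key])
        else:
            name = _table_name(entry)
        if name:
            tables.add(name)

    if not saw_join:
        return set()
    # Simplification: every pair of tables in a joining FROM clause counts.
    return {tuple(sorted(pair)) for pair in combinations(tables, 2)}


def update_join_counts(join_counts, parsed):
    """Add one query's join pairs to the running Counter (the "metadata")."""
    join_counts.update(joined_pairs(parsed))
    return join_counts


# The parsed output from the example above.
parsed = {
    "select": [
        {"value": "Orders.OrderID"},
        {"value": "Customers.CustomerName"},
        {"value": "Orders.OrderDate"},
    ],
    "from": [
        "Orders",
        {
            "inner join": "Customers",
            "on": {"eq": ["Orders.CustomerID", "Customers.CustomerID"]},
        },
    ],
}

join_counts = update_join_counts(Counter(), parsed)
print(join_counts)  # Counter({('Customers', 'Orders'): 1})
```

Storing each pair as a sorted tuple means Orders/Customers and Customers/Orders land in the same bucket, whichever side of the join a table appears on.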
This could potentially be extended to pull more insights from queries, such as the most commonly queried columns.
Note: All of this is just off the top of my head. Thoughts and ideas are welcome – would love to get working on this and merge it upstream 😃
Issue Analytics
- Created 4 years ago
- Comments:5 (3 by maintainers)
Agreed that queries will be helpful and that they can come later.
Sorry, I didn’t realize you were waiting on me. Your plan looks great to me; this would be a great contribution. Let’s just sort them by join frequency and cap the list at 5. Let me know if there’s anything else I can help with.
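A rough sketch of the sort-and-cap suggestion, assuming join counts are kept as a Counter keyed by (table_a, table_b) tuples. The function name top_joined_tables, that storage shape, and the sample counts are all hypothetical and made up for illustration only.

```python
from collections import Counter


def top_joined_tables(join_counts, table, limit=5):
    """Return up to `limit` (other_table, count) pairs, most frequent first."""
    partners = Counter()
    for (left, right), count in join_counts.items():
        if table == left:
            partners[right] += count
        elif table == right:
            partners[left] += count
    # most_common sorts by count in descending order and caps the result.
    return partners.most_common(limit)


# Made-up counts, for illustration only.
join_counts = Counter({
    ("Customers", "Orders"): 12,
    ("Orders", "Products"): 7,
    ("Orders", "Shippers"): 3,
    ("Employees", "Orders"): 9,
    ("Orders", "Payments"): 5,
    ("Orders", "Returns"): 1,
    ("Customers", "Payments"): 4,
})

print(top_joined_tables(join_counts, "Orders"))
# [('Customers', 12), ('Employees', 9), ('Products', 7), ('Payments', 5), ('Shippers', 3)]
```

Here Returns is dropped from the result for Orders because it is the sixth most frequent partner.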
On Thu, Dec 12, 2019, 8:31 AM Nikshep Svn notifications@github.com wrote: