Feature: Commonly Joined Against Tables

The high-level objective of this feature is to provide visibility into which tables are commonly joined against each other.

My current mental approach for implementing this (for SQL-based databases at least) is as follows:

  1. Obtain the SQL query log for the DB in question. For example, on Redshift we can query the STL_QUERY system table; on Postgres, statement logging can be enabled by setting log_statement to 'all', and the logs can then be ingested.

  2. Use a SQL parsing library like moz-sql-parser (https://github.com/mozilla/moz-sql-parser) to parse queries into a more usable form. For example:

Query

SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;

Parsed Output:

{"select": [{"value": "Orders.OrderID"}, {"value": "Customers.CustomerName"}, {"value": "Orders.OrderDate"}], "from": ["Orders", {"inner join": "Customers", "on": {"eq": ["Orders.CustomerID", "Customers.CustomerID"]}}]}

  3. Analyze the parsed output. We can then check the parsed data for joins, e.g. iterate over the “from” clause, look for any join entries, and update the metadata.
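
A minimal sketch of step 3, assuming the parsed output has the shape shown above. It works on the parsed dictionary directly rather than calling the parser, and the helper names (`table_name`, `tables_in_condition`, `join_pairs`, `count_joins`) are hypothetical. It also assumes that columns in the ON condition are qualified with real table names; aliases and USING clauses are not resolved here.

```python
from collections import Counter


def table_name(entry):
    """Return the table name of a 'from'/join entry, or None for subqueries."""
    if isinstance(entry, str):
        return entry
    # Assumed shape for an aliased table: {"value": "Orders", "name": "o"}
    if isinstance(entry, dict) and isinstance(entry.get("value"), str):
        return entry["value"]
    return None


def tables_in_condition(cond):
    """Collect table qualifiers from column references like 'Orders.CustomerID'."""
    found = set()
    if isinstance(cond, str) and "." in cond:
        found.add(cond.rsplit(".", 1)[0])
    elif isinstance(cond, dict):
        for key, value in cond.items():
            if key != "literal":  # skip string literals
                found |= tables_in_condition(value)
    elif isinstance(cond, list):
        for item in cond:
            found |= tables_in_condition(item)
    return found


def join_pairs(parsed):
    """Return the set of (table_a, table_b) pairs joined in one parsed query."""
    from_clause = parsed.get("from", [])
    if not isinstance(from_clause, list):
        from_clause = [from_clause]
    pairs = set()
    for entry in from_clause:
        if not isinstance(entry, dict):
            continue
        # Join entries use keys such as "join", "inner join", "left join".
        join_key = next((k for k in entry if k.endswith("join")), None)
        if join_key is None:
            continue
        joined = table_name(entry[join_key])
        if joined is None:
            continue
        for other in tables_in_condition(entry.get("on")):
            if other != joined:
                # Sort so (A, B) and (B, A) count as the same pair.
                pairs.add(tuple(sorted((joined, other))))
    return pairs


def count_joins(parsed_queries):
    """Count how often each table pair is joined across many parsed queries."""
    counts = Counter()
    for parsed in parsed_queries:
        counts.update(join_pairs(parsed))
    return counts


parsed = {
    "select": [{"value": "Orders.OrderID"}, {"value": "Customers.CustomerName"},
               {"value": "Orders.OrderDate"}],
    "from": ["Orders", {"inner join": "Customers",
                        "on": {"eq": ["Orders.CustomerID", "Customers.CustomerID"]}}],
}
print(count_joins([parsed]))
```

The resulting counter could then be written back as table metadata; how that is stored is left open here.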

This could potentially be extended to pull more insights from queries, such as the most commonly queried columns.
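
As a rough illustration of that extension, the sketch below counts plain column references in the “select” clause of parsed queries. It assumes the same parsed shape as the example above; `selected_columns` and `count_columns` are hypothetical names, and expressions or function calls in the select list are simply skipped.

```python
from collections import Counter


def selected_columns(parsed):
    """Return plain column names from the 'select' clause of one parsed query."""
    select = parsed.get("select", [])
    if not isinstance(select, list):
        select = [select]
    columns = []
    for item in select:
        value = item.get("value") if isinstance(item, dict) else item
        # Keep only simple column references; skip "*" and nested expressions.
        if isinstance(value, str) and value != "*":
            columns.append(value)
    return columns


def count_columns(parsed_queries):
    """Count how often each column is selected across many parsed queries."""
    counts = Counter()
    for parsed in parsed_queries:
        counts.update(selected_columns(parsed))
    return counts


queries = [
    {"select": [{"value": "Orders.OrderID"}, {"value": "Orders.OrderDate"}],
     "from": "Orders"},
    {"select": {"value": "Orders.OrderID"}, "from": "Orders"},
]
print(count_columns(queries).most_common(1))
```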

Note: All of this is just off the top of my head. Thoughts and ideas are welcome – would love to get working on this and merge it upstream 😃

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
markgrover commented, Dec 10, 2019

Agreed queries will be helpful and that they can come later.

0 reactions
markgrover commented, Dec 12, 2019

Sorry didn’t realize you were waiting on me. Your plan looks great to me, this would be a great contribution. Just let’s sort them based on join frequency and cap it to 5. Let me know if there’s anything else I can help with.

On Thu, Dec 12, 2019, 8:31 AM Nikshep Svn notifications@github.com wrote:

Hey @markgrover https://github.com/markgrover, wondering if you have any updates on this. Would love to get started on this as soon as I can 😃
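
For illustration only, the suggestion in this comment to sort by join frequency and cap the list at 5 could be sketched as follows. It assumes join counts are kept in a `Counter` keyed by table pairs; `top_joined_tables` and the sample table names are hypothetical.

```python
from collections import Counter


def top_joined_tables(join_counts, table, limit=5):
    """Return up to `limit` (partner_table, count) tuples, most frequent first."""
    partners = Counter()
    for (table_a, table_b), count in join_counts.items():
        if table_a == table:
            partners[table_b] += count
        elif table_b == table:
            partners[table_a] += count
    # most_common sorts by count in descending order and truncates to `limit`.
    return partners.most_common(limit)


# Made-up counts, only to show the call.
join_counts = Counter({
    ("Customers", "Orders"): 12,
    ("Orders", "Shipments"): 7,
    ("Customers", "Regions"): 3,
})
print(top_joined_tables(join_counts, "Orders"))
```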
