Enhance support for parameterised SQL query datasets

Description

It would be useful to extend SQLQueryDataset to support parameterised queries whose parameters can be supplied at runtime.

Context

With the current SQLQueryDataset, parameters can be used in some cases. For example, parameters can be passed in from globals.yml:

#globals.yml
foo: bar
#catalog.yml

sql:
  type: pandas.SQLQueryDataset
  sql: "SELECT * FROM table WHERE column = ${foo}"

However, when these values are loaded from a YAML file, the string representation of the corresponding Python object is substituted into the query. This is a problem for lists, as the following does not produce a valid SQL query:

# globals.yml
foo:
  - a
  - b
  - c
# catalog.yml

sql:
  type: pandas.SQLQueryDataset
  sql: "SELECT * FROM table WHERE column in ${foo};" # parsed as "SELECT * FROM table WHERE column in ['a', 'b', 'c'];"
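
To make the failure concrete, here is a minimal sketch that mimics the substitution using only the standard library. It does not reproduce Kedro's actual templating code; `substitute` is a hypothetical helper that simply inserts `str(value)` for each `${key}`, which is the behaviour described above.

```python
from string import Template


def substitute(sql: str, values: dict) -> str:
    """Hypothetical stand-in for config templating: insert str(value) for each ${key}."""
    return Template(sql).substitute({key: str(value) for key, value in values.items()})


# A scalar value is inserted as-is.
scalar_query = substitute("SELECT * FROM table WHERE column = ${foo}", {"foo": "bar"})

# A list is inserted as its Python repr, which is not a valid SQL in-clause.
list_query = substitute(
    "SELECT * FROM table WHERE column in ${foo};", {"foo": ["a", "b", "c"]}
)

print(scalar_query)
print(list_query)
```

The second query ends up containing square brackets, which most SQL dialects reject in this position.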

Possible Implementation

The jinjasql package provides some utilities for parsing templates using jinja syntax.

from typing import Any, List, Tuple

import jinjasql


def render_template_query(template: str, **kwargs: Any) -> Tuple[str, List[Any]]:
    """Renders a query and its bind parameters from a SQL template.

    Examples
    --------
    Additional keyword arguments are used to fill in parameters
    >>> template = "SELECT * FROM table WHERE column = {{foo}}"
    >>> query, params = render_template_query(template, foo='bar')

    Columns and tables must be marked as 'sqlsafe' to be parameterised.
    >>> template = "SELECT * FROM {{table | sqlsafe}} WHERE {{col | sqlsafe}} = {{foo}}"
    >>> query, params = render_template_query(template, table='mytable', col='column', foo='bar')

    Collections of values can be used to parameterise in-clauses
    >>> template = "SELECT * FROM table WHERE column in {{values | inclause}}"
    >>> query, params = render_template_query(template, values=['a','b','c'])
    """
    # pyodbc uses qmark syntax; with this style the bind parameters
    # come back as a list of values rather than a dict
    jinja = jinjasql.JinjaSql(param_style="qmark")
    query, params = jinja.prepare_query(template, kwargs)
    return query, params

This can then be passed into pandas.read_sql_query as follows:

import pandas as pd
con = ...
template = "SELECT * FROM table WHERE column in {{values | inclause}}"
query, params = render_template_query(template, values=['a','b','c'])
pd.read_sql_query(query, con=con, params=params)
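
For readers who want to try the idea without jinjasql or a database server, the effect of an in-clause with qmark placeholders can be approximated with the standard library. This is only a sketch under stated assumptions: `expand_inclause` is a hypothetical helper (not jinjasql's implementation), and the in-memory SQLite database and the `items` table are made up for illustration.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def expand_inclause(values: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Hypothetical helper: build a '(?, ?, ...)' clause plus its bind parameters."""
    if not values:
        raise ValueError("an in-clause needs at least one value")
    placeholders = ", ".join("?" for _ in values)
    return f"({placeholders})", list(values)


# Illustrative in-memory database; table and column names are assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT)")
con.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",), ("d",)])

clause, params = expand_inclause(["a", "b", "c"])
query = f"SELECT name FROM items WHERE name IN {clause} ORDER BY name"

# The values travel separately from the query text, so the driver handles quoting.
rows = con.execute(query, params).fetchall()
print(query)
print(rows)
```

Because only the placeholders are interpolated into the query string, the values themselves never need to be escaped by hand.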

From a configuration point of view, it might be useful to add a keyword to SQLQueryDataset to make it explicit that the query is a template rather than a valid SQL string, e.g.

sql:
  type: pandas.SQLQueryDataset
  template: "SELECT * FROM table WHERE column in {{foo | inclause}};" 

I haven’t given much thought yet to how this could take values from runtime parameters. I think it would require some additional validation, e.g. checking that every parameter in the template has a value.
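
One possible shape for that validation is sketched below. It is deliberately simplified and rests on assumptions: `missing_parameters` is a hypothetical helper, and the regular expression only recognises simple `{{ name }}` or `{{ name | filter }}` expressions, not the full Jinja syntax (loops, attribute access, and so on).

```python
import re
from typing import Any, List, Mapping

# Matches the variable name at the start of a simple {{ name }} or {{ name | filter }} expression.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def missing_parameters(template: str, params: Mapping[str, Any]) -> List[str]:
    """Hypothetical check: return template parameter names that have no value, sorted."""
    names = set(_PLACEHOLDER.findall(template))
    return sorted(name for name in names if name not in params)


template = "SELECT * FROM {{table | sqlsafe}} WHERE column in {{values | inclause}}"

# 'values' has not been supplied, so it is reported as missing.
print(missing_parameters(template, {"table": "mytable"}))

# With every parameter supplied, nothing is reported.
print(missing_parameters(template, {"table": "mytable", "values": ["a", "b"]}))
```

A dataset could run such a check before rendering and raise a descriptive error listing the missing names.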

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
jstammers commented, Dec 3, 2021

Thanks for the tips. I think your suggestion @AntonyMilneQB should work for my particular use-case. I can see the reasoning behind limiting the usage of SQL in Kedro. Perhaps I need to look into dbt and see if that’s more suitable.

0 reactions
merelcht commented, Feb 14, 2022

Okay great! I’m glad it’s resolved 🙂
