
Enable passing .sql task results to a following .sql task as parameters

See original GitHub issue

Documentation page: community/index

Use case

I work in a business where we have an abundance of data but not necessarily the best data for data science efforts. In many cases we have millions of time series records for feature building but poor records of the target data we'd like to predict. Further complicating efforts are our many different database servers of various types (Oracle, SQL Server, Teradata, AWS, Azure).

Because of this, I spend a lot of time helping our business become 'data science ready' by developing data pipelines to build useful datasets. I'm currently using Python to integrate and transform disparate data sources and output the results to tables.

Request

I would like to be able to pass unique values from one .sql task to another as parameters. Something like this:

tasks:
  - name: get-data-1
    source: sql/first_data.sql
    product: '{{data}}/raw/first_data.parquet'
    client: src.clients.src1
    chunksize: null
  
  - name: get-data-2
    source: sql/second-data.sql
    product: '{{data}}/raw/second-data.parquet'
    client: src.clients.src2
    params:
      startYear: '{{startYear}}'
      endYear: '{{endYear}}'      
    upstream: get-data-1
    chunksize: null

second-data.sql example

select *
from schema.table
where column in (upstream[params])
and year(date) between {{startYear}} and {{endYear}}
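Until such a feature exists, one workaround is to build the IN clause by hand in a Python task that sits between the two SQL tasks. A minimal sketch, assuming the upstream output has a `column` field (the sample data and names below are illustrative, not from the actual pipeline):

```python
import pandas as pd

# Stand-in for reading first_data.parquet produced by the upstream task.
df = pd.DataFrame({'column': ['a', 'b', 'a', 'c']})

# Collect the unique values and format them as a quoted, comma-separated list.
vals = df['column'].unique()
in_clause = ', '.join(f"'{v}'" for v in vals)

# Splice the list into the downstream query.
query = f"select * from schema.table where column in ({in_clause})"
```

Note that this simple string formatting assumes the values are trusted (they come from your own upstream table), since it bypasses parameterized queries.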

Alternatively, could I output the params via python as an intermediate .yaml and use as inputs like this?

import pandas as pd
import yaml

# read the upstream task's output and collect the unique values
df = pd.read_parquet(upstream['first_data.parquet'])
vals = df['column'].unique().tolist()

d = {'val': vals}

# write the values to an intermediate YAML file
with open("sample.yaml", "w") as f:
    yaml.dump(d, f)
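A downstream task could then load sample.yaml and expand the values into a SQL IN list. A round-trip sketch, assuming the file has the shape written above and PyYAML is available (the values here are sample data):

```python
import yaml

# Serialize the values the way the script above would write them...
doc = yaml.safe_dump({'val': ['a', 'b', 'c']})

# ...then load them back and expand into a quoted IN list.
params = yaml.safe_load(doc)
in_list = ', '.join(repr(v) for v in params['val'])
```

The resulting string can be interpolated into the second query's IN clause.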

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 17 (9 by maintainers)

Top GitHub Comments

2 reactions · edublancas commented, Jan 7, 2022

Hi @rockraptor5 and @reesehopkins,

Yes, it’s possible to have them as tuples. You could do something like this:

def my_param(upstream):
    # return the unique values as a tuple so jinja renders them as (a, b, c)
    return tuple(pd.read_parquet(upstream["first"]).x.unique())

Then in SQL:

SELECT * FROM TABLE WHERE x in {{my_param}} 

Or you may pass a list and have jinja format it:

SELECT * FROM TABLE WHERE x in ( {{my_param | join(', ') }} )
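Both renderings can be checked with a quick standalone sketch (jinja2 assumed installed; the parameter values are sample data):

```python
from jinja2 import Template

# Tuple form: jinja renders the Python tuple with its parentheses included.
tpl_tuple = Template("SELECT * FROM TABLE WHERE x in {{my_param}}")
sql_tuple = tpl_tuple.render(my_param=(1, 2, 3))

# List form: the join filter formats the values; parentheses are written explicitly.
tpl_join = Template("SELECT * FROM TABLE WHERE x in ({{my_param | join(', ')}})")
sql_join = tpl_join.render(my_param=[1, 2, 3])
```

Both produce `SELECT * FROM TABLE WHERE x in (1, 2, 3)`; string values would additionally need quoting in the join form.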

I'll take a stab at this feature and update here so you can test it.

1 reaction · edublancas commented, Jan 25, 2022

Nice! Bear in mind that this is a new feature and hasn't been released yet, but you can still use it if you install it from git. I need to work on it a bit more; this will be part of the next release. I'll close this issue when it's out. Thanks a lot for your feedback!


