Deprecate support for SQL queries as part of data sources

In the current Feast API, users can specify either a table reference or a query as part of a data source:

Query

BigQuerySource(
    query="SELECT timestamp as ts, created, f1, f2 FROM `my_project.my_dataset.my_features`",
)

Table Ref

BigQuerySource(
    table_ref="my_project.my_dataset.my_features"
)

The motivation for supporting query was to allow users to manipulate source data before Feast reads it during materialization or training dataset building. The assumption is that not all users own their sources, so they may not be able to change the table schema, or they may not have permission to create views in their offline store.

However, we are unsure whether this assumption holds up. It seems like most users only need:

  1. Field mapping
  2. Column projection
  3. Filtering of the source rows

Supporting query comes with major downsides:

  1. It’s wasteful. Feast has to execute the full query in order to do simple operations such as fetching metadata (column names/types).
  2. Feast is unable to optimize the underlying query and is forced to execute it as-is, sometimes repeatedly. This can lead to increased costs for users.
  3. It requires the Feast team to maintain two separate code paths for functionality that is almost identical.
  4. It makes the Feast API more complicated. We need to explain to users that they should use table ref if they want more optimized queries.

All of the required functionality above can be added as configuration options as part of table ref, which would free us up from having to support two separate means of querying offline data. I’m hoping to hear from users whether continuing to support query is important, or whether we can try to support similar functionality with table ref.
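
As a rough illustration of the proposal, the three operations could be expressed as options on a table reference and compiled into a single query. This is only a sketch: `TableRefOptions`, `build_select`, and their parameters are hypothetical names and not part of the Feast API, and the row filter is passed through as a raw SQL predicate without validation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableRefOptions:
    """Hypothetical options attached to a table reference (not a Feast class)."""

    table_ref: str
    # Maps source column name -> name exposed to Feast (field mapping).
    field_mapping: Dict[str, str] = field(default_factory=dict)
    # Source columns to read; None means all columns (column projection).
    columns: Optional[List[str]] = None
    # Raw SQL predicate applied to the source rows (row filtering).
    row_filter: Optional[str] = None


def build_select(opts: TableRefOptions) -> str:
    """Compile the options into one SELECT statement."""
    if opts.columns is None:
        if opts.field_mapping:
            # Renaming needs explicit columns in this simplified sketch.
            raise ValueError("field_mapping requires an explicit column list")
        select_list = "*"
    else:
        parts = []
        for col in opts.columns:
            alias = opts.field_mapping.get(col)
            parts.append(f"{col} AS {alias}" if alias else col)
        select_list = ", ".join(parts)
    sql = f"SELECT {select_list} FROM `{opts.table_ref}`"
    if opts.row_filter:
        sql += f" WHERE {opts.row_filter}"
    return sql


opts = TableRefOptions(
    table_ref="my_project.my_dataset.my_features",
    field_mapping={"timestamp": "ts"},
    columns=["timestamp", "created", "f1", "f2"],
    row_filter="f1 IS NOT NULL",
)
print(build_select(opts))
```

With options like these, the table reference stays available for cheap metadata lookups, and the generated query is only needed when rows are actually read.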

cc @MattDelac @mavysavydav

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 13 (4 by maintainers)

Top GitHub Comments

mavysavydav commented, Jul 7, 2021 (1 reaction)

To confirm: would the 3 pieces of functionality (field mapping, column projection, filtering of the source rows) be supported before query is deprecated?

woop commented, Jul 6, 2021 (1 reaction)

Hi guys, my case is shown below.

stock_source = BigQuerySource(
    query="SELECT TIMESTAMP(`Date`) as event_timestamp, * "
    "FROM `cool_project.happy_dataset.stocks`"
)

The Date column has type ‘DATE’ and I need to map it to ‘TIMESTAMP’.

How can I do it with column mapping?

Good point. In your case you need type casting. I’m leaning towards asking you to create a BigQuery view. Would that be reasonable?
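
A view along those lines could look roughly like the sketch below. The view name `cool_project.happy_dataset.stocks_with_ts` and the helper `build_view_ddl` are made up for illustration, and the snippet only builds the DDL string. Running it against BigQuery, and the permissions that requires, are left out.

```python
# Source table from the question above; the view name is hypothetical.
SOURCE_TABLE = "cool_project.happy_dataset.stocks"
VIEW_NAME = "cool_project.happy_dataset.stocks_with_ts"


def build_view_ddl(view: str, table: str) -> str:
    """Return a CREATE VIEW statement that casts the DATE column to TIMESTAMP."""
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS "
        "SELECT TIMESTAMP(`Date`) AS event_timestamp, * "
        f"FROM `{table}`"
    )


ddl = build_view_ddl(VIEW_NAME, SOURCE_TABLE)
print(ddl)
```

Assuming the view can be referenced like a table, the data source would then use `table_ref="cool_project.happy_dataset.stocks_with_ts"` instead of carrying the query itself.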
