Deprecate support for SQL queries as part of data sources
In the current Feast API it is possible for users to specify either a table reference or a query as part of data sources:
Query
BigQuerySource(
    query="SELECT timestamp as ts, created, f1, f2 FROM `my_project.my_dataset.my_features`",
)
Table Ref
BigQuerySource(
    table_ref="my_project.my_dataset.my_features",
)
The motivation for supporting query was to allow users to manipulate source data prior to reading it into Feast during materialization or training dataset building. The assumption is that not all users own their sources, so they may not be able to change the table schema, or they may not have permission to create views in their offline store.
However, we are unsure whether this assumption holds up. It seems like most users only need:
- Field mapping
- Column projection
- Filtering of the source rows
Supporting query comes with major downsides:
- It’s wasteful. Feast has to execute the full query in order to do simple operations like get metadata (column names/types)
- Feast is unable to optimize the underlying query and is forced to execute it as-is, sometimes repeatedly. This can lead to increased costs for users.
- It requires the Feast team to maintain two separate code paths for functionality that is almost identical.
- It makes the Feast API more complicated. We need to explain to users that they should use table ref if they want more optimized queries.
All of the required functionality above can be added as configuration options as part of table ref, which would free us up from having to support two separate means of querying offline data. I’m hoping to hear from users whether continuing to support query is important, or whether we can try to support similar functionality with table ref.
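To make the proposal concrete, here is a minimal sketch of how field mapping, column projection, and row filtering could be expressed as options on a table reference and compiled into a single SELECT. The TableRefOptions class, its field names, and to_sql are hypothetical and are not part of the Feast API; the sketch also does no validation or escaping of the inputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableRefOptions:
    """Hypothetical options on a table reference (NOT part of the Feast API)."""

    table_ref: str
    # Projection: the source columns to read. None means every column.
    columns: Optional[List[str]] = None
    # Field mapping: source column name -> name exposed to Feast.
    field_mapping: Dict[str, str] = field(default_factory=dict)
    # Filtering: a SQL boolean expression applied to the source rows.
    row_filter: Optional[str] = None

    def to_sql(self) -> str:
        """Compile the options into one SELECT statement."""
        if self.columns is None:
            if self.field_mapping:
                # Renaming without an explicit projection is out of scope here.
                raise ValueError("field_mapping requires an explicit columns list")
            select_list = "*"
        else:
            select_list = ", ".join(
                f"{col} AS {self.field_mapping[col]}" if col in self.field_mapping else col
                for col in self.columns
            )
        sql = f"SELECT {select_list} FROM `{self.table_ref}`"
        if self.row_filter:
            sql += f" WHERE {self.row_filter}"
        return sql


# The query example from the issue, expressed as options on a table reference.
source = TableRefOptions(
    table_ref="my_project.my_dataset.my_features",
    columns=["timestamp", "created", "f1", "f2"],
    field_mapping={"timestamp": "ts"},
)
print(source.to_sql())
```

Because the options are structured rather than free-form SQL, a system built this way could in principle read column names and types from table metadata and push additional filters into the generated statement, instead of executing an opaque query as-is.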
Top GitHub Comments
To confirm – Would the 3 pieces of functionality (Field mapping, Column projection, Filtering of the source rows) be supported before this query is deprecated?

Good point. In your case you need type casting. I’m leaning towards asking you to create a BigQuery view. Would that be reasonable?