Latest Only option for Historical Retrieval

Is your feature request related to a problem? Please describe.

In many batch workflows, it is worthwhile to retrieve only the latest feature values for each entity. This is useful for both production and backtesting.

E.g. if I have an hourly/daily batch which goes through our whole customer base to find fraudulent customers, we wouldn’t really use the online store for this.

Describe the solution you’d like

Allow users to specify that an entity set extracted from a feature view should be deduplicated so that only the latest row per entity is kept. Depends on #1611.

my_daily_batch_scoring_df = store.get_latest_features(
    entity_df="my_df",
    feature_refs=[...],
)
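One way to read the proposed call is "for each entity in entity_df, return the newest feature row". The sketch below illustrates that behaviour with plain pandas. It is not Feast code: the standalone get_latest_features function, the customer_id and txn_count columns, and the event_timestamp column name are all assumptions made for illustration.

```python
import pandas as pd


def get_latest_features(entity_df, feature_df, entity_key,
                        timestamp_col="event_timestamp"):
    """Hypothetical helper: keep only the newest feature row per entity."""
    # Restrict the feature rows to the entities that were asked for.
    wanted = feature_df[feature_df[entity_key].isin(entity_df[entity_key])]
    # Sort by time so that the last row within each entity is the latest one.
    return (
        wanted.sort_values(timestamp_col)
        .drop_duplicates(subset=entity_key, keep="last")
        .sort_values(entity_key)
        .reset_index(drop=True)
    )


# Made-up sample data standing in for an offline feature table.
features = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 2, 3],
        "event_timestamp": pd.to_datetime(
            ["2021-07-01", "2021-07-03", "2021-07-02", "2021-07-01", "2021-07-04"]
        ),
        "txn_count": [5, 7, 2, 1, 9],
    }
)
entities = pd.DataFrame({"customer_id": [1, 2]})

latest_df = get_latest_features(entities, features, entity_key="customer_id")
print(latest_df)
```

Unlike a point-in-time join, this ignores any timestamps on the entity side and always takes the most recent row, which is the behaviour the request describes for batch scoring.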

Additional context

Linked issue #1611

Issue Analytics

Created 2 years ago
Reactions: 1
Comments: 10 (6 by maintainers)

Top GitHub Comments

MattDelac commented, Nov 21, 2021 (2 reactions)

I still believe that this is an important feature for batch prediction pipelines. In that case you need the latest values from the offline store.

You also need to keep the idea of an “entity_df”, which we don’t have with the pull_latest_from_table_or_query() method.

woop commented, Jul 5, 2021 (2 reactions)

@MattDelac is this API moving closer to what you are using internally?

Not really

But we have the same need for batch predictions, where we want to predict on the latest values of the features in batch. In that case we could bypass the historical retrieval logic and use a SQL template that is much more efficient.

In terms of API, I would rather have a separate method, e.g. store.get_latest_features(), than a boolean parameter. And as I said, store.get_latest_features() could be a very efficient SQL query.

Hope that makes sense

store.get_latest_features() could be a shared method that is also used for materialization into the online store. Seems like a good idea to me.
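A minimal sketch of what such a "latest row per entity" SQL template might look like, run here against an in-memory SQLite database so it is self-contained. This is not Feast's actual template: the customer_features and entities tables, their columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical template: rank each entity's rows from newest to oldest and
# keep the first one, limited to the entities listed in the entities table.
LATEST_FEATURES_SQL = """
SELECT customer_id, event_timestamp, txn_count
FROM (
    SELECT
        f.*,
        ROW_NUMBER() OVER (
            PARTITION BY f.customer_id
            ORDER BY f.event_timestamp DESC
        ) AS row_num
    FROM customer_features AS f
    JOIN entities AS e ON e.customer_id = f.customer_id
)
WHERE row_num = 1
ORDER BY customer_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_features (
        customer_id INTEGER, event_timestamp TEXT, txn_count INTEGER
    );
    CREATE TABLE entities (customer_id INTEGER);
    INSERT INTO customer_features VALUES
        (1, '2021-07-01', 5), (1, '2021-07-03', 7),
        (2, '2021-07-02', 2), (2, '2021-07-01', 1),
        (3, '2021-07-04', 9);
    INSERT INTO entities VALUES (1), (2);
    """
)

rows = conn.execute(LATEST_FEATURES_SQL).fetchall()
for row in rows:
    print(row)
conn.close()
```

Because the query needs only one window pass over the feature table and no per-row timestamp comparison against the entity side, it avoids the point-in-time join that historical retrieval performs.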
