Feast API: On-demand features


Updates

See also: Feast Slack: #feast-feature-transformations
See also: RFC-021: On-demand Transformations

  • The text below is the original issue that motivated the RFC and implementation above.
  • On-demand features are currently in alpha; the main outstanding task is to make them more scalable in the batch use case.

Archive

1. Overview

Feast currently does not apply any transformations on user data. The expectation is that Feast will be the sourcing, ingestion, and serving layer.

Feature transformations can be grouped into two categories:

  1. Precomputable features: These are transformations that are applied either on real-time streams or on batch data at rest, but always prior to ingestion into Feast and its stores. They are therefore a precomputation step.
  2. On-demand features: These are features that cannot be precomputed. Often the data required to apply these transformations is only available at the last moment. Examples are features derived from a transaction or order, such as location data.

On-demand features are a use case that Feast should support; otherwise, teams would need to develop custom logic in both their training and serving systems to apply these transformations.

2. Use Case: Trip Level Features

Imagine we need to make a prediction based on a trip with the following data model:

Trip 
- trip_id
- customer_id
- origin_latitude
- origin_longitude
- dest_latitude
- dest_longitude

A trip is created when the user sends a request to start a trip to the Trip Service. The Trip Service needs to respond within 100 ms to the user to confirm the creation of the trip. The Trip Service is always pushing data to a stream, so when a new trip is created it is instantly pushed as an event to the stream, after which the Trip Service finishes its logic and responds to the user.

[Image: diagram of the request flow, with numbered paths (1) and (2); not reproduced here]

The Trip Service is also powered by an ML model in the Trip Model Service. This is one of the steps the Trip Service does before responding to the user. It’s not important what this ML model does. The Trip Service sends a request to the Trip Model Service. This model takes in both customer features and trip level features in order to make a prediction. The entity types are thus trip and customer.

One of the features the model wants is the straight-line distance between origin and destination. This can easily be calculated with a function like distance(origin_latitude, origin_longitude, dest_latitude, dest_longitude).
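As a minimal sketch of such a function, the snippet below uses the haversine formula, which gives the great-circle distance on a sphere. That choice is an assumption: the issue only asks for a straight-line distance and does not name a formula. The Earth radius constant and the kilometre unit are assumptions as well.

```python
import math

# Assumed constant: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def distance(origin_latitude, origin_longitude, dest_latitude, dest_longitude):
    """Great-circle (haversine) distance in kilometres between two points.

    All arguments are in decimal degrees.
    """
    phi1 = math.radians(origin_latitude)
    phi2 = math.radians(dest_latitude)
    dphi = phi2 - phi1
    dlambda = math.radians(dest_longitude - origin_longitude)

    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Because the function only needs the four coordinates of a single trip, it is a row-level transformation that can run at request time.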

The Trip Model Service then sends a feature request to Feast Serving and asks for features on both the customer id and the trip id, expecting the distance feature above to have been precalculated.

The store won’t contain the trip id features that the model is looking for; only the customer id features will be there. In fact, the trip_id won’t even exist in the online store.

This is because the round trip through the stream and stream processing takes longer than the synchronous request path, so (1) will always complete before (2). In other words, the synchronous transactional systems will reach the feature store for feature lookups before stream processing and ingestion can populate the store with precomputed features.
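The ordering problem can be illustrated with a small simulation. This is only a sketch: the dictionary-based store, the list-based stream, the example feature completed_trips, and all function names are hypothetical stand-ins and not Feast's storage or API.

```python
# Hypothetical in-memory stand-ins for the online store and the event stream.
online_store = {
    ("customer_id", 42): {"completed_trips": 17},
}
stream = []  # events waiting for stream processing and ingestion


def lookup(entity_keys):
    """Synchronous feature lookup; entities missing from the store yield None."""
    return {key: online_store.get(key) for key in entity_keys}


def create_trip(trip):
    """Publish the trip event, then immediately request features for it."""
    stream.append(trip)  # ingestion starts here but has not finished yet
    return lookup(
        [("customer_id", trip["customer_id"]), ("trip_id", trip["trip_id"])]
    )


def ingest_pending():
    """Stream processing: write trip-level features into the online store."""
    while stream:
        trip = stream.pop(0)
        trip_features = {
            k: v for k, v in trip.items() if k not in ("trip_id", "customer_id")
        }
        online_store[("trip_id", trip["trip_id"])] = trip_features


trip = {
    "trip_id": 1001,
    "customer_id": 42,
    "origin_latitude": 1.0,
    "origin_longitude": 103.5,
    "dest_latitude": 1.25,
    "dest_longitude": 104.0,
}
features_at_request_time = create_trip(trip)  # trip features are still missing
ingest_pending()
features_after_ingestion = lookup([("trip_id", 1001)])  # now present, too late
```

At request time only the customer entry resolves; the trip entry appears after ingestion has run, by which point the prediction has already been made.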

3. Derived features

One solution to the above problem would be to introduce feature transformations that happen just in time in the online serving layer of Feast. These transformations would take precomputed features, or data provided in the incoming request from the transactional system, and derive new features based on either predefined or user-defined functions. The resulting features can then be provided to the model.
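A rough sketch of what such a just-in-time step could look like is shown below. It is not Feast's API: the registry, the derived_feature decorator, the serve function, and the example features (avg_latitude_delta, latitude_delta, is_longer_than_usual) are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Features = Dict[str, Any]

# Hypothetical registry of derived features: name -> function of the merged inputs.
DERIVED: Dict[str, Callable[[Features], Any]] = {}


def derived_feature(name: str):
    """Register a user-defined function as a derived feature."""
    def register(fn: Callable[[Features], Any]):
        DERIVED[name] = fn
        return fn
    return register


@derived_feature("latitude_delta")
def latitude_delta(row: Features) -> float:
    # Uses only data supplied in the incoming request.
    return abs(row["dest_latitude"] - row["origin_latitude"])


@derived_feature("is_longer_than_usual")
def is_longer_than_usual(row: Features) -> bool:
    # Combines request data with a (hypothetical) precomputed customer feature.
    return latitude_delta(row) > row["avg_latitude_delta"]


def serve(precomputed: Features, request_data: Features, requested: List[str]) -> Features:
    """Merge stored and request-time inputs, then derive features on demand."""
    row = {**precomputed, **request_data}
    return {
        name: DERIVED[name](row) if name in DERIVED else row[name]
        for name in requested
    }


response = serve(
    precomputed={"avg_latitude_delta": 0.05},
    request_data={"origin_latitude": 1.0, "dest_latitude": 1.25},
    requested=["avg_latitude_delta", "latitude_delta", "is_longer_than_usual"],
)
```

The model receives stored and derived features in one response, so it does not need to know which ones were computed at request time.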

4. Requirements

Online to offline consistency: A key design decision would be to ensure consistency of these transformations between online serving and historical serving. One approach would be to have derived features as a final stage prior to serving (either in historical or online).
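One way to picture the "final stage prior to serving" approach is a single row-level function that both paths call, sketched below. The function names and the example derived features are hypothetical and do not correspond to Feast's API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_derived_features(row: Row) -> Row:
    """Row-level transformation shared by the online and historical paths."""
    derived = dict(row)
    derived["latitude_delta"] = abs(row["dest_latitude"] - row["origin_latitude"])
    derived["longitude_delta"] = abs(row["dest_longitude"] - row["origin_longitude"])
    return derived


def serve_online(precomputed: Row, request_data: Row) -> Row:
    """Online path: one row built from stored features plus request data."""
    return add_derived_features({**precomputed, **request_data})


def serve_historical(joined_rows: List[Row]) -> List[Row]:
    """Historical path: the same function applied to every row of a dataset."""
    return [add_derived_features(row) for row in joined_rows]


precomputed = {"completed_trips": 17}
trip = {
    "origin_latitude": 1.0,
    "origin_longitude": 103.5,
    "dest_latitude": 1.25,
    "dest_longitude": 104.0,
}
online_row = serve_online(precomputed, trip)
historical_rows = serve_historical([{**precomputed, **trip}])
```

Because both paths call the same function on the same inputs, a row served online matches the corresponding row in a training dataset.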

Client language agnosticism: The implementation should ideally be agnostic to the execution environment of the client, meaning the transformations should function for both online and historical serving no matter which client language triggers the call (Python, Golang, Java, or any gRPC client). This requirement probably means that the transformation cannot easily happen client side.

Row vs grouped transformations: There does not seem to be a strong need for column or dataset (grouped) transformations, only instance level transformations (row level).

Development experience: This is not as strict a requirement as the ones above; it is more of a design goal. It should ideally be both easy and fast for users to develop these derived feature transformations (if we allow UDFs), so the development experience is one we should optimize for.

5. Next Steps

I’m opening this up with a use case / problem statement. There are teams running Feast right now that are looking at implementing this, so it would be great if we can agree on an MVP design.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 8
  • Comments: 17 (5 by maintainers)

Top GitHub Comments

4 reactions
woop commented, Jun 17, 2021

As an update here, we are currently working on an RFC for on-demand transformations.

0 reactions
stale[bot] commented, Sep 25, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
