Feast API: On-demand features
Updates
- See also: Feast Slack: #feast-feature-transformations
- See also: RFC-021: On-demand Transformations
- The text below is the original issue that motivated the RFC above and its implementation.
- On-demand features are currently in alpha; the main outstanding task is to make them more scalable in the batch use case.
Archive
1. Overview
Feast currently does not apply any transformations on user data. The expectation is that Feast will be the sourcing, ingestion, and serving layer.
Feature transformations can be grouped into two categories:
- Precomputable features: These are transformations that are applied either on real-time streams or batch data at rest, but they are applied prior to ingestion into Feast and into stores. Thus, they are a pre-computational step.
- On-demand features: These are features that cannot be precomputed. Often the data required to apply these feature transformations is only available at the last moment. Examples of these kinds of features are ones that come from a transaction or order (location data).
On-demand features are a use case that Feast should support; otherwise, teams would need to develop custom logic in both their training and serving systems to apply these transformations.
2. Use Case: Trip Level Features
Imagine we need to make a prediction based on a trip with the following data model:
Trip
- trip_id
- customer_id
- origin_latitude
- origin_longitude
- dest_latitude
- dest_longitude
A trip is created when the user sends a request to start a trip to the Trip Service. The Trip Service needs to respond within 100 ms to the user to confirm the creation of the trip. The Trip Service is always pushing data to a stream, so when a new trip is created it is instantly pushed as an event to the stream, after which the Trip Service finishes its logic and responds to the user.
The Trip Service is also powered by an ML model in the Trip Model Service. This is one of the steps the Trip Service does before responding to the user. It’s not important what this ML model does. The Trip Service sends a request to the Trip Model Service. This model takes in both customer features and trip level features in order to make a prediction. The entity types are thus trip and customer.
One of the features the model wants is the straight-line distance between origin and destination. This can easily be calculated with a function like distance(origin_latitude, origin_longitude, dest_latitude, dest_longitude).
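As a minimal sketch of such a function, the block below computes the haversine (great-circle) distance in kilometres. The issue does not say which distance formula is intended, so the haversine formula, the Earth radius constant, and the Trip dataclass are assumptions made for illustration; only the field names and the distance(...) signature come from the text above.

```python
import math
from dataclasses import dataclass

# Mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Trip:
    """Mirrors the Trip data model listed above; the types are assumed."""
    trip_id: str
    customer_id: str
    origin_latitude: float
    origin_longitude: float
    dest_latitude: float
    dest_longitude: float


def distance(origin_latitude, origin_longitude, dest_latitude, dest_longitude):
    """Return the haversine distance in kilometres between two points.

    All arguments are in decimal degrees.
    """
    lat1, lon1, lat2, lon2 = map(
        math.radians,
        (origin_latitude, origin_longitude, dest_latitude, dest_longitude),
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def trip_distance(trip: Trip) -> float:
    """Convenience wrapper that reads the coordinates from a Trip."""
    return distance(
        trip.origin_latitude,
        trip.origin_longitude,
        trip.dest_latitude,
        trip.dest_longitude,
    )
```

The function needs only the four coordinates carried by the trip request itself, which is why it is a natural candidate for computation at request time.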
The Trip Model Service then sends a feature request to Feast Serving and asks for features on both the customer id and trip id, expecting the distance feature above to have been precalculated.
The store won’t contain the trip id features that the model is looking for; only the customer id features will be there. In fact, the trip_id won’t even exist in the online store.
This is because the round trip through the stream and through stream processing means that (1) will always happen faster than (2): the transactional, synchronous systems will reach the feature store for feature lookups before stream processing and ingestion can populate the store with precomputed features.
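The race described above can be illustrated with a toy model. Everything in this sketch is hypothetical: a dict stands in for the online store, a deque for the stream, and the function names and feature values are invented for illustration. None of it is Feast's API.

```python
from collections import deque

# Hypothetical stand-ins: a dict as the online store, a deque as the stream.
# The customer's features were ingested long ago; no trip exists yet.
online_store = {("customer", "c-42"): {"total_trips": 17}}
stream = deque()


def create_trip(trip):
    """Trip Service: push the event to the stream, then carry on synchronously."""
    stream.append(trip)


def get_online_features(entity_type, entity_id):
    """Feature lookup: returns None when the entity is not in the store."""
    return online_store.get((entity_type, entity_id))


def ingest_next_event():
    """Stream processing and ingestion, which run after the synchronous path."""
    trip = stream.popleft()
    online_store[("trip", trip["trip_id"])] = {"precomputed": True}


# The synchronous path: create the trip and immediately look up features.
create_trip({"trip_id": "t-1", "customer_id": "c-42"})
customer_features = get_online_features("customer", "c-42")
trip_features = get_online_features("trip", "t-1")  # not ingested yet

# Only later does ingestion catch up, too late for the prediction request.
ingest_next_event()
late_trip_features = get_online_features("trip", "t-1")
```

In this model the lookup for the trip misses because ingestion has not yet run, while the customer lookup succeeds.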
3. Derived features
One solution to the above problem would be to introduce feature transformations that happen just in time in the online serving layer of Feast. These transformations would then take precomputed features, or data provided in the incoming request from the transactional system, and derive new features based on either predefined or user defined functions. The resulting features can then be provided to the model.
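A rough sketch of what just-in-time derivation could look like is shown below. The registry, the derived_feature decorator, serve_features, and the feature names are all hypothetical and are not Feast's API; the planar distance in degrees is a deliberate simplification to keep the example short.

```python
import math
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical registry of user-defined, row-level transformations.
DERIVED_FEATURES: Dict[str, Callable[[Row], Any]] = {}


def derived_feature(name: str):
    """Register a function that derives one feature from a single row."""
    def register(fn: Callable[[Row], Any]) -> Callable[[Row], Any]:
        DERIVED_FEATURES[name] = fn
        return fn
    return register


@derived_feature("trip_distance_degrees")
def trip_distance_degrees(row: Row) -> float:
    # Uses only data supplied in the incoming request.
    return math.hypot(
        row["dest_latitude"] - row["origin_latitude"],
        row["dest_longitude"] - row["origin_longitude"],
    )


@derived_feature("is_frequent_customer")
def is_frequent_customer(row: Row) -> bool:
    # Uses a precomputed feature read from the online store.
    return row["total_trips"] >= 10


def serve_features(precomputed: Row, request_data: Row, requested: List[str]) -> Row:
    """Combine precomputed features with request data, then derive new ones."""
    row = {**precomputed, **request_data}
    result = dict(precomputed)
    for name in requested:
        result[name] = DERIVED_FEATURES[name](row)
    return result
```

A call such as serve_features({"total_trips": 17}, request_coordinates, ["trip_distance_degrees", "is_frequent_customer"]) would return the precomputed features together with the two derived ones, without the trip ever having been ingested.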
4. Requirements
Online to offline consistency: A key design decision would be to ensure consistency of these transformations between online serving and historical serving. One approach would be to have derived features as a final stage prior to serving (either in historical or online).
Client language agnosticism: The implementation should ideally be agnostic to the execution environment of the client. That is, the transformations should function for both online and historical serving no matter which language or environment triggers the call (Python, Golang, Java, or any gRPC client). This requirement probably means that the transformation cannot easily happen client side.
Row vs grouped transformations: There does not seem to be a strong need for column or dataset (grouped) transformations, only instance level transformations (row level).
Development experience: This is not a strict requirement like the ones above, but more of a design goal. It should ideally be both easy and fast for users to develop these derived feature transformations (if we allow UDFs), so the development experience is one we should optimize for.
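One way to read the consistency and row-level requirements together is that a single row-level function is applied as the final stage of both serving paths. The sketch below assumes exactly that; the function names and the lat_delta and lon_delta features are hypothetical and only illustrate the shape of the idea.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def add_derived_features(row: Row) -> Row:
    """Row-level transformation: depends only on values in this one row."""
    derived = dict(row)
    derived["lat_delta"] = abs(row["dest_latitude"] - row["origin_latitude"])
    derived["lon_delta"] = abs(row["dest_longitude"] - row["origin_longitude"])
    return derived


def online_serving(row: Row) -> Row:
    """Final stage of online serving: transform the single requested row."""
    return add_derived_features(row)


def historical_serving(rows: Iterable[Row]) -> List[Row]:
    """Final stage of historical serving: the same function, row by row."""
    return [add_derived_features(row) for row in rows]
```

Because both paths call the same function and it never looks across rows, a training row and the corresponding online row receive identical derived values by construction.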
5. Next Steps
I’m opening this up with a use case / problem statement. There are teams running Feast right now that are looking at implementing this, so it would be great if we can agree on an MVP design.
Issue Analytics
- Created 3 years ago
- Reactions:8
- Comments:17 (5 by maintainers)
Top GitHub Comments
As an update here, we are currently working on an RFC for on-demand transformations.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.