Entity types as a higher-level concept
Introduction
Currently an entity, or more formally an entity type, is treated as a special type of field within a feature set. There has been an attempt to simplify the creation and management of entities and to keep them consistent with features; however, some challenges exist with our current approach.
Note: The terms entity and entity type are used interchangeably throughout this issue.
How are entities created?
- Users define an entity as part of a feature set. An entity in this case is a field like any other within the feature set. More than one entity can exist within a feature set.
- An entity’s name must be unique within a feature set.
- There are no constraints on entities outside of a feature set, either at the project or global level. This means that multiple feature sets can redefine the same entities.
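To make the current model concrete, here is a minimal sketch in which an entity is just a flagged field of a feature set. The classes Field and FeatureSet, their attributes, and the type names are hypothetical illustrations, not Feast's SDK. The sketch shows that the only check is name uniqueness inside one feature set, so two feature sets can each define "customer" with different types.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical model of the current approach: an entity is a field of a
# feature set that is flagged as an entity. Not Feast's actual SDK classes.
@dataclass
class Field:
    name: str
    dtype: str
    is_entity: bool = False


@dataclass
class FeatureSet:
    name: str
    project: str
    fields: List[Field] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The only check: field names (entities included) are unique
        # within this single feature set.
        names = [f.name for f in self.fields]
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate field name in feature set {self.name!r}")

    @property
    def entities(self) -> List[Field]:
        return [f for f in self.fields if f.is_entity]


# Two feature sets in different projects each define their own "customer".
# Nothing compares the two definitions, so their types can diverge.
orders = FeatureSet("orders", "team_a", [
    Field("customer", "INT64", is_entity=True),
    Field("order_count", "INT64"),
])
ratings = FeatureSet("ratings", "team_b", [
    Field("customer", "STRING", is_entity=True),
    Field("avg_rating", "FLOAT"),
])
```

Both feature sets are accepted even though orders uses an integer customer and ratings uses a string one.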
How are entities used?
- Retrieving feature values: Entities are used as a key for retrieving features. In order to retrieve feature values within a feature set, all entities must be provided as part of the lookup.
- Joining feature sets: In the event that feature values are being retrieved from multiple feature sets, entities are used to look up these feature values. Entities are also used to join across these feature sets to construct a single result set.
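Here is a minimal in-memory sketch of both uses: keyed lookup and the join across feature sets. The storage layout is an assumption for illustration: each feature set is a dict keyed by a tuple of its entity values. The names lookup, get_features, driver and customer are hypothetical and are not Feast's serving API.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for online storage: each feature set maps a tuple of
# entity values (in that feature set's entity order) to its feature values.
Rows = Dict[Tuple[object, ...], Dict[str, object]]


def lookup(rows: Rows, entity_names: List[str], request: Dict[str, object]) -> Dict[str, object]:
    # Every entity of the feature set must be supplied to build the key.
    missing = [e for e in entity_names if e not in request]
    if missing:
        raise KeyError(f"missing entities: {missing}")
    key = tuple(request[e] for e in entity_names)
    return rows.get(key, {})


def get_features(feature_sets: List[Tuple[List[str], Rows]],
                 request: Dict[str, object]) -> Dict[str, object]:
    # Look up each feature set with the shared entity values and merge the
    # results into a single row, i.e. join the feature sets on their entities.
    result: Dict[str, object] = dict(request)
    for entity_names, rows in feature_sets:
        result.update(lookup(rows, entity_names, request))
    return result


driver_stats: Rows = {(1001,): {"trips_today": 7}}
driver_customer: Rows = {(1001, 42): {"past_rides_together": 3}}

row = get_features(
    [(["driver"], driver_stats), (["driver", "customer"], driver_customer)],
    {"driver": 1001, "customer": 42},
)
print(row)
# {'driver': 1001, 'customer': 42, 'trips_today': 7, 'past_rides_together': 3}
```

Because the key is built from raw entity values, a feature set that stored customer as the string "42" would silently return nothing for a request using the integer 42.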
What is the problem?
- Discovery: It seems intuitive that users would start their discovery experience from the point of view of an entity type, since their business problem is generally framed around one or more entities. Nesting entities within feature sets and projects, without providing any means of discovering them, makes discovery harder.
- Consistency: In most organizations, entities are consistent across all projects and systems. Feast does not currently enforce this consistency. If no consistency is enforced at an organizational level, users are bound to redefine entities in their local projects. Inconsistent definitions would then cause failures during lookups and during joins across feature sets, especially when joins need to happen across projects.
- Key building: If entities and features must remain compatible in terms of supported data types, then keys must be buildable from every feature value type. This adds a lot of complexity to key building, since complex composite data structures must be serialized in order to build these keys.
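The following sketch shows why a restricted set of entity types keeps key building simple. It is a minimal illustration under stated assumptions: the two supported types (64-bit integers and UTF-8 strings), the type tags, and the byte layout are all hypothetical, and this is not Feast's actual key format.

```python
import struct
from typing import Dict, Iterable


# Hypothetical key encoder that accepts only two primitive entity types.
# Each entity contributes its length-prefixed name, a 1-byte type tag and
# its value, so equal inputs always produce equal, unambiguous keys.
def encode_key(entity_names: Iterable[str], values: Dict[str, object]) -> bytes:
    parts = []
    for name in sorted(entity_names):  # fixed order, independent of caller order
        value = values[name]
        name_bytes = name.encode("utf-8")
        parts.append(struct.pack(">I", len(name_bytes)) + name_bytes)
        if type(value) is int:
            # 64-bit signed big-endian; struct.error if out of range.
            parts.append(b"\x01" + struct.pack(">q", value))
        elif type(value) is str:
            data = value.encode("utf-8")
            parts.append(b"\x02" + struct.pack(">I", len(data)) + data)
        else:
            # Floats, lists, maps, etc. are rejected: each extra type would
            # need its own canonical serialization to produce stable keys.
            raise TypeError(f"unsupported entity type for {name!r}: {type(value).__name__}")
    return b"".join(parts)
```

Each additional entity type would add another branch with its own canonical encoding. Supporting every feature value type, including nested structures, would multiply that work.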
Proposals
1. Project-level entities
Functionality
- Entities are created outside of feature sets, but they still reside in a specific project namespace.
- Entities have their own distinct API and supported data types (which may be more limited than features).
- Entities must be unique within a project namespace, but can be duplicated across an organization. Uniqueness is ensured through a full entity reference (gojek/customer).
- Entities are still defined as part of a feature set, but this is a selection process instead of creation.
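The project-level proposal can be sketched as a registry keyed by (project, name). Entities are created on their own there, and feature sets only select them by full reference. Everything here is hypothetical and for illustration only: ProjectEntityRegistry, the "<project>/<entity>" parsing, and the restricted SUPPORTED_ENTITY_TYPES set are assumptions, not a Feast API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: entities support a smaller set of types than features do.
SUPPORTED_ENTITY_TYPES = {"INT64", "STRING"}


@dataclass(frozen=True)
class Entity:
    project: str
    name: str
    dtype: str

    @property
    def reference(self) -> str:
        # Full entity reference, e.g. "gojek/customer".
        return f"{self.project}/{self.name}"


class ProjectEntityRegistry:
    """Hypothetical registry: entities are created per project, outside feature sets."""

    def __init__(self) -> None:
        self._entities: Dict[Tuple[str, str], Entity] = {}

    def create(self, project: str, name: str, dtype: str) -> Entity:
        if dtype not in SUPPORTED_ENTITY_TYPES:
            raise ValueError(f"unsupported entity type: {dtype}")
        if (project, name) in self._entities:
            # Unique within a project; other projects may reuse the name.
            raise ValueError(f"entity {project}/{name} already exists")
        entity = Entity(project, name, dtype)
        self._entities[(project, name)] = entity
        return entity

    def get(self, reference: str) -> Entity:
        project, sep, name = reference.partition("/")
        if not sep or not project or not name:
            raise ValueError(f"expected '<project>/<entity>', got {reference!r}")
        return self._entities[(project, name)]  # KeyError if not registered


class FeatureSet:
    def __init__(self, name: str, registry: ProjectEntityRegistry, entity_refs: List[str]) -> None:
        # Entities are selected by reference here, never created.
        self.name = name
        self.entities = [registry.get(ref) for ref in entity_refs]


registry = ProjectEntityRegistry()
registry.create("gojek", "customer", "INT64")
registry.create("sandbox", "customer", "STRING")  # allowed: different project
orders = FeatureSet("orders", registry, ["gojek/customer"])
```

Creating gojek/customer a second time fails, while sandbox/customer coexists with it. This is the per-project uniqueness the proposal describes.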
Advantages
- Entities receive all the sharing and isolation benefits of “projects”. Entities would not have to be treated separately from a logical and/or development standpoint. There would also be no explosion of entities in a single global namespace.
- Users are free to experiment and develop within their projects without affecting other users, since duplication is allowed across projects.
- No need for a central team to gate-keep the creation of entities.
Disadvantages
- By not elevating entities to the global level, end users would be required to know which projects contain the entities they should be referencing. This means an organizational process must exist in order to select these entities.
- Most projects would have to reference entities from another more authoritative project. In fact, it’s likely that an organization will have a central project which contains only entities. This could be a little counter-intuitive if a feature set contains fields that are referencing an external project.
2. Global-level entities
Functionality
- Entities are defined globally for a Feast deployment.
- Entities have their own distinct API and supported data types (which may be more limited than features).
- Entities must be globally unique.
- Entities are still defined as part of a feature set, but this is a selection process instead of creation.
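For comparison, a global registry can be sketched with a single namespace keyed only by entity name. It also provides the central listing mentioned among the advantages. GlobalEntityRegistry, list_entities and the restricted type set are hypothetical names and assumptions used for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

SUPPORTED_ENTITY_TYPES = {"INT64", "STRING"}  # assumption: restricted entity types


@dataclass(frozen=True)
class Entity:
    name: str
    dtype: str


class GlobalEntityRegistry:
    """Hypothetical deployment-wide registry: one namespace, no projects."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}

    def create(self, name: str, dtype: str) -> Entity:
        if dtype not in SUPPORTED_ENTITY_TYPES:
            raise ValueError(f"unsupported entity type: {dtype}")
        if name in self._entities:
            # Names must be unique across the whole deployment.
            raise ValueError(f"entity {name!r} already exists in this deployment")
        self._entities[name] = Entity(name, dtype)
        return self._entities[name]

    def get(self, name: str) -> Entity:
        return self._entities[name]

    def list_entities(self) -> List[str]:
        # Central, authoritative listing of every entity in the deployment.
        return sorted(self._entities)


registry = GlobalEntityRegistry()
registry.create("customer", "INT64")
registry.create("driver", "INT64")
```

A feature set would then select entities by bare name, for example registry.get("customer"), with no project qualifier.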
Advantages
- Central authoritative listing of entities within an organization.
- Easier to discover which entities should be used, without needing an organizational policy.
- Easy to reason about and easier to understand when referencing an entity within a feature set.
Disadvantages
- Requires development of separate logic from projects, feature sets, and features.
- Requires a team and process to manage the creation of entities.
- No way to isolate conflicts. If one team wants to use a float and another wants to use a string for an entity data type, then it would likely result in two entities being created. This would still be the case in the project-level entities proposal, but at least in that proposal the unorthodox approach (maybe string) could be isolated to a specific project.
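The conflict scenario can be reduced to a few lines by modelling each proposal's namespace as a plain dict from key to data type. The global proposal keys by entity name, and the project-level proposal keys by (project, name). The register helper and the "driver_str" fallback name are hypothetical.

```python
from typing import Dict, Hashable


def register(registry: Dict[Hashable, str], key: Hashable, dtype: str) -> None:
    # Hypothetical registration: a key may only ever hold one data type.
    existing = registry.get(key)
    if existing is not None and existing != dtype:
        raise ValueError(f"{key!r} is already registered as {existing}")
    registry[key] = dtype  # re-registering with the same type is a no-op


# Global proposal: the key is just the entity name.
global_entities: Dict[Hashable, str] = {}
register(global_entities, "driver", "FLOAT")
try:
    register(global_entities, "driver", "STRING")
except ValueError:
    # The second team has to create a differently named global entity.
    register(global_entities, "driver_str", "STRING")

# Project-level proposal: the key is (project, entity name).
project_entities: Dict[Hashable, str] = {}
register(project_entities, ("team_a", "driver"), "FLOAT")
register(project_entities, ("team_b", "driver"), "STRING")  # isolated to team_b
```

Both proposals end up with two distinct entities. Only the project-level one keeps the string variant inside team_b's project instead of adding a second name to the shared namespace.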
3. Default project entities
Functionality
- If a user does not specify a project, then they are automatically located inside of the default project. This would be similar to how Kubernetes does namespacing.
- All other functionality would be the same as the project-level entities proposal, except users don’t actually have to create an entity inside of a named project.
- Feature references could be created that allow users to reference entities without a project. So instead of having my_company/customer, it would be possible to refer to “global” entities by using either customer or default/customer.
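Reference resolution under this proposal can be sketched as a small function. A bare name falls back to the default project, and a qualified reference is split into project and entity. The function name and the validation rules are assumptions; only the project name "default" comes from the proposal.

```python
from typing import Tuple

DEFAULT_PROJECT = "default"  # project name taken from the proposal


def resolve_entity_reference(reference: str) -> Tuple[str, str]:
    """Hypothetical resolver: split an entity reference into (project, entity)."""
    if not reference:
        raise ValueError("empty entity reference")
    if "/" not in reference:
        # A bare name falls back to the default project, much like
        # Kubernetes' "default" namespace.
        return DEFAULT_PROJECT, reference
    project, name = reference.split("/", 1)
    if not project or not name or "/" in name:
        raise ValueError(f"malformed entity reference: {reference!r}")
    return project, name
```

With this rule, customer and default/customer resolve to the same entity, while my_company/customer stays scoped to its own project.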
Advantages
- All of the advantages of project-level entities.
- Most of the advantages of global-level entities, except that this default project would still not be a true global namespace. There would still need to be an organizational process that informs users to use the entities in this project.
- Simplifies development since project-level sharing and isolation can be reused.
Disadvantages
- Still requires access control on the default namespace.
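Here is a minimal sketch of that remaining requirement, where writes to the shared default project are gated and other projects stay open. The maintainer group name and the group-based check are hypothetical assumptions. The proposal does not specify how access control would be implemented.

```python
from typing import Set

DEFAULT_PROJECT = "default"
# Assumption: a hypothetical group allowed to curate the default project.
DEFAULT_PROJECT_MAINTAINERS: Set[str] = {"platform-team"}


def can_create_entity(project: str, user_groups: Set[str]) -> bool:
    if project == DEFAULT_PROJECT:
        # The default project behaves like a shared namespace, so creating
        # entities there is restricted to its maintainers.
        return bool(user_groups & DEFAULT_PROJECT_MAINTAINERS)
    # Other projects keep project-level isolation with no central gatekeeper.
    return True
```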
Issue Analytics
- Created 4 years ago
- Reactions: 4
- Comments: 13 (6 by maintainers)
Top GitHub Comments
Actually, yeah, you are correct: I can just have driver in a global project instead of having the entity defined in each regional project. I was too entrenched in the code base I am currently working on and didn’t consider this possibility.
It’s not clear what you mean here. What prevents you from simply having driver as a global entity?