Factor out storage key generation logic, to eliminate code duplication

This is a small-ish refactoring chore. See https://github.com/gojek/feast/pull/360#discussion_r362778747

Currently there is some code duplication between ingestion and serving for the generation and parsing of Redis keys. We multiplied this duplication in the course of implementing Cassandra storage (#360). Because that is already a large change set, we didn’t want to add further Redis refactoring to it, and decided to log this as a follow-up instead.

This could be done for Redis independently of, and before, integrating the Cassandra implementation, though; if so, the Cassandra implementation can be updated to follow the same pattern.
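As a rough illustration of the refactoring described above, here is a minimal sketch of a single class that owns both key generation and key parsing, so that ingestion and serving would call the same code. The class name `RedisKeyUtil`, its methods, and the `featureSet:name=value` key layout are all assumptions made up for this example; they are not Feast's actual Redis key format.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper shared by ingestion and serving. The "featureSet:name=value"
// layout is an assumption for illustration, not Feast's actual Redis key format.
final class RedisKeyUtil {
    private static final String SEPARATOR = ":";

    private RedisKeyUtil() {}

    /** Parsed form of a key: the feature set name plus its entity values. */
    static final class Parsed {
        final String featureSet;
        final Map<String, String> entities;

        Parsed(String featureSet, Map<String, String> entities) {
            this.featureSet = featureSet;
            this.entities = Collections.unmodifiableMap(entities);
        }
    }

    /** Builds a key; entities are sorted by name so writer and reader always agree. */
    static String build(String featureSet, Map<String, String> entities) {
        StringBuilder key = new StringBuilder(featureSet);
        for (Map.Entry<String, String> e : new TreeMap<>(entities).entrySet()) {
            key.append(SEPARATOR).append(e.getKey()).append('=').append(e.getValue());
        }
        return key.toString();
    }

    /** Inverse of build. Assumes names and values contain neither ':' nor '='. */
    static Parsed parse(String key) {
        String[] parts = key.split(SEPARATOR);
        Map<String, String> entities = new TreeMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] pair = parts[i].split("=", 2);
            entities.put(pair[0], pair[1]);
        }
        return new Parsed(parts[0], entities);
    }
}
```

Because both methods are static and free of I/O, a round-trip check (build a key, parse it, compare) needs no Redis instance and can double as documentation of the key format.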

Update: The storage modularization described below was realized through #529 and subsequent PRs that implemented its interfaces. This issue stays open for the code-duplication tech debt that still remains, enumerated in https://github.com/gojek/feast/issues/402#issuecomment-623264345

~~In the interest of storage modularization to minimize dependency headaches in the future (something I believe we’ll hash out in further RFC issues), I propose something like this:~~

~~1. A storage-api module, which defines an interface something like KeyUtil~~
~~2. A storage-redis implementation module which implements KeyUtil for Redis~~

~~I’m open to better naming suggestions, structure is more my focus here.~~

~~The KeyUtils could ideally be static so that unit testing them is trivially simple and the unit tests can serve as a good spec/documentation of the key format.~~

Issue Analytics

Created: 4 years ago
Comments: 10 (8 by maintainers)

Top GitHub Comments

3 reactions

zhilingc commented, Mar 9, 2020

@ches @woop I’m thinking of giving this a go sometime this week, hopefully with your blessing, haha. After looking around, I’m partial to Flink’s implementation for multi-connector support, and propose something like this:

.
├── core
├── ingestion
├── serving
└── storage
    ├── api // interfaces, not abstract classes
    │   ├── BatchRetriever.java
    │   ├── FeatureStorage.java
    │   └── OnlineRetriever.java
    ├── common // Utils common across all stores that implementations may or may not use
    │   └── retry // existing retry code
    │       ├── BackOffExecutor.java
    │       └── Retriable.java
    └── connectors // implementations
        ├── bigquery
        ├── cassandra
        └── redis
            ├── RedisFeatureStorage.java
            ├── RedisKey.java
            └── RedisOnlineRetriever.java

I’m thinking of keeping the KeyUtil separate from the API since it’s specific to KV stores and not really a general need, so I don’t think Feast should be opinionated about their implementation. Connectors are free to share a single KeyUtil across their Storage and Retriever implementations.

The actual methods defined in the interfaces require further thought, but just throwing this out for comments first.
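To make the proposed layout concrete, here is a minimal sketch of two `api` interfaces and a Redis connector whose storage and retriever share one key class. The method signatures and the `featureSet:entityId` key layout are assumptions invented for this example (the real methods are still undecided), and a plain `Map` stands in for a Redis client so the sketch runs without a server.

```java
import java.util.Map;
import java.util.Optional;

// storage/api: hypothetical method signatures; the thread leaves the real ones open.
interface FeatureStorage {
    void write(String featureSet, String entityId, String value);
}

interface OnlineRetriever {
    Optional<String> read(String featureSet, String entityId);
}

// storage/connectors/redis: the only place that knows the key layout (layout assumed).
final class RedisKey {
    private RedisKey() {}

    static String of(String featureSet, String entityId) {
        return featureSet + ":" + entityId;
    }
}

// A plain Map stands in for a Redis client in both classes below.
final class RedisFeatureStorage implements FeatureStorage {
    private final Map<String, String> redis;

    RedisFeatureStorage(Map<String, String> redis) {
        this.redis = redis;
    }

    @Override
    public void write(String featureSet, String entityId, String value) {
        redis.put(RedisKey.of(featureSet, entityId), value);
    }
}

final class RedisOnlineRetriever implements OnlineRetriever {
    private final Map<String, String> redis;

    RedisOnlineRetriever(Map<String, String> redis) {
        this.redis = redis;
    }

    @Override
    public Optional<String> read(String featureSet, String entityId) {
        // Same RedisKey call as the write path, so the two sides cannot drift apart.
        return Optional.ofNullable(redis.get(RedisKey.of(featureSet, entityId)));
    }
}
```

In this sketch `RedisKey` lives beside the connector rather than in `api`, which matches the idea of keeping key handling out of the general interfaces while still letting one connector reuse it on both the write and read paths.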

2 reactions

ches commented, Mar 9, 2020

LGTM!

> I’m thinking of keeping the KeyUtil separate from the API since it’s specific to KV stores and not really a general need, so I don’t think Feast should be opinionated about their implementation.

True, though I think it’s fine to have interfaces in api that not every implementation needs. On the other hand, we should be conservative about promoting things there until their abstractions have had some exercise.

A hypothetical KeyUtil might not end up being a good case anyway, though. Among other things, I had thought they’d ideally be static, and there’s really no way to have static interfaces in Java, so maybe that’s a clue it’s against the grain… (not to mention “util” usually makes me feel a little ashamed of myself 😅).

> Connectors are free to share a single KeyUtil across their Storage and Retriever implementations.

Definitely intended this to be true though, as the original spirit of this issue.
