ALPaCA: Abstract Light-weight Producer and Consumer API
Context
The consumer/producer API model currently supported by Stream Registry follows a factory pattern. This creates enriched clients that are closely coupled to the underlying streaming platform (Kafka, Kinesis, etc.). Benefits of this approach include:
- allowing full access to the capabilities of the target streaming platform
- avoiding a ‘lowest common denominator’ approach whereby a client can expose only those features supported by all target streaming platforms
- easing adoption of Stream Registry by existing applications as the client interface remains unchanged
- enabling use of tightly integrated technologies such as KStreams, KSQL, and other mature third-party integrations such as Apache Spark’s Kafka receiver or the Apache Flink Kinesis connectors
However, this approach does little to simplify scenarios where we must integrate multiple diverse streaming platforms (Kafka, Kinesis, etc.) with consuming or producing applications. This capability is of significant value when building core streaming data applications that must act on most or all of an organisation’s streams. Typical examples include anomaly detection and data quality, among potentially many others.
I therefore propose that Stream Registry also include a set of platform-agnostic consumer/producer APIs. These would allow events to be consumed and produced in a unified manner, so that applications can interact simply and seamlessly with an organisation’s streams, irrespective of the underlying streaming platform in which they are persisted.
To facilitate wide adoption and integration, the APIs would be extremely light-weight and use only open, widely adopted technologies: HTTP, REST, JSON, WebSockets, etc.
To be clear: I suggest that these APIs are provided in addition to the factory approach that is already supported, whose differentiating value was outlined earlier.
Desired Behaviour
As a consumer/producer, I can choose to integrate my application/service with the ‘vendor’ agnostic streaming API (ALPaCA). This API provides me with an abstraction to produce and consume events while decoupling me from the platform-specific API provided by my streaming platform. It thus allows me to target other streaming platforms in my data ecosystem with no additional effort, and seamlessly support new platforms as they are adopted into the Stream Registry ecosystem.
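To make the intended decoupling concrete, here is a minimal Python sketch of what such an abstraction could look like. Nothing in it is an existing Stream Registry API: `EventProducer`, `EventConsumer`, `InMemoryPlatform` and `count_events` are hypothetical names, and the in-memory class merely stands in for a real Kafka or Kinesis adapter.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Iterator, List


class EventProducer(ABC):
    """Hypothetical platform-agnostic producer interface."""

    @abstractmethod
    def send(self, stream: str, event: dict) -> None:
        ...


class EventConsumer(ABC):
    """Hypothetical platform-agnostic consumer interface."""

    @abstractmethod
    def receive(self, stream: str) -> Iterator[dict]:
        ...


class InMemoryPlatform(EventProducer, EventConsumer):
    """Stand-in for a real platform adapter (Kafka, Kinesis, ...)."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = defaultdict(list)

    def send(self, stream: str, event: dict) -> None:
        self._streams[stream].append(event)

    def receive(self, stream: str) -> Iterator[dict]:
        # Iterate over a snapshot so producers can keep appending meanwhile.
        return iter(list(self._streams[stream]))


def count_events(consumer: EventConsumer, stream: str) -> int:
    """Application code that depends only on the abstract interface."""
    return sum(1 for _ in consumer.receive(stream))
```

An application such as `count_events` never names a platform, so under this assumption supporting a new platform would mean writing one more adapter class rather than touching every application.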
Stream Registry already helps me understand the events that I consume/produce via integration with schema registries; it can also provide metadata that describes how I should serialise and deserialise said events. However, it does not provide me with the machinery to do so - the difficult part. ALPaCA would provide simple standardised message encodings (JSON, gzipped), transports (HTTP + WebSockets), and protocols (REST), enabling me to simply and consistently read and write events with any stream, with minimal dependencies and little coupling.
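As a rough illustration of how small the encoding machinery could be, the following Python sketch serialises an event to gzipped JSON and back using only the standard library. The function names and the choice of compact, key-sorted JSON are assumptions made for this example; the proposal does not specify a wire format beyond "JSON, gzipped".

```python
import gzip
import json

# Standard HTTP headers a client might attach to such a payload (assumed usage).
HEADERS = {"Content-Type": "application/json", "Content-Encoding": "gzip"}


def encode_event(event: dict) -> bytes:
    """Serialise an event to compact UTF-8 JSON, then gzip-compress it."""
    payload = json.dumps(event, separators=(",", ":"), sort_keys=True)
    return gzip.compress(payload.encode("utf-8"))


def decode_event(body: bytes) -> dict:
    """Reverse of encode_event: decompress, then parse the JSON document."""
    return json.loads(gzip.decompress(body).decode("utf-8"))
```

Because both steps rely on ubiquitous formats, a client in any language with a JSON parser and a gzip library could read or write the same payloads.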
Benefits
- enables the development/deployment of organisation-wide stream-based applications/services/capabilities that can integrate with and operate on any stream in the organisation, irrespective of the underlying streaming platform on which the stream resides
- eliminates the need to integrate each core system with N streaming platforms - eliminates duplicate development and maintenance of Kinesis/Kafka/etc. integration code across such systems
- eliminates vendor specific capability gaps forming in the data platform: “we can perform anomaly detection on your Kafka streams, but not your Kinesis streams as we haven’t yet had a chance to build an integration”
- useful internal Stream Registry plumbing: can act as a bridge between streaming platforms, i.e. push from any one supported platform into any other by using the platform-agnostic consumer/producer APIs. This solution then works for any combination of platforms, even those adopted in the future
- the relocation of streams from one platform (example: Kinesis) to another (Kafka) does not impact ALPaCA consumers or producers, thus minimising the overall impact of such a migration. This then allows us to think more freely regarding stream placement, and migrate them between systems based on cost, performance, etc.
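The bridging benefit above can be sketched in a few lines of Python. This is an illustrative assumption, not a proposed implementation: `bridge` is a hypothetical helper, the source is modelled as any iterable of already-encoded events, and the sink as any callable that accepts one event.

```python
from typing import Callable, Iterable


def bridge(source: Iterable[bytes], sink: Callable[[bytes], None]) -> int:
    """Forward every event from source to sink; return how many were sent."""
    forwarded = 0
    for event in source:
        sink(event)
        forwarded += 1
    return forwarded
```

For example, `bridge(kinesis_like, kafka_like.append)` would copy events from a list standing in for a Kinesis-backed consumer into a list standing in for a Kafka-backed producer. Since the helper sees only the agnostic interfaces, the same code would serve every source/sink pairing.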
While benefits have been described, it is important to underline the cases where the use of ALPaCA is disadvantageous. The sweet spot for ALPaCA is any producing/consuming application intended to be applied to many streams and across multiple streaming platforms - so really core data platform capabilities. It is not a good fit in cases where only one platform is targeted, and where excellent mature integrations already exist.
Comparable technologies
- JDBC: Unified connectivity and interoperability with disparate RDBMSes
- Data Access Layer pattern
Top GitHub Comments
As an acronym, possibly okay, but spoken aloud it conflicts with Akka’s Alpakka
Would ❤️ to see a top level repo that uses this signature:
https://github.com/HotelsDotCom/data-highway/blob/86704be5d268b8e898959bb1c8fe9cff6ab84fc0/client/onramp/src/main/java/com/hotels/road/client/AsyncRoadClient.java#L28-L30