ALPaCA: Abstract Light-weight Producer and Consumer API

Context

The consumer/producer API model currently supported by Stream Registry follows a factory pattern. This creates enriched clients that are closely coupled to the underlying streaming platform (Kafka, Kinesis, etc.). Benefits of this approach include:

allowing full access to the capabilities of the target streaming platform
avoiding a ‘lowest common denominator’ approach whereby a client can expose only those features supported by all target streaming platforms
easing adoption of Stream Registry by existing applications as the client interface remains unchanged
enabling use of tightly integrated technologies such as KStreams and KSQL, and of other mature 3rd party integrations such as Apache Spark’s Kafka receiver or the Apache Flink Kinesis connectors

However, this approach does little to simplify scenarios where we must integrate multiple diverse streaming platforms (Kafka, Kinesis, etc.) with consuming or producing applications. This capability is of significant value when building core streaming data applications that must act on the majority, or all, of an organisation’s streams. Typical examples include anomaly detection and data quality, and potentially many others.

I therefore propose that it would be beneficial to also include in Stream Registry a set of stream platform agnostic consumer/producer APIs. These would allow the consumption/production of events in a unified manner, so that applications can be built that interact simply and seamlessly with an organisation’s streams, irrespective of the underlying streaming platform in which they are persisted.

To facilitate wide adoption and integration, the APIs would be extremely light-weight and use only open and massively adopted technologies: HTTP, REST, JSON, WebSockets, etc.

To be clear: I suggest that these APIs are provided in addition to the factory approach that is already supported, whose differentiating value was outlined earlier.

Desired Behaviour

As a consumer/producer, I can choose to integrate my application/service with the ‘vendor’ agnostic streaming API (ALPaCA). This API provides me with an abstraction to produce and consume events while decoupling me from the platform specific API provided by my streaming platform. It thus allows me to target other streaming platforms in my data ecosystem with no additional effort, and seamlessly support new platforms as they are adopted into the stream registry ecosystem.
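To make the shape of such an abstraction concrete, here is a minimal sketch in Java. Every name in it (AlpacaSketch, EventProducer, EventConsumer, InMemoryPlatform) is hypothetical and not part of Stream Registry, and the in-memory class merely stands in for a real platform binding such as Kafka or Kinesis. The point it illustrates is that callers address a stream by name and never touch a platform specific client.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class AlpacaSketch {

    /** Hypothetical platform-agnostic producer: the caller names a stream, not a platform. */
    public interface EventProducer {
        void send(String streamName, String jsonEvent);
    }

    /** Hypothetical platform-agnostic consumer: events arrive as JSON strings. */
    public interface EventConsumer {
        void subscribe(String streamName, Consumer<String> handler);
    }

    /** In-memory stand-in for a platform binding; assumed for illustration only. */
    public static class InMemoryPlatform implements EventProducer, EventConsumer {
        private final Map<String, List<Consumer<String>>> handlers = new ConcurrentHashMap<>();

        @Override
        public void send(String streamName, String jsonEvent) {
            // Deliver the event to every handler subscribed to this stream name.
            handlers.getOrDefault(streamName, Collections.<Consumer<String>>emptyList())
                    .forEach(handler -> handler.accept(jsonEvent));
        }

        @Override
        public void subscribe(String streamName, Consumer<String> handler) {
            handlers.computeIfAbsent(streamName, name -> new CopyOnWriteArrayList<>())
                    .add(handler);
        }
    }

    public static void main(String[] args) {
        InMemoryPlatform platform = new InMemoryPlatform();
        // Application code depends only on the two interfaces above.
        EventConsumer consumer = platform;
        EventProducer producer = platform;
        consumer.subscribe("bookings", event -> System.out.println("received " + event));
        producer.send("bookings", "{\"id\":1}");
    }
}
```

Under this assumption, supporting a new platform means supplying another implementation of the two interfaces, while application code stays unchanged.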

The stream registry already assists me in understanding the events that I consume/produce via integration with schema registries, and it can also provide metadata that describes how I should serialise and deserialise said events. However, it does not provide me with the machinery to do so - the difficult part. ALPaCA would provide simple standardised message encodings (JSON, gzipped), transports (HTTP + WebSockets), and protocols (REST), enabling me to simply and consistently read and write events with any stream, with minimal dependencies and little coupling.
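As an illustration of how small the encoding machinery could be, the following sketch gzips a JSON string and reads it back using only the Java standard library. The class name GzipJsonCodec and its methods are hypothetical, and treating the event as an already serialised JSON string encoded in UTF-8 is an assumption; the issue does not specify a wire format beyond "JSON, gzipped".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipJsonCodec {

    /** Compresses a JSON document (assumed UTF-8) into a gzip payload. */
    public static byte[] encode(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream flushes the trailer before the bytes are read.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decompresses a gzip payload back into the original JSON document. */
    public static String decode(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A payload produced by encode could then be sent as an HTTP request body or a binary WebSocket frame; that transport step is outside this sketch.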

Benefits

enables the development/deployment of organisation-wide stream-based applications/services/capabilities that can integrate with and operate on any stream in the organisation, irrespective of the underlying streaming platform on which the stream resides
eliminates the need to integrate each core system with N streaming platforms
eliminates duplicate development and maintenance of Kinesis/Kafka/etc. integration code across such systems
eliminates vendor specific capability gaps forming in the data platform: “we can perform anomaly detection on your Kafka streams, but not your Kinesis streams as we haven’t yet had a chance to build an integration”
useful internal Stream Registry plumbing: can act as a bridge between streaming platforms, i.e. push from any one supported platform into any other by using the platform agnostic consumer/producer APIs. This solution then works for any combination of platforms, even those adopted in the future
the relocation of streams from one platform (example: Kinesis) to another (Kafka) does not impact ALPaCA consumers or producers, thus minimising the overall impact of such a migration. This then allows us to think more freely regarding stream placement, and migrate them between systems based on cost, performance, etc.
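The bridging benefit listed above can be sketched in a few lines once both sides expose a platform agnostic interface. The names here (BridgeSketch, Source, Sink, InMemoryStream) are hypothetical, and the in-memory streams only stand in for real platforms; this is an illustration of the idea, not Stream Registry code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BridgeSketch {

    /** Hypothetical agnostic read side of a stream. */
    public interface Source {
        void subscribe(Consumer<String> handler);
    }

    /** Hypothetical agnostic write side of a stream. */
    public interface Sink {
        void send(String event);
    }

    /** Forwards every event from one stream to another, whatever platforms back them. */
    public static void bridge(Source from, Sink to) {
        from.subscribe(to::send);
    }

    /** In-memory stand-in for a stream on some platform; assumed for illustration only. */
    public static class InMemoryStream implements Source, Sink {
        public final List<String> events = new ArrayList<>();
        private final List<Consumer<String>> handlers = new ArrayList<>();

        @Override
        public void subscribe(Consumer<String> handler) {
            handlers.add(handler);
        }

        @Override
        public void send(String event) {
            // Record the event, then hand it to every subscriber.
            events.add(event);
            handlers.forEach(handler -> handler.accept(event));
        }
    }
}
```

Because bridge depends only on Source and Sink, the same call would serve any pair of platforms for which those two interfaces have been implemented.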

While benefits have been described, it is important to underline the cases where the use of ALPaCA is disadvantageous. The sweet spot for ALPaCA is any producing/consuming application intended to be applied to many streams and across multiple streaming platforms - so really core data platform capabilities. It is not a good fit in cases where only one platform is targeted, and where excellent mature integrations already exist.

Comparable technologies

JDBC: Unified connectivity and interoperability with disparate RDBMSes
Data Access Layer pattern

Issue Analytics

Created 5 years ago
Reactions: 1
Comments: 10 (9 by maintainers)

Top GitHub Comments

1 reaction

OneCricketeer commented, Dec 12, 2018

As an acronym, possibly okay, but spoken aloud it conflicts with Akka’s Alpakka

0 reactions

neoword commented, Feb 5, 2019

Would ❤️ to see a top level repo that uses this signature:

https://github.com/HotelsDotCom/data-highway/blob/86704be5d268b8e898959bb1c8fe9cff6ab84fc0/client/onramp/src/main/java/com/hotels/road/client/AsyncRoadClient.java#L28-L30
