1.0 API, the remaining controversial bits

Continuing the discussion from #656 here:

@AndyPook says:

Got to say, I still really hate the Consume methods and the sync over async stuff with GetAwaiter() bits. We should accept that sync and async are two quite different things and need different treatment. But we’ve been around that loop too many times and I guess I’ll just have to accept that I can’t convince you 😢

One of the reasons I left things as they are is that I was super keen to get the messy PR tree under control! That’s now done - 1.0.x is the baseline, and we’re now discussing diffs against it.

I think we should focus discussion on the producer, as it’s a bit trickier. Let’s say we didn’t have async serializer interfaces to worry about. We still absolutely want ProduceAsync on our producer - we want to await the delivery reports, which is a very important use case. Also, if we’re awaiting network round trips on that, why not on other methods? I think we also want FlushAsync, and any other call that blocks on the network, to have async variants (that is the end goal - it is currently out of scope and may require librdkafka changes, or at least more interfacing work). So given that end goal, do we really want to introduce another producer just to handle the case of async serializers? I don’t think so. Rather, I think having an end goal of async and sync variants of all the methods on the same class [possibly deprecating some of the sync variants] is fine, and it allows us to provide the variants we can implement effectively now and leave the rest until later.

I just realized we currently only have BeginProduce. We could add BeginProduceAsync now, as we can implement it easily without compromise. This is a point where my reasoning falls down - it’s the only place I can see where you would want an async variant of the method on the async-serializer class, but you wouldn’t on the normal class.

I see what you mean about the ctors and type inference. Maybe we could have a builder style, something like var c = Consumer.Key(…).Value(…).Create(); or var c = Consumer.Key(…).AsyncValue(…).Create() (there would also be an AsyncKey(…) option), or var c = Consumer.Key<string>().AsyncValue<byte[]>().Create(). That last one would imply using the “builtins”. I could also imagine extensions like ValueFromSchema<T>() that would add the appropriate avro/schema deser. Not that hard to create, and just a facade on top of filling in the correct ctor based on which Key/Value options you specified.

Not a bad thought, I’ll explore the builder style a bit tomorrow.
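
To make the type-inference point concrete, here is a minimal sketch of such a staged facade. Every name in it (ConsumerSketch, KeyStage, FinalStage, BuiltConsumer) is hypothetical and deserializers are modelled as plain delegates, so this is an illustration of the idea under those assumptions, not the library's API.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical names throughout; this is not the library's API.
public static class ConsumerSketch
{
    // TKey is inferred from the delegate, so callers never spell out <K, V>.
    public static KeyStage<TKey> Key<TKey>(Func<byte[], TKey> deserializer)
        => new KeyStage<TKey>(deserializer);
}

public sealed class KeyStage<TKey>
{
    private readonly Func<byte[], TKey> key;
    internal KeyStage(Func<byte[], TKey> key) { this.key = key; }

    // A sync value deserializer is lifted to the async shape internally.
    public FinalStage<TKey, TValue> Value<TValue>(Func<byte[], TValue> deserializer)
        => new FinalStage<TKey, TValue>(key, raw => Task.FromResult(deserializer(raw)));

    // An async value deserializer is passed through unchanged.
    public FinalStage<TKey, TValue> AsyncValue<TValue>(Func<byte[], Task<TValue>> deserializer)
        => new FinalStage<TKey, TValue>(key, deserializer);
}

public sealed class FinalStage<TKey, TValue>
{
    private readonly Func<byte[], TKey> key;
    private readonly Func<byte[], Task<TValue>> value;

    internal FinalStage(Func<byte[], TKey> key, Func<byte[], Task<TValue>> value)
    {
        this.key = key;
        this.value = value;
    }

    // The facade ends by calling the single constructor that fits the choices made.
    public BuiltConsumer<TKey, TValue> Create() => new BuiltConsumer<TKey, TValue>(key, value);
}

// Stand-in for the real consumer; it only records which deserializers were chosen.
public sealed class BuiltConsumer<TKey, TValue>
{
    public Func<byte[], TKey> KeyDeserializer { get; }
    public Func<byte[], Task<TValue>> ValueDeserializer { get; }

    public BuiltConsumer(Func<byte[], TKey> key, Func<byte[], Task<TValue>> value)
    {
        KeyDeserializer = key;
        ValueDeserializer = value;
    }
}
```

With this shape, ConsumerSketch.Key(raw => raw.Length).Value(raw => raw).Create() compiles with no explicit type arguments, which is the inference behaviour the constructor overloads struggle to give.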

I realized earlier today that C# 8 could come to the rescue somewhat - if we enable nullable reference types, we can make only the 2x non-async serializer constructor args nullable (so type inference will work in the main case where I want it to). That is backwards compatible too (can target netstandard1.3 with C# 8), as nullable reference types are implemented as attributes. Only users of > C# 8 will get the benefit, but the number of those will increase over time.

Or, if we had the DeserializerRegistry idea from elsewhere (and removing the dictionary from ConsumerBase), we could just require that the serdes are mandatory. var c = new Consumer<K,V>(DR.Get<string>(), new MyAsyncDeser<V>());

I have a soft spot for having all serdes mandatory, except Null and Ignore, which naturally map to null (it feels a bit funny to me specifying a serializer for null, feels like the API shouldn’t require me to). I plan to remove defaults for the others, as I think it’s not good structurally.

I think a serde registry as you are suggesting above is good structurally, but I think it is out of scope for the 1.0 release at least (probably forever - users can make that themselves if they want). I think just having the static properties, as is the case now, is good enough.

I strongly believe that combining sync and async as shown above is not the right way to go. Ditto the sync over async Consume methods. It’s too far away from how I understand async to work, a compromise too far for me. I would much prefer to go the route shown in some of my proposal branches.

Explained my reasoning above.

We have very different opinions on what is a good style or correct treatment of async for these things and you strongly believe almost the exact opposite. So it’s hard to reconcile.

To be clear: I don’t like .GetAwaiter(), and it’s completely pointless having IAsyncDeserializer without a ConsumeAsync, except that it allows us to migrate the API to a good state in the future.

We could implement ConsumeAsync now, but it would either block on librdkafka/Kafka IO, or do that blocking in a Task.Run (probably the former). One reason against implementing it now is that people might assume the method is async over both IO requests, not just the SR one. If I were using the library I would assume that. But given you’ve been such a good ear to bounce ideas off, I might let it in just for you 😃. But another reason to delay this is that there are correctness issues to work through very carefully, i.e. what happens in the case of failure, what gets committed when, etc. This has been an issue in the golang client with channels - it’s more obvious there’s a problem in that case, but analogous issues probably exist here. I haven’t thought about this at all, @edenhill just mentioned it as something that needs consideration.
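
For readers following the trade-off, a rough sketch of the three shapes being discussed. SketchConsumer and its delegate parameters are hypothetical stand-ins (a blocking poll in place of the librdkafka call, an async delegate in place of a schema-registry-backed deserializer); nothing here is the library's implementation, and it ignores the commit and failure questions raised above.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a consumer; not the library's API.
public sealed class SketchConsumer<T>
{
    private readonly Func<TimeSpan, byte[]> poll;                // blocks, like a native consume call
    private readonly Func<byte[], Task<T>> deserializeAsync;     // may do network IO (e.g. fetch a schema)

    public SketchConsumer(Func<TimeSpan, byte[]> poll, Func<byte[], Task<T>> deserializeAsync)
    {
        this.poll = poll;
        this.deserializeAsync = deserializeAsync;
    }

    // Sync over async: the calling thread blocks on the poll and on the deserializer task.
    public T Consume(TimeSpan timeout)
    {
        var raw = poll(timeout);
        return deserializeAsync(raw).GetAwaiter().GetResult();
    }

    // "The former": the calling thread still blocks inside poll;
    // only the deserializer step is actually awaited.
    public async Task<T> ConsumeAsync(TimeSpan timeout)
    {
        var raw = poll(timeout);
        return await deserializeAsync(raw).ConfigureAwait(false);
    }

    // "The latter": the blocking poll is moved to a thread-pool thread,
    // so the caller is free, but a pool thread is occupied for the duration.
    public async Task<T> ConsumeViaTaskRunAsync(TimeSpan timeout)
    {
        var raw = await Task.Run(() => poll(timeout)).ConfigureAwait(false);
        return await deserializeAsync(raw).ConfigureAwait(false);
    }
}
```

The second method has an async signature but is async over only one of the two IO steps, which is the assumption mismatch described above.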

My current place has just taken a big bet on Kafka and we’re a C# shop so we need this. But we couldn’t use these bits as they are.

Why not? Why is it a big deal if your application has to block a bit longer 1 or 2 times on consume because of a request to SR, out of however many messages you plan to consume (hundreds of millions?) Why is it a show stopper? You can’t be concerned about blocking, because the consume method is going to block on librdkafka regardless, many many more times. Do you plan to do something else with the async deserializer where the call is expected to be suspended more than very infrequently? I can’t think when you’d want that. I don’t understand how not having ConsumeAsync has any practical implications - it’ll operate almost identically to Consume in practice. If you can make me understand why this is a show stopper, it would help. The only arguments against that I can think of are on principle.

Maybe there is a way to collapse some of this by only having async serdes but using a ValueTask<T> return. I haven’t gone too far into this yet. The intent of VT is to have a fast path for async methods that are likely to return synchronously, while also handling the cases where full async behavior is necessary. MS are using this a lot to get perf out of the new networking bits in .NET Core. But we’d need to be sure that it doesn’t force us into not supporting older .NET Framework versions.
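
A small sketch of what that fast path looks like, assuming a deserializer that needs a network round trip only the first time it sees a schema id. IValueTaskDeserializer and CachingDeserializer are made-up names, the registry call is simulated with Task.Delay, and the cache is not thread-safe; only ValueTask<T> itself is a real .NET type.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical interface, not the library's API: one deserializer shape
// that covers both the synchronous and the asynchronous case.
public interface IValueTaskDeserializer<T>
{
    ValueTask<T> DeserializeAsync(byte[] data);
}

// Stand-in for a schema-registry-backed deserializer. The first byte of the
// payload is treated as a schema id; the rest is UTF-8 text.
public sealed class CachingDeserializer : IValueTaskDeserializer<string>
{
    // Not thread-safe; enough for a single-threaded illustration.
    private readonly Dictionary<int, string> schemaCache = new Dictionary<int, string>();

    public ValueTask<string> DeserializeAsync(byte[] data)
    {
        int schemaId = data[0];
        if (schemaCache.TryGetValue(schemaId, out var schema))
        {
            // Fast path: the result is wrapped directly, no Task is allocated.
            return new ValueTask<string>(Decode(schema, data));
        }
        // Slow path: genuinely asynchronous work, wrapped as a ValueTask.
        return new ValueTask<string>(FetchThenDecodeAsync(schemaId, data));
    }

    private async Task<string> FetchThenDecodeAsync(int schemaId, byte[] data)
    {
        await Task.Delay(10).ConfigureAwait(false); // simulated schema registry round trip
        var schema = "schema-" + schemaId;
        schemaCache[schemaId] = schema;
        return Decode(schema, data);
    }

    private static string Decode(string schema, byte[] data)
        => schema + ":" + Encoding.UTF8.GetString(data, 1, data.Length - 1);
}
```

After the first call for a given schema id has completed, later calls return an already-completed ValueTask, so a caller that checks IsCompletedSuccessfully can read Result without awaiting.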

Sounds promising! I’m into investigating that! But backwards compatibility to netstandard1.3 is important, and breaking it would rule this out. Note that an advantage of the combined IDeserializer idea is that we’d be able to have three variants and no extra consumer constructor variants. Also, if we wanted yet another variant in the future, we’d be able to add it without API-breaking changes.

Top GitHub Comments

2 reactions

mhowlett commented, Dec 20, 2018

I’m very excited about your builder pattern idea at the moment because I think it probably solves many problems:

- Extensibility for the serde API beyond the two interfaces currently defined. I’ve been thinking about produce efficiency, and I think we can achieve zero copy produce AND zero perf impact from memory pinning (my guess is a 20-30% improvement in throughput), but it’s quite a lot of effort and we can’t do it in the timeframe of the 1.0 release. This opens the door for it later.
- There is a problem with OnError and OnLog in that they are relevant in the constructor, but in the current API they are defined after it. We can hack our way around this by caching events until the handlers are added, but this is ugly. The builder allows this to be avoided.
- Serdes don’t need to be set for Null or Ignore.

Something like:

var consumer = ConsumerBuilder<Ignore, string>
    .AddConfig(config)
    // note: no key deserializer specified!
    .AddValueDeserializer(Serializers.UTF8)
    .AddErrorHandler(e => {
        // error handler.
    })
    .AddLogger(l => {
        // logger.
    })
    .AddStatsHandler(s => {
        // statistics
    })
    .Build();

1 reaction

mhowlett commented, Feb 12, 2019

Thanks @Fresa! - returning an IConsumer<TKey,TValue> does indeed sound preferable. Expect we’ll make that change.

Providing a protocol level abstraction is not easy/possible given the current implementation - librdkafka provides high level functions itself. Providing an abstraction which is conceptually this (but with nothing public) gives the primary benefit - it allows different high level client types to share connections where this is possible, so it makes a lot of sense to me.