Schema registry HTTP error responses are cached
We’ve recently been getting spates of tracebacks like the one below in an app of ours that uses Chr.Avro:
[("HResult": -2146233088), ("Message": "System.Net.Http.HttpRequestException: [https://schema-registry.***.com/] GatewayTimeout[https://schema-registry.***.com/] GatewayTimeout -1
at Confluent.SchemaRegistry.RestService.ExecuteOnOneInstanceAsync(Func`1 createRequest)
at Confluent.SchemaRegistry.RestService.RequestAsync[T](String endPoint, HttpMethod method, Object[] jsonBody)
at Confluent.SchemaRegistry.RestService.GetLatestSchemaAsync(String subject)
at Confluent.SchemaRegistry.CachedSchemaRegistryClient.GetLatestSchemaAsync(String subject)
at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.<SerializeAsync>b__24_0(String subject)
at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.SerializeAsync(T data, SerializationContext context)
at Confluent.Kafka.SyncOverAsync.SyncOverAsyncSerializer`1.Serialize(T data, SerializationContext context)
at Confluent.Kafka.Producer`2.Produce(TopicPartition topicPartition, Message`2 message, Action`1 deliveryHandler)"), ...<snip>...
at Confluent.SchemaRegistry.RestService.ExecuteOnOneInstanceAsync(Func`1 createRequest)
at Confluent.SchemaRegistry.RestService.RequestAsync[T](String endPoint, HttpMethod method, Object[] jsonBody)
at Confluent.SchemaRegistry.RestService.GetLatestSchemaAsync(String subject)
at Confluent.SchemaRegistry.CachedSchemaRegistryClient.GetLatestSchemaAsync(String subject)
at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.<SerializeAsync>b__24_0(String subject)
at Chr.Avro.Confluent.AsyncSchemaRegistrySerializer`1.SerializeAsync(T data, SerializationContext context)
at Confluent.Kafka.SyncOverAsync.SyncOverAsyncSerializer`1.Serialize(T data, SerializationContext context)
at Confluent.Kafka.Producer`2.Produce(TopicPartition topicPartition, Message`2 message, Action`1 deliveryHandler)"), ("IsError": True), ("IsLocalError": True), ("IsBrokerError": False)]), ("Type": "Confluent.Kafka.ProduceException`2[[...<snip>...]]")]
After digging a bit, I think what is happening is that our application is, for reasons unrelated to this library or any C# code in general, receiving HTTP 504 responses in some of its initial attempts to contact the schema registry. When this happens, I think Chr.Avro caches the error result: here it adds a single task to the cache, and after that initial add, every subsequent hit on the cache awaits that same task, so the HttpRequestException is raised on every access.
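To make the suspected behavior concrete, here is a minimal sketch of it. This is not Chr.Avro's actual code: the cache, `FetchSchemaIdAsync`, and the subject name are hypothetical stand-ins, and the fake lookup fails only on its first call to imitate a transient 504.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class FaultedTaskCacheDemo
{
    // Hypothetical cache of in-flight/completed lookups, keyed by subject.
    private static readonly ConcurrentDictionary<string, Task<int>> Cache =
        new ConcurrentDictionary<string, Task<int>>();

    // Counts how many times the (fake) registry is actually contacted.
    public static int Calls;

    // Hypothetical registry lookup: throws on the first call only.
    private static async Task<int> FetchSchemaIdAsync(string subject)
    {
        var attempt = Interlocked.Increment(ref Calls);
        await Task.Yield();
        if (attempt == 1)
        {
            throw new HttpRequestException("GatewayTimeout");
        }
        return 42;
    }

    // The task is stored once; a faulted task stays in the cache forever.
    public static Task<int> GetSchemaIdAsync(string subject) =>
        Cache.GetOrAdd(subject, FetchSchemaIdAsync);

    public static async Task Main()
    {
        for (var i = 0; i < 3; i++)
        {
            try
            {
                var id = await GetSchemaIdAsync("my-topic-value");
                Console.WriteLine($"attempt {i}: schema id {id}");
            }
            catch (HttpRequestException e)
            {
                Console.WriteLine($"attempt {i}: {e.Message}");
            }
        }
        Console.WriteLine($"registry calls: {Calls}");
    }
}
```

In this sketch all three attempts throw, even though the fake registry would have succeeded on a second call, because the lookup only ever runs once and each later access awaits the same faulted task.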
Of course, solving the 504s is a thing we should work on, but more specific to Chr.Avro: does it sound like I’m reading the code right, there? If so, would it make sense to try to come up with a way of skipping addition to the cache for HTTP 5xx response statuses?
Thank you!
Top GitHub Comments
Man, you are fast! Thank you so much for this.
You’re right, Chr.Avro shouldn’t cache failed requests. I think we still want deduping (there shouldn’t be multiple in-flight requests for the same schema), but it looks like CachedSchemaRegistryClient will do that for us.
This and #77 are good candidates for a patch release; will try to make that happen yet this week.
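For illustration only, one way to keep in-flight deduping while not caching failures is to evict a task from the cache when it faults. The sketch below is an assumption about how that could look, not the change that went into the patch release; the cache, `FetchSchemaIdAsync`, and the subject name are hypothetical, and the fake lookup again fails only on its first call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class EvictOnFailureDemo
{
    // Hypothetical cache of lookups, keyed by subject.
    private static readonly ConcurrentDictionary<string, Task<int>> Cache =
        new ConcurrentDictionary<string, Task<int>>();

    // Counts how many times the (fake) registry is actually contacted.
    public static int Calls;

    // Hypothetical registry lookup: throws on the first call only.
    private static async Task<int> FetchSchemaIdAsync(string subject)
    {
        var attempt = Interlocked.Increment(ref Calls);
        await Task.Yield();
        if (attempt == 1)
        {
            throw new HttpRequestException("GatewayTimeout");
        }
        return 42;
    }

    public static async Task<int> GetSchemaIdAsync(string subject)
    {
        // Concurrent callers for the same subject share one task.
        var task = Cache.GetOrAdd(subject, FetchSchemaIdAsync);
        try
        {
            return await task.ConfigureAwait(false);
        }
        catch
        {
            // Remove the entry only if it still holds this failed task,
            // so a newer retry stored by another caller is left alone.
            ((ICollection<KeyValuePair<string, Task<int>>>)Cache).Remove(
                new KeyValuePair<string, Task<int>>(subject, task));
            throw;
        }
    }

    public static async Task Main()
    {
        for (var i = 0; i < 3; i++)
        {
            try
            {
                var id = await GetSchemaIdAsync("my-topic-value");
                Console.WriteLine($"attempt {i}: schema id {id}");
            }
            catch (HttpRequestException e)
            {
                Console.WriteLine($"attempt {i}: {e.Message}");
            }
        }
        Console.WriteLine($"registry calls: {Calls}");
    }
}
```

Here the first attempt fails and is evicted, the second attempt contacts the fake registry again and succeeds, and the third is served from the cache, so the lookup runs twice in total. If CachedSchemaRegistryClient already dedupes requests, dropping the extra cache entirely would be simpler than this.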