[Internal] Client Telemetry Public API Proposal
Background
Following the latest discussion, and to align with the Java API and avoid confusion over the feature name, we decided to use “Client Telemetry” as the parent feature. For now it covers two kinds of telemetry:
1. Send Client Diagnostic Metrics To Service: sends telemetry metrics to Microsoft.
   Read more about it here: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/docs/observability.md#send-telemetry-from-sdk-to-service-private-preview
2. Distributed Tracing: sends Activities carrying operation-level or network-level information to the customer’s APM tool, such as AppInsights (metrics may also be covered in the future).
   Read more about it here: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/docs/observability.md#distributed-tracing-preview
We are fine if the Java API and the .NET API do not look exactly alike. The goal is to make sure the API does not confuse customers.
The Java public API looks like this:
CosmosClientBuilder
CosmosClientTelemetryConfig
CosmosDiagnosticsThresholds
Proposed public API in the .NET SDK to control the above features:
By default, both of the above features will be enabled in the SDK.
Why?
1. Send Client Diagnostic Metrics To Service: This feature is controlled from the portal, so enabling it in the SDK by default gives the customer a seamless experience: they can enable it from the portal and data starts flowing to us without any code change. If it is disabled on the portal and enabled in the SDK, there is no perf impact or overhead on the SDK, because the portal setting has higher priority.
2. Distributed Tracing: This feature works on a subscription model. Unless there is a subscriber, it does not generate any activity, so there is no overhead. This feature flag is introduced to support the AppInsights SDK, because customers cannot unsubscribe from an activity source there.
Want to see distributed tracing in action?
- OpenTelemetry way: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/OpenTelemetry
- AppInsights SDK way: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/ApplicationInsights
In both cases, there are scenarios where a customer would want to disable an individual kind of telemetry from the SDK.
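The subscription model behind distributed tracing can be illustrated with plain System.Diagnostics types, independent of the Cosmos SDK. This is only a minimal sketch: the source name `Sketch.Cosmos.Operation` and the activity name `ReadItem` are hypothetical and are not the names the SDK uses. It shows that `ActivitySource.StartActivity` returns null until a listener subscribes, which is the basis for the no-overhead argument.

```csharp
using System;
using System.Diagnostics;

public static class TracingSubscriptionSketch
{
    // Hypothetical source name, chosen only for this sketch.
    private const string SourceName = "Sketch.Cosmos.Operation";

    // Returns whether an Activity was created before and after a listener subscribed.
    public static (bool CreatedWithoutListener, bool CreatedWithListener) Run()
    {
        using var source = new ActivitySource(SourceName);

        // No listener is registered for this source yet, so nothing is created.
        using Activity before = source.StartActivity("ReadItem");

        // Subscribe to the source and ask for every activity to be fully recorded.
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);

        // With a subscriber present, the same call now produces an Activity.
        using Activity after = source.StartActivity("ReadItem");

        return (before != null, after != null);
    }
}
```

Calling `TracingSubscriptionSketch.Run()` should report that no activity was created before the listener was added and that one was created afterwards. A tool that subscribes to every source, which is the AppInsights case described above, is where an explicit SDK-side switch becomes useful.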
Public APIs
CosmosClientBuilder
Class Name | Method Name | Return Type | Comment |
---|---|---|---|
CosmosClientBuilder | WithClientTelemetryOptions(CosmosClientTelemetryOptions options) | Returns the client telemetry config instance for this builder | Single function to control/configure any kind of telemetry |
CosmosClientOptions
Class Name | Method Name | Return Type | Comment |
---|---|---|---|
CosmosClientOptions | CosmosClientTelemetryOptions | Set ClientTelemetry Options | Option to control/configure any kind of telemetry |
CosmosClientTelemetryOptions
Class Name | Method Name | Return Type | Comment |
---|---|---|---|
CosmosClientTelemetryOptions | DisableSendingMetricsToService() | void | Disable sending telemetry data to the service, i.e. Microsoft |
CosmosClientTelemetryOptions | DisableDistributedTracing() PREVIEW | void | Disable the Distributed Tracing feature, meaning the SDK stops generating activities even if there are subscribers |
CosmosClientTelemetryOptions | CosmosThresholdOptions(CosmosThresholdOptions options) PREVIEW | void | Options to configure thresholds for distributed tracing (later we can use this same config for OpenTelemetry metrics, similar to Java) |
CosmosThresholdOptions
Class Name | Method Name | Return Type | Comment |
---|---|---|---|
CosmosThresholdOptions | NonPointOperationLatencyThreshold(TimeSpan span) PREVIEW | void | Set the latency above which a LatencyOverThreshold event with the diagnostic string is created for non-point operations |
CosmosThresholdOptions | PointOperationLatencyThreshold(TimeSpan span) PREVIEW | void | Set the latency above which a LatencyOverThreshold event with the diagnostic string is created for point operations |
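To make the proposed surface in the tables above easier to picture, here is a rough usage sketch. The classes below are hypothetical stand-ins written only for this illustration. They copy the method names from the tables but are not the SDK's implementation. The recording properties, the assumption that `WithClientTelemetryOptions` returns the builder for chaining, and the 100 ms and 500 ms values are all illustrative.

```csharp
using System;

// Hypothetical stand-in: mirrors the CosmosThresholdOptions table, not the SDK type.
public sealed class CosmosThresholdOptions
{
    public TimeSpan? PointLatency { get; private set; }
    public TimeSpan? NonPointLatency { get; private set; }

    public void PointOperationLatencyThreshold(TimeSpan span) => PointLatency = span;
    public void NonPointOperationLatencyThreshold(TimeSpan span) => NonPointLatency = span;
}

// Hypothetical stand-in: both features start enabled, as the proposal states.
public sealed class CosmosClientTelemetryOptions
{
    public bool SendsMetricsToService { get; private set; } = true;
    public bool DistributedTracingEnabled { get; private set; } = true;
    public CosmosThresholdOptions Thresholds { get; private set; }

    public void DisableSendingMetricsToService() => SendsMetricsToService = false;
    public void DisableDistributedTracing() => DistributedTracingEnabled = false;
    public void CosmosThresholdOptions(CosmosThresholdOptions options) => Thresholds = options;
}

// Hypothetical stand-in for the builder; returning the builder is an assumption.
public sealed class CosmosClientBuilder
{
    public CosmosClientTelemetryOptions TelemetryOptions { get; private set; }
        = new CosmosClientTelemetryOptions();

    public CosmosClientBuilder WithClientTelemetryOptions(CosmosClientTelemetryOptions options)
    {
        TelemetryOptions = options;
        return this;
    }
}

public static class TelemetryProposalDemo
{
    // Opts out of sending metrics to the service and keeps tracing on with custom thresholds.
    public static CosmosClientBuilder Configure()
    {
        var thresholds = new CosmosThresholdOptions();
        thresholds.PointOperationLatencyThreshold(TimeSpan.FromMilliseconds(100));    // illustrative value
        thresholds.NonPointOperationLatencyThreshold(TimeSpan.FromMilliseconds(500)); // illustrative value

        var telemetry = new CosmosClientTelemetryOptions();
        telemetry.DisableSendingMetricsToService();
        telemetry.CosmosThresholdOptions(thresholds);

        return new CosmosClientBuilder().WithClientTelemetryOptions(telemetry);
    }
}
```

In this sketch a customer who never touches the options keeps both features on, and each kind of telemetry is switched off through its own call on the single options object.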
Top GitHub Comments
Please check whether you really want to go down the road of putting the latency thresholds behind a class named explicitly for tracing only.
In Java, DiagnosticThresholds have more meaning than just tracing, and I think that has proven to be very useful.
So, the main ask is to check whether you foresee any ask to also use the same thresholds for metrics, and of course whether .NET needs more granularity on thresholds than just latency (Java allows customizing whether to consider a threshold violation based on StatusCode+SubStatusCode, RU usage, payload size and latency).
I recommend DisableSendingMetricsToService() instead of DisableClientTelemetryToService() to avoid overloading “Client Telemetry”. This also aligns better with the proposed feature name of “Send Client Diagnostic Metrics To Service”.