
[Internal] Client Telemetry Public API Proposal


Background

Following the latest discussion, and to align with the Java API and avoid confusion over the feature name, we decided to use “Client Telemetry” as the parent feature. It currently covers two kinds of telemetry:

1. Send Client Diagnostic Metrics To Service: Sending telemetry metrics to Microsoft.

Read more about it here: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/docs/observability.md#send-telemetry-from-sdk-to-service-private-preview

2. Distributed Tracing: Sending Activities with operation-level or network-level information to the customer’s APM tool, such as AppInsights (metrics may also be covered in the future).

Read more about it here: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/docs/observability.md#distributed-tracing-preview

We are fine if the Java API and the .NET API don’t look exactly alike. The idea is to make sure the API does not confuse customers.

Java Public API looks like this:

CosmosClientBuilder

CosmosClientTelemetryConfig

CosmosDiagnosticsThresholds

Proposed Public API in .Net SDK to control above features:

By default, both of the above features will be enabled in SDK.

Why?

1. Send Client Diagnostic Metrics To Service: This is controlled from the portal, so enabling it in the SDK by default gives customers a seamless experience: they can enable it from the portal and data starts flowing to us without any code change. If it is disabled on the portal and enabled in the SDK, there is no perf impact or overhead on the SDK, because the portal setting has higher priority.

2. Distributed Tracing: This works on a subscription model: unless there is a subscriber, it won’t generate any activity, so there is no overhead. This feature flag is introduced to support the AppInsights SDK, because customers cannot unsubscribe from an activity source there.
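The two defaults described above can be sketched as follows. This is only an illustration, not the SDK's implementation. The class, method, and source names (`TelemetryDefaultsSketch`, `ShouldSendMetrics`, `TryTrace`, `Sketch.Cosmos.Operation`) are hypothetical, and the `enabledOnPortal && !disabledInSdk` rule is one reading of "the portal setting has higher priority". The tracing half uses the standard `System.Diagnostics.ActivitySource` and `ActivityListener` types to show that no activity is created until something subscribes.

```csharp
using System;
using System.Diagnostics;

public static class TelemetryDefaultsSketch
{
    // Hypothetical source name, used only for this demo.
    private const string SourceName = "Sketch.Cosmos.Operation";
    private static readonly ActivitySource Source = new ActivitySource(SourceName);

    // Assumption: metrics flow only when the portal enables them
    // and the application has not opted out in the SDK.
    public static bool ShouldSendMetrics(bool enabledOnPortal, bool disabledInSdk)
    {
        return enabledOnPortal && !disabledInSdk;
    }

    // Returns true only when an Activity was actually created.
    // Without a listener, StartActivity returns null, so nothing is generated.
    public static bool TryTrace(bool tracingDisabledInSdk)
    {
        if (tracingDisabledInSdk)
        {
            return false; // explicit opt-out wins even if subscribers exist
        }

        using (Activity activity = Source.StartActivity("ReadItem"))
        {
            return activity != null;
        }
    }

    // Registers a subscriber for the demo source; dispose it to unsubscribe.
    public static ActivityListener Subscribe()
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }
}
```

With no subscriber, `TryTrace(false)` returns `false`. After `Subscribe()` it returns `true`, unless the SDK-side flag disables tracing.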

Want to see distributed tracing in action?

OpenTelemetry way: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/OpenTelemetry

AppInsights SDK way: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/ApplicationInsights

In both cases, there are scenarios where a customer would want to disable an individual kind of telemetry from the SDK.

Public APIs

CosmosClientBuilder

| Class Name | Method Name | Return Type | Comment |
| --- | --- | --- | --- |
| CosmosClientBuilder | WithClientTelemetryOptions(CosmosClientTelemetryOptions options) | Returns the client telemetry config instance for this builder | Single function to control/configure any kind of telemetry |

CosmosClientOptions

| Class Name | Method Name | Return Type | Comment |
| --- | --- | --- | --- |
| CosmosClientOptions | CosmosClientTelemetryOptions | Set ClientTelemetry Options | Option to control/configure any kind of telemetry |

CosmosClientTelemetryOptions

| Class Name | Method Name | Return Type | Comment |
| --- | --- | --- | --- |
| CosmosClientTelemetryOptions | DisableSendingMetricsToService() | `void` | Disable sending telemetry data to the service, i.e. Microsoft |
| CosmosClientTelemetryOptions | DisableDistributedTracing() `PREVIEW` | `void` | Disable the Distributed Tracing feature; it will stop generating activities even if there are subscribers |
| CosmosClientTelemetryOptions | CosmosThresholdOptions(CosmosThresholdOptions options) `PREVIEW` | `void` | Options to configure thresholds for distributed tracing (later we can use this same config for OpenTelemetry metrics, similar to Java) |

CosmosThresholdOptions

| Class Name | Method Name | Return Type | Comment |
| --- | --- | --- | --- |
| CosmosThresholdOptions | NonPointOperationLatencyThreshold(TimeSpan span) `PREVIEW` | `void` | Sets the latency above which a `LatencyOverThreshold` event with the diagnostic string is created for non-point operations |
| CosmosThresholdOptions | PointOperationLatencyThreshold(TimeSpan span) `PREVIEW` | `void` | Sets the latency above which a `LatencyOverThreshold` event with the diagnostic string is created for point operations |
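To make the proposed shape concrete, here is a minimal sketch of how the options might be used. The classes below are stand-ins defined for this example. They mirror the class and method names from the tables above, but they are not the SDK's types, and the shipped API may differ. The read-only properties (`SendMetricsToService`, `DistributedTracing`, `Thresholds`, `PointLatency`, `NonPointLatency`) and the threshold values are hypothetical and exist only so the example can be inspected.

```csharp
using System;

// Stand-in for the proposed CosmosThresholdOptions (not the SDK type).
public sealed class CosmosThresholdOptions
{
    public TimeSpan? PointLatency { get; private set; }      // hypothetical accessor
    public TimeSpan? NonPointLatency { get; private set; }   // hypothetical accessor

    public void PointOperationLatencyThreshold(TimeSpan span) { PointLatency = span; }
    public void NonPointOperationLatencyThreshold(TimeSpan span) { NonPointLatency = span; }
}

// Stand-in for the proposed CosmosClientTelemetryOptions (not the SDK type).
public sealed class CosmosClientTelemetryOptions
{
    // Both features are enabled by default, as the proposal states.
    public bool SendMetricsToService { get; private set; } = true;   // hypothetical accessor
    public bool DistributedTracing { get; private set; } = true;     // hypothetical accessor
    public CosmosThresholdOptions Thresholds { get; private set; }   // hypothetical accessor

    public void DisableSendingMetricsToService() { SendMetricsToService = false; }
    public void DisableDistributedTracing() { DistributedTracing = false; }
    public void CosmosThresholdOptions(CosmosThresholdOptions options) { Thresholds = options; }
}

public static class TelemetryOptionsDemo
{
    // Opts out of service-side metrics, keeps tracing on, and sets thresholds.
    public static CosmosClientTelemetryOptions Build()
    {
        var thresholds = new CosmosThresholdOptions();
        thresholds.PointOperationLatencyThreshold(TimeSpan.FromMilliseconds(100));    // arbitrary value
        thresholds.NonPointOperationLatencyThreshold(TimeSpan.FromMilliseconds(500)); // arbitrary value

        var options = new CosmosClientTelemetryOptions();
        options.DisableSendingMetricsToService();
        options.CosmosThresholdOptions(thresholds);
        return options;
    }
}
```

In the proposal, an instance like this would be passed to `WithClientTelemetryOptions(...)` on the builder or assigned through `CosmosClientOptions`.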

Issue Analytics

State:
Created 2 months ago
Comments:13 (13 by maintainers)

Top GitHub Comments


FabianMeiswinkel commented, Jul 13, 2023

Please check whether you really want to go down the road of putting the latency thresholds behind a class named explicitly for tracing only.

In Java, DiagnosticThresholds are used for more than tracing, and I think that has proven to be very useful. They are:

- used to decide whether to emit request-level metrics. Metrics can have dimensions based on replicaId, so restricting when these metrics are emitted avoids overloading the metric system with metrics that have far too high cardinality on dimensions
- used for distributed tracing
- used for logging (probably not relevant for .NET)

So the main ask is to check whether you foresee any ask to also use the same thresholds for metrics, and of course whether .NET needs more granularity on thresholds than just latency (Java allows customizing whether to consider a threshold violated based on StatusCode+SubStatusCode, RU usage, payload size, and latency).
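As an illustration of this point, a single thresholds object could gate tracing events and request-level metrics alike. The sketch below is hypothetical: the class name, property names, and default values are made up for the example and correspond to neither the Java API nor the .NET proposal.

```csharp
using System;

// Hypothetical thresholds shared by tracing and metrics (illustration only).
public sealed class DiagnosticsThresholdsSketch
{
    // Arbitrary defaults chosen for the example.
    public TimeSpan Latency { get; set; } = TimeSpan.FromSeconds(1);
    public double RequestCharge { get; set; } = 1000;
    public int PayloadSizeInBytes { get; set; } = int.MaxValue;

    // An operation violates the thresholds if any single dimension is exceeded.
    public bool IsViolated(TimeSpan latency, double requestCharge, int payloadSizeInBytes)
    {
        return latency > Latency
            || requestCharge > RequestCharge
            || payloadSizeInBytes > PayloadSizeInBytes;
    }
}

public static class ThresholdGateDemo
{
    // Both decisions consult the same thresholds instance, so tracing events
    // and high-cardinality metrics are emitted only for outlier operations.
    public static bool ShouldEmitTraceEvent(DiagnosticsThresholdsSketch t, TimeSpan latency, double ru, int bytes)
    {
        return t.IsViolated(latency, ru, bytes);
    }

    public static bool ShouldEmitRequestMetrics(DiagnosticsThresholdsSketch t, TimeSpan latency, double ru, int bytes)
    {
        return t.IsViolated(latency, ru, bytes);
    }
}
```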


jcocchi commented, Jul 13, 2023

I recommend DisableSendingMetricsToService() instead of DisableClientTelemetryToService() to avoid overloading “Client Telemetry”. This also aligns better with the proposed feature name of “Send Client Diagnostic Metrics To Service”
