AI Integration: OTel CosmosDB Attributes to Collect
Status: Experimental
Target SDK: .NET SDK
Operations To Cover
- Database / Container Operations: Cosmos.CreateDatabaseAsync / Cosmos.CreateContainerAsync / Cosmos.DeleteStreamAsync
- Point Operations: Cosmos.CreateItemAsync / Cosmos.ReadItemAsync / Cosmos.UpsertItemAsync / Cosmos.ReplaceItemAsync / Cosmos.PatchItemAsync / Cosmos.DeleteItemAsync
- Stream Operations: Cosmos.CreateItemStreamAsync / Cosmos.UpsertItemStreamAsync / Cosmos.ReadItemStreamAsync / Cosmos.ReplaceItemStreamAsync / Cosmos.PatchItemStreamAsync / Cosmos.DeleteItemStreamAsync
- Batch Operations: TBD
- Bulk Operation: TBD
- Query Operations: Cosmos.Typed FeedIterator ReadNextAsync (for each page)
Attributes
AI Default Attributes:
Attributes | Value |
---|---|
Event time | 3/2/2022, 11:58:04.967 PM (Local time) |
Duration | 278.0 ms |
Name | Cosmos.DeleteStreamAsync |
Other default attributes are also present, e.g. device details.
Proposal For .Net Cosmos DB SDK
Custom Attributes:
Azure Specific Attributes
Attribute | Value | Comment |
---|---|---|
kind | client | Ignore; set by default as part of the diagnostic scope |
az.namespace | Microsoft.DocumentDB | Ignore; set by default as part of the diagnostic scope |
Common Database Attributes
Attribute | Value | Comment |
---|---|---|
db.system | cosmosdb | OpenTelemetry convention to identify the type of database; ref. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/database.md#notes-and-well-known-identifiers-for-dbsystem |
db.name | < Database Name > | |
db.operation | ReadItemAsync, DeleteItemStreamAsync etc | Database Operation ~Type~ Name |
net.peer.name | e.g. sourabhjaintemp | Account Name + Cloud |
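As an illustration of the table above, the following minimal sketch assembles the common database attributes into a plain dictionary. It is written in Python purely for illustration (the target SDK is .NET), and the helper name `common_db_attributes`, its parameters, and the choice to omit `db.name` when no database applies are assumptions, not part of the proposal.

```python
from typing import Dict, Optional


def common_db_attributes(
    database: Optional[str], operation: str, account_name: str
) -> Dict[str, str]:
    """Build the common database attributes listed in the table above.

    Hypothetical helper: the attribute keys come from the proposal,
    everything else (names, signature) is assumed for illustration.
    """
    attributes = {
        "db.system": "cosmosdb",         # fixed identifier for Cosmos DB
        "db.operation": operation,       # e.g. ReadItemAsync
        "net.peer.name": account_name,   # account name (+ cloud)
    }
    # Assumption: leave db.name out when the operation has no database.
    if database is not None:
        attributes["db.name"] = database
    return attributes


attrs = common_db_attributes("orders-db", "ReadItemAsync", "sourabhjaintemp")
print(attrs)
```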
Cosmos DB Specific
Account Level Information:
Attribute | Value | Comment |
---|---|---|
db.cosmosdb.client_id | Unique Client Id | The combination of client id and machine id can tell us whether the customer is following the best practice of creating a singleton client |
db.cosmosdb.machine_id | Unique Machine Id | |
user_agent.original | < User Agent With SDK version> | Useful to identify the SDK version |
db.cosmosdb.connection_mode | Direct/Gateway | go through |
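The singleton-client check mentioned for `db.cosmosdb.client_id` could be run over collected telemetry roughly as follows. This is a sketch in Python for illustration only; it assumes spans are available as dictionaries keyed by the attribute names above, and the function names are hypothetical. Several client ids on one machine is only a hint, since separate processes or accounts may legitimately create separate clients.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


def clients_per_machine(spans: Iterable[Mapping[str, str]]) -> Dict[str, Set[str]]:
    """Collect the distinct client ids observed for each machine id."""
    by_machine: Dict[str, Set[str]] = defaultdict(set)
    for span in spans:
        machine = span["db.cosmosdb.machine_id"]
        by_machine[machine].add(span["db.cosmosdb.client_id"])
    return dict(by_machine)


def machines_with_multiple_clients(spans: Iterable[Mapping[str, str]]) -> List[str]:
    """Return machine ids that reported more than one client id."""
    grouped = clients_per_machine(spans)
    return sorted(m for m, clients in grouped.items() if len(clients) > 1)


# Made-up sample data for illustration.
sample_spans = [
    {"db.cosmosdb.machine_id": "m1", "db.cosmosdb.client_id": "c1"},
    {"db.cosmosdb.machine_id": "m1", "db.cosmosdb.client_id": "c2"},
    {"db.cosmosdb.machine_id": "m2", "db.cosmosdb.client_id": "c3"},
    {"db.cosmosdb.machine_id": "m2", "db.cosmosdb.client_id": "c3"},
]
print(machines_with_multiple_clients(sample_spans))
```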
Request Level Information:
Attribute | Value | Comment |
---|---|---|
db.cosmosdb.container | Container | |
db.cosmosdb.request_content_length_bytes | Size of request payload | |
db.cosmosdb.response_content_length_bytes | Size of response Payload | |
db.cosmosdb.status_code | 201/200/204 | Cosmos DB HTTP status code; indicates whether a particular Cosmos DB call/request succeeded or failed, and with which HttpStatusCode |
db.cosmosdb.sub_status_code | 1000/1002 | Cosmos Db SubStatus Code |
db.cosmosdb.request_charge | < double type number > | RU consumed for that operation |
db.cosmosdb.regions_contacted | Region Cosmos Db | |
db.cosmosdb.retry_count | Number of retries | |
db.cosmosdb.operation_type | Query/Read/Create | |
db.cosmosdb.item_count | < int number> | Number of items returned by the operation; only for feed operations |
db.cosmosdb.request_diagnostics | < JSON String> | ~Open Question: What if this string exceeds the size limit? Will App Insights break it up and divide it into different attributes?~ Generating it as an event |
db.cosmosdb.activity_id | Guid | Unique Id for the operation and can be helpful to debug particular operation in backend logs |
db.cosmosdb.correlated_activity_id | Guid | It will be populated only in case of query operation to allow correlating query pages retrieved for the same multi-page or cross-partition query. |
db.cosmosdb.batch_operations | string | Comma separated list of operation type and count, only for batch operations |
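To show how `db.cosmosdb.correlated_activity_id` could tie together the pages of one query, the sketch below groups per-page spans by that id and totals `db.cosmosdb.request_charge` and `db.cosmosdb.item_count`. It is Python for illustration only; the span-as-dictionary shape, the function name `summarize_queries`, and the sample values are assumptions.

```python
from typing import Any, Dict, Iterable, Mapping


def summarize_queries(spans: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Aggregate per-page query spans by correlated activity id."""
    totals: Dict[str, Dict[str, Any]] = {}
    for span in spans:
        correlated_id = span.get("db.cosmosdb.correlated_activity_id")
        if correlated_id is None:
            continue  # not a query page (e.g. a point operation)
        entry = totals.setdefault(
            correlated_id, {"pages": 0, "request_charge": 0.0, "item_count": 0}
        )
        entry["pages"] += 1
        entry["request_charge"] += float(span.get("db.cosmosdb.request_charge", 0.0))
        entry["item_count"] += int(span.get("db.cosmosdb.item_count", 0))
    return totals


# Made-up sample data: two pages of one query plus one point read.
page_spans = [
    {"db.cosmosdb.correlated_activity_id": "q1",
     "db.cosmosdb.request_charge": 2.5, "db.cosmosdb.item_count": 10},
    {"db.cosmosdb.correlated_activity_id": "q1",
     "db.cosmosdb.request_charge": 3.5, "db.cosmosdb.item_count": 5},
    {"db.cosmosdb.request_charge": 1.0},
]
print(summarize_queries(page_spans))
```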
Open Telemetry Standard for any Exception
Attribute | Value | Comment |
---|---|---|
exception.type | java.net.ConnectException; OSError | ref. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/exceptions.md |
exception.message | Division by zero; Can't convert 'int' object to str implicitly | ref. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/exceptions.md |
exception.stacktrace | Exception in thread "main" java.lang.RuntimeException: Test exception\n at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n at com.example.GenerateTrace.main(GenerateTrace.java:5) | ref. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/exceptions.md |
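The three exception attributes can be derived from a caught exception along the following lines. This is a minimal Python sketch for illustration only, using the standard `traceback` module; the helper name `exception_attributes` is hypothetical, and the .NET SDK would populate these values through its own mechanisms.

```python
import traceback
from typing import Dict


def exception_attributes(exc: BaseException) -> Dict[str, str]:
    """Map a caught exception onto the three exception.* attributes."""
    exc_type = type(exc)
    # Built-in exceptions are reported without a module prefix.
    if exc_type.__module__ == "builtins":
        type_name = exc_type.__qualname__
    else:
        type_name = f"{exc_type.__module__}.{exc_type.__qualname__}"
    stacktrace = "".join(
        traceback.format_exception(exc_type, exc, exc.__traceback__)
    )
    return {
        "exception.type": type_name,
        "exception.message": str(exc),
        "exception.stacktrace": stacktrace,
    }


try:
    1 / 0
except ZeroDivisionError as error:
    exc_attrs = exception_attributes(error)

print(exc_attrs["exception.type"], exc_attrs["exception.message"])
```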
Following OpenTelemetry conventions (note: the status of the conventions below is Experimental):
~~1. https://github.com/Azure/azure-sdk/blob/main/docs/tracing/distributed-tracing-conventions.yml~~
2. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/database.md
3. https://opentelemetry.io/docs/reference/specification/common/attribute-naming/
Issue Analytics
- State:
- Created 2 years ago
- Comments:22 (22 by maintainers)
Top GitHub Comments
I am totally fine with adding a prefix/suffix for cloud - just want to avoid that anyone would think this is a real network endpoint
Agreed, as long as there is really an easy way to get containerId/nodeId from what is getting logged. I can just tell from past experience that neither on Azure Functions (automatic logging) nor on App Services (manual logging) has there been an easy way for customers to find this info when they logged to AppInsights. I am happy to test this if someone from AzureMonitor can provide the Kusto query or similar that should reliably resolve nodeId/containerId from the attributes in the base schema. If that works, we can definitely drop the machine_id attribute here.