[Q] How to aggregate metrics on the client before logging?
I currently do the following to try to reduce the amount of syncing to the server (I noticed calling wandb.log without doing this slowed down my training runs by 2x):
import random

def commit():
    return random.randint(0, 100) == 0

# each time I call wandb.log
wandb.log({"step": step}, commit=commit())
The idea is to commit the logs around once every hundred steps. But I don’t think that this is what is actually happening. How would I do what I’m trying to do?
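For reference, a minimal sketch of client-side aggregation, assuming a scalar metric and a fixed flush interval; the project name, window size, and metric key below are illustrative and not taken from the original question. Instead of deciding randomly whether to commit, values are buffered locally and a single averaged point is logged every LOG_EVERY steps with an explicit step:

import random
import wandb

LOG_EVERY = 100  # flush roughly "once every hundred steps", but deterministically

run = wandb.init(project="my-project")  # placeholder project name

window = []  # local buffer of per-step values
for step in range(10_000):
    loss = random.random()  # stand-in for a real training metric
    window.append(loss)

    # Aggregate on the client and send one point per window,
    # rather than calling wandb.log on every step.
    if (step + 1) % LOG_EVERY == 0:
        wandb.log({"loss": sum(window) / len(window)}, step=step)
        window.clear()

run.finish()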

My intention was to work around an error where wandb logging kills the Ray worker in a distributed training setting. So it is not really about logging too frequently; it is about delaying the logging until after the run. Please refer to the following code snippet.
Note that wandb.log ignores a metric if its step value is not monotonically increasing.
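The commenter's own snippet is not reproduced in this excerpt; as an illustration only, here is a hedged sketch of the delayed-logging pattern described above, assuming metrics are buffered per step during training and flushed once at the end. The buffer, record helper, and project name are hypothetical. Flushing in ascending step order avoids the behavior noted above, where wandb drops data whose step is lower than the last step it has seen:

import wandb

run = wandb.init(project="my-project")  # placeholder project name

buffer = {}  # step -> dict of metrics, filled during training

def record(step, **metrics):
    # Accumulate metrics locally; nothing is sent to the server here.
    buffer.setdefault(step, {}).update(metrics)

# Stand-in training loop.
for step in range(0, 300, 10):
    record(step, loss=1.0 / (step + 1))

# Flush once at the end, in ascending step order, so wandb never sees
# an out-of-order step.
for step in sorted(buffer):
    wandb.log(buffer[step], step=step)

run.finish()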
Hi, I have the same issue. I want to avoid logging too frequently by pushing the aggregated metrics at the end of the running script. Previously I got advice from the wandb support team that I can use the following semantics:

wandb.log({"step": step}, commit=True if step % freq == 0 else False)

But it seems these semantics overwrite the previously buffered metrics with the current metric at the moment the log is committed, as reported in this issue. How can I sync the aggregated logs without overwriting the previous metrics?
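One reading of the overwrite report, stated here as an assumption rather than confirmed wandb internals: with commit=False, every uncommitted wandb.log call updates the same pending row, so re-logging a key replaces the buffered value instead of adding a new point. A hedged workaround is to log each key at most once per row and give each committed row its own explicit step (project name and values below are illustrative):

import wandb

run = wandb.init(project="my-project")  # placeholder project name

# Re-logging the same key before a commit replaces the buffered value;
# the committed row below ends up containing only loss=0.5.
wandb.log({"loss": 0.9}, commit=False)
wandb.log({"loss": 0.5})

# Logging each value in its own row, keyed by an explicit, increasing step,
# keeps every point.
wandb.log({"loss": 0.9}, step=100)
wandb.log({"loss": 0.5}, step=200)

run.finish()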