[Q] How to aggregate metrics on the client before logging?
I currently do the following to try to reduce the amount of syncing to the server (I noticed calling wandb.log without doing this slowed down my training runs by 2x):
import random

def commit():
    return random.randint(0, 100) == 0

# each time I call wandb.log
wandb.log({"step": step}, commit=commit())
The idea is to commit the logs around once every hundred steps. But I don’t think that this is what is actually happening. How would I do what I’m trying to do?
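For reference, a minimal sketch of client-side aggregation, assuming a scalar metric and a fixed flush interval; the project name, window size, and metric key below are illustrative and not taken from the original question. Instead of deciding randomly whether to commit, values are buffered locally and a single averaged point is logged every LOG_EVERY steps with an explicit step:

import random
import wandb

LOG_EVERY = 100  # flush roughly "once every hundred steps", but deterministically

run = wandb.init(project="my-project")  # placeholder project name

window = []  # local buffer of per-step values
for step in range(10_000):
    loss = random.random()  # stand-in for a real training metric
    window.append(loss)

    # Aggregate on the client and send one point per window,
    # rather than calling wandb.log on every step.
    if (step + 1) % LOG_EVERY == 0:
        wandb.log({"loss": sum(window) / len(window)}, step=step)
        window.clear()

run.finish()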

My intention was to work around an error where wandb logging kills the Ray worker in a distributed training setting. So it is not really about logging too frequently; it is about delaying the logging until after the run. Please refer to the following code snippet.
Note that wandb.log ignores a metric if its step value is not monotonically increasing.
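The commenter's own snippet is not reproduced in this excerpt; as an illustration only, here is a hedged sketch of the delayed-logging pattern described above, assuming metrics are buffered per step during training and flushed once at the end. The buffer, record helper, and project name are hypothetical. Flushing in ascending step order avoids the behavior noted above, where wandb drops data whose step is lower than the last step it has seen:

import wandb

run = wandb.init(project="my-project")  # placeholder project name

buffer = {}  # step -> dict of metrics, filled during training

def record(step, **metrics):
    # Accumulate metrics locally; nothing is sent to the server here.
    buffer.setdefault(step, {}).update(metrics)

# Stand-in training loop.
for step in range(0, 300, 10):
    record(step, loss=1.0 / (step + 1))

# Flush once at the end, in ascending step order, so wandb never sees
# an out-of-order step.
for step in sorted(buffer):
    wandb.log(buffer[step], step=step)

run.finish()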
Hi, I have the same issue. I want to avoid logging too frequently by pushing the aggregated metrics at the end of the running script. Previously I got advice from the wandb support team that I can use the following semantics:

wandb.log({"step": step}, commit=True if step % freq == 0 else False)

But it seems these semantics overwrite the previously buffered metrics with the current metric at the moment the log is committed, as reported in this issue. How can I sync the aggregated logs without overwriting the previous metrics?
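One reading of the overwrite report, stated here as an assumption rather than confirmed wandb internals: with commit=False, every uncommitted wandb.log call updates the same pending row, so re-logging a key replaces the buffered value instead of adding a new point. A hedged workaround is to log each key at most once per row and give each committed row its own explicit step (project name and values below are illustrative):

import wandb

run = wandb.init(project="my-project")  # placeholder project name

# Re-logging the same key before a commit replaces the buffered value;
# the committed row below ends up containing only loss=0.5.
wandb.log({"loss": 0.9}, commit=False)
wandb.log({"loss": 0.5})

# Logging each value in its own row, keyed by an explicit, increasing step,
# keeps every point.
wandb.log({"loss": 0.9}, step=100)
wandb.log({"loss": 0.5}, step=200)

run.finish()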