
[Tune] Logging with multiple time intervals


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Ray installed from (source or binary): Pip
  • Ray version: 0.6.3
  • Python version: 3.6.5

Describe the problem

The Trainable interface in Tune expects the step method to output a logging dictionary. However, it is unclear how to annotate logging statements with different global steps. For example, one may want to record the model's gradients at every training iteration, but only record the dev metric once per epoch.
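For concreteness, a minimal sketch of the constraint being described (build_model, train_one_batch, and evaluate are hypothetical helpers; this assumes the Ray 0.6.x Trainable API, where _train returns one result dict per iteration):

    from ray.tune import Trainable

    class MyTrainable(Trainable):
        def _setup(self, config):
            self.model = build_model(config)  # hypothetical helper

        def _train(self):
            # Everything returned here is logged under the same
            # training_iteration, so per-batch and per-epoch metrics
            # end up sharing one global step.
            loss = train_one_batch(self.model)   # want: every iteration
            dev_metric = evaluate(self.model)    # want: once per epoch
            return {"loss": loss, "dev_metric": dev_metric}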

One solution we are exploring is building an adapter Trainable interface with a custom logger (the default logger being deactivated) that would be passed to the step method. The logger would expose a method such as log(key, value, time_step). The custom logger, when given a result (i.e. in on_result), would then parse the list of (key, value, time_step) tuples and output the correct TensorBoard graphs. The only downside of this approach is that the user would have to wait until the step is finished before seeing the logs for that step.
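A rough sketch of what this adapter could look like (all names here are hypothetical; the Logger base class and tf.Summary usage assume Ray 0.6.x with TensorFlow 1.x):

    import tensorflow as tf
    from ray.tune import Trainable
    from ray.tune.logger import Logger

    class StepLogger(object):
        """Buffers (key, value, time_step) tuples during a step."""
        def __init__(self):
            self._records = []

        def log(self, key, value, time_step):
            self._records.append((key, value, time_step))

        def flush(self):
            records, self._records = self._records, []
            return records

    class AdapterTrainable(Trainable):
        def _setup(self, config):
            self.step_logger = StepLogger()

        def _train(self):
            # User code calls self.step_logger.log(...) freely here;
            # Tune only sees the buffered tuples once the step has
            # finished, which is the delay mentioned above.
            run_one_step(self, self.step_logger)  # hypothetical user hook
            return {"records": self.step_logger.flush()}

    class TupleTFLogger(Logger):
        def _init(self):
            self._file_writer = tf.summary.FileWriter(self.logdir)

        def on_result(self, result):
            # Write each buffered value at its own time_step instead
            # of the shared training_iteration.
            for key, value, t in result.get("records", []):
                summary = tf.Summary(
                    value=[tf.Summary.Value(tag=key, simple_value=value)])
                self._file_writer.add_summary(summary, t)
            self._file_writer.flush()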

I was wondering if you had faced this question before, and had some thoughts about the best way to approach it. Thank you in advance for your help!

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

1 reaction
richardliaw commented, Feb 27, 2019

@jeremyasapp - would something like this work?

def _train(self):
    batch = next(train_sampler)
    loss = train(batch)
    extras = {}
    # Attach the dev metric only every n_iter_per_step iterations.
    if self._iteration % n_iter_per_step == 0:
        extras["metric"] = evaluate(self.dev_data)
    return dict(loss=loss, **extras)

Here, you'd get loss logged at the right training step, and extras would be reported at every multiple of n_iter_per_step. I think this should work, but let me know if otherwise.
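To make the effect concrete, a hypothetical trace of the returned dicts with n_iter_per_step = 3:

    # self._iteration = 0 -> {"loss": 0.91, "metric": 0.55}
    # self._iteration = 1 -> {"loss": 0.84}
    # self._iteration = 2 -> {"loss": 0.78}
    # self._iteration = 3 -> {"loss": 0.71, "metric": 0.62}
    #
    # Each key is written at the iteration it appears in, so the
    # "metric" curve simply has fewer points than the "loss" curve.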

1 reaction
jeremyasapp commented, Feb 25, 2019

Hi, thanks for your quick response!

So I think I understand what you mean, but these lines in the TFLogger gave me the impression that all the metrics are given the same step:

train_stats = tf.Summary(value=values)
t = result.get(TIMESTEPS_TOTAL) or result[TRAINING_ITERATION]
self._file_writer.add_summary(train_stats, t)

What I'm confused about is that we only get to return a single dictionary of values every step, but within a single step you may generate multiple values for the same key. For example:

def step(self):
    losses = []
    # Do n_iter_per_step training steps
    for i in range(n_iter_per_step):
        batch = next(train_sampler)
        loss = train(batch)
        losses.append(loss)

    # Run evaluation once per step
    metric = evaluate(self.dev_data)
    return dict(metric=metric, loss=losses)

In this example, it's unclear to me how TensorBoard will go about parsing the list of values. Ideally, I would call tf_summary with the correct training time step for each value in losses (i.e. n_iter_per_step * global_step + i). Please let me know if that makes things clearer, or I can give a more concrete example. Thank you!
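For what it's worth, a minimal sketch of writing each intra-step loss at its own step on the logger side (assuming TensorFlow 1.x summaries and the n_iter_per_step / global_step bookkeeping from above):

    import tensorflow as tf

    def write_losses(file_writer, losses, global_step, n_iter_per_step):
        # Give each intra-step loss its own x-coordinate instead of
        # collapsing all of them onto a single training_iteration.
        for i, loss in enumerate(losses):
            t = n_iter_per_step * global_step + i
            summary = tf.Summary(
                value=[tf.Summary.Value(tag="loss", simple_value=loss)])
            file_writer.add_summary(summary, t)
        file_writer.flush()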
