[tune] tf.summary.FileWriter extensibility for custom TensorBoard metrics


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Ray installed from (source or binary): source
  • Ray version: 0.6.6
  • Python version: 3.6.7
  • Exact command to reproduce: NA

Context: I rely on tune and TensorBoard for visualizing training, and I use callbacks to define custom metrics in the results dictionary that is then passed to TFLogger.
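For reference, a minimal sketch of the callback setup I mean, assuming the dictionary-style callbacks config of this Ray version (the metric name and environment are placeholders):

import ray
from ray import tune

def on_episode_end(info):
    # RLlib passes the episode in the callback's info dict; values stored in
    # episode.custom_metrics end up in the results dictionary handed to the logger.
    episode = info["episode"]
    episode.custom_metrics["my_custom_metric"] = 42.0

ray.init()
tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",
        "callbacks": {"on_episode_end": tune.function(on_episode_end)},
    },
)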

Problem: Ray saves scalars only, and all of them end up under the same 'ray' tab in TensorBoard. Having tens of metrics under one tab hurts readability, especially when the end user is adding custom metrics. It would be a great feature to let users access TFLogger._file_writer so that they can add custom metrics (not just scalars) under custom tabs. Note that creating a second tf.summary.FileWriter is not an option, as two FileWriters sharing the same logdir are not supported at this time. Question: what is the recommended way to achieve this?
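To make the request concrete, this is roughly what I would like to do if TFLogger._file_writer were reachable. This is a sketch of the custom-tab part using the TF1 tf.Summary proto; the file_writer argument, the tag, and the helper name are all hypothetical:

import tensorflow as tf

def log_custom_metric(file_writer, step, value):
    # A "tab/metric" tag groups the metric under its own TensorBoard section
    # instead of the single 'ray' tab.
    summary = tf.Summary(value=[
        tf.Summary.Value(tag="custom_tab/my_metric", simple_value=value),
    ])
    file_writer.add_summary(summary, global_step=step)
    file_writer.flush()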

Attempts: Using a custom Logger instance is not an option, as the trainer is never passed to it (only the results are), which limits access to the custom metrics of interest. The on_train_result callback does pass the trainer (info["trainer"]), but from there I don't see how to access TFLogger._file_writer to save custom metrics to TensorBoard.
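For completeness, the shape of that attempt (registered the same dictionary-style way as the callback above; treat the details as an assumption):

def on_train_result(info):
    trainer = info["trainer"]   # the Trainer instance is available here
    result = info["result"]
    # ...but I see no handle from the trainer to the TFLogger._file_writer
    # used for this trial, so there is nothing to write a summary to.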

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 22 (15 by maintainers)

Top GitHub Comments

2 reactions
FedericoFontana commented, May 14, 2019

I’m happy to give it a try when you give me the ok.

1 reaction
OnTheRicky commented, Apr 30, 2020

@richardliaw I did as you recommended (no TFLogger, with the FileWriter instantiated in Trainer._init) and it works like a charm. Note that I've provided the computation graph when initializing the FileWriter. I think that rllib/tune should save the computation graph by default, as it is invaluable both for developers (debugging) and for users (understanding/visualizing the policy network without going through any source code).

What should the next step be (e.g. if PR, what should the PR change)?

class PPO(PPOTrainer):
    def _init(self, config, env_creator):
        super()._init(config, env_creator)
        # Create the FileWriter directly in the Trainer, pointing it at the
        # trial's logdir and passing the policy's TF graph so TensorBoard
        # can render the computation graph.
        self._file_writer = tf.summary.FileWriter(
            logdir=self.logdir,
            graph=self.get_policy().sess.graph,
        )
        self._file_writer.flush()
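
Building on the snippet above, one way to push custom summaries through that writer might be a method added to the same PPO subclass; this is a sketch, and the _train override plus the result keys are assumptions about this Trainer API version:

    def _train(self):
        # Per-iteration hook (assumption): run the normal training step, then
        # write an extra summary with the FileWriter created in _init.
        result = super()._train()
        summary = tf.Summary(value=[
            tf.Summary.Value(tag="custom/episode_reward_mean",
                             simple_value=result["episode_reward_mean"]),
        ])
        self._file_writer.add_summary(summary, result["training_iteration"])
        self._file_writer.flush()
        return result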


Has this been implemented already? If so, what changes are required to see the graph in TensorBoard?
