Episode mean reward is not properly logged on tensorboard when using SAC
I’m training an agent on a custom environment using SAC. The environment is wrapped in a Monitor, which is wrapped in a DummyVecEnv, which is wrapped in a VecNormalize with norm_reward = True.
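For reference, the wrapping order described above looks roughly like this (a minimal sketch only; MyCustomEnv stands in for the custom environment, and the full script is further down):

from my_custom_env import MyCustomEnv
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

env = DummyVecEnv([lambda: Monitor(MyCustomEnv())])  # Monitor records raw episode statistics
env = VecNormalize(env, norm_reward = True)          # normalizes observations and rewards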
This is the tensorboard graph for the episode mean reward:
[Screenshots: ep_rew_mean curve, shown with no smoothing and with 0.9 smoothing]
As you can see, the graph has some weird loops, for example at around 170k steps and 450k steps.
Edit: Training is conducted in epochs of 50k steps.
The program starts by calling ./start.sh.
start.sh
#!/bin/bash
# Keep re-running main.py for as long as it exits successfully (exit code 0).
while [ "$?" -eq 0 ]; do
    python3 main.py
done
main.py
import os.path

from my_custom_env import MyCustomEnv
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize


class SaveCheckpoint(BaseCallback):
    def __init__(self, save_freq, verbose = 0):
        super(SaveCheckpoint, self).__init__(verbose)
        self.save_freq = save_freq

    def _on_step(self):
        # save model and normalization stats every save_freq steps
        if self.num_timesteps % self.save_freq == 0:
            self.model.save("model.zip")
            self.training_env.save("stats.pkl")
        return True


if __name__ == '__main__':
    # inits
    env = DummyVecEnv([lambda: Monitor(MyCustomEnv())])
    model = None

    # load recent checkpoint
    if os.path.isfile("model.zip") and os.path.isfile("stats.pkl"):
        env = VecNormalize.load("stats.pkl", env)
        env.reset()
        model = SAC.load("model.zip", env)
    else:
        env = VecNormalize(env)
        model = SAC('MlpPolicy', env, verbose = 1, tensorboard_log = ".")

    # replay buffer
    if os.path.isfile("replay_buffer.pkl"):
        model.load_replay_buffer("replay_buffer.pkl")

    # train
    model.learn(50000,
        callback = SaveCheckpoint(10000),
        log_interval = 1,
        reset_num_timesteps = False
    )

    # save replay buffer
    model.save_replay_buffer(".")

    env.close()
> pip3 freeze | grep 'stable-baselines3'
stable-baselines3==0.7.0a1
Issue Analytics
- Created 3 years ago
- Comments: 7 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Updated the post with more detailed info
Yup, that seems to have been the problem. Thanks!