Image input into TD3
Hi,
I have a custom env with an image observation space and a continuous action space. After training TD3 policies, when I evaluate them there seems to be no reaction to the image observation (I manually drag objects in front of the camera to see what happens).
import os
import gym

from stable_baselines import TD3
from stable_baselines.bench import Monitor
from stable_baselines.td3.policies import CnnPolicy as td3CnnPolicy

log_dir = "tmp/"  # directory for Monitor logs and best-model checkpoints
os.makedirs(log_dir, exist_ok=True)

env = gym.make('GripperEnv-v0')  # custom env with an image observation space
env = Monitor(env, log_dir)

ExperimentName = "TD3_test"
policy_kwargs = dict(layers=[64, 64])  # two 64-unit layers after the CNN extractor
model = TD3(td3CnnPolicy, env, verbose=1, policy_kwargs=policy_kwargs, tensorboard_log="tmp/",
            buffer_size=15000, batch_size=2200, train_freq=2200, learning_starts=10000,
            learning_rate=1e-3)

# SaveOnBestTrainingRewardCallback is the custom callback from the stable-baselines docs
callback = SaveOnBestTrainingRewardCallback(check_freq=1100, log_dir=log_dir)
time_steps = 50000
model.learn(total_timesteps=int(time_steps), callback=callback)
model.save("128128/" + ExperimentName)
I can view the observation using OpenCV and it is the correct image (single channel, pixel values between 0 and 1).
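As a quick sanity check (a minimal sketch, reusing the env id from the script above), it is worth printing the dtype and value range of a raw observation. As far as I can tell, CNN policies in stable-baselines scale observations by 1/255 internally, which matters for float images already in [0, 1]:

import gym

env = gym.make('GripperEnv-v0')
obs = env.reset()
print(obs.shape, obs.dtype, obs.min(), obs.max())
# For CnnPolicy, stable-baselines expects uint8 images in [0, 255]; it divides
# by 255 internally, so float observations already in [0, 1] shrink to near
# zero after scaling -- consistent with "zero pixels" behavior.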
My understanding is that the CNN is three Conv2D layers that connect to two 64-wide layers. Is it possible that I somehow disconnected these two parts, or could my hyperparameters just be that bad? The behavior learned by the policies is similar to what I would get by feeding all-zero pixels into the network.
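For reference, a sketch of how the two parts connect: TD3's CnnPolicy runs the observation through a feature extractor (by default nature_cnn: three conv layers plus a 512-unit dense layer) and applies the layers=[64, 64] MLP on top of its output. A custom extractor with the same structure can be passed via policy_kwargs to make the wiring explicit; the function name below is hypothetical.

import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import conv, linear, conv_to_fc

# Sketch of a custom extractor mirroring the default nature_cnn.
# TD3 applies the [64, 64] MLP layers on top of whatever this returns.
def custom_cnn(scaled_images, **kwargs):
    activ = tf.nn.relu
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4,
                         init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'c2', n_filters=64, filter_size=4, stride=2,
                         init_scale=np.sqrt(2), **kwargs))
    layer_3 = activ(conv(layer_2, 'c3', n_filters=64, filter_size=3, stride=1,
                         init_scale=np.sqrt(2), **kwargs))
    layer_3 = conv_to_fc(layer_3)
    return activ(linear(layer_3, 'fc1', n_hidden=512, init_scale=np.sqrt(2)))

policy_kwargs = dict(cnn_extractor=custom_cnn, layers=[64, 64])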
Top GitHub Comments
SAC/TD3 are very slow with images; I recommend doing something like here or here, where you decouple policy learning from feature extraction.
This doesn't completely answer the question, but I don't have much time for this right now.
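To illustrate that decoupling idea, here is a minimal sketch (the wrapper and encoder names are hypothetical, not from the issue): train an encoder separately on images collected from the env, then wrap the env so the agent only sees a small latent vector and can learn with a plain MlpPolicy.

import gym
import numpy as np
from stable_baselines import TD3
from stable_baselines.td3.policies import MlpPolicy

class EncodedObs(gym.ObservationWrapper):
    """Replace image observations with latent vectors from a pretrained encoder."""

    def __init__(self, env, encoder, latent_dim):
        super(EncodedObs, self).__init__(env)
        self.encoder = encoder  # assumed: callable mapping an image to a (latent_dim,) array
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf,
                                                shape=(latent_dim,), dtype=np.float32)

    def observation(self, obs):
        return np.asarray(self.encoder(obs), dtype=np.float32)

# Usage (encoder trained beforehand, e.g. the encoder half of an autoencoder):
# env = EncodedObs(gym.make('GripperEnv-v0'), encoder=my_encoder, latent_dim=32)
# model = TD3(MlpPolicy, env, verbose=1)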
Aren't we supposed to give image observations as values between 0 and 255? I am using 2-channel images as observations and map them from values between 0 and 1 to the 0-255 range. Similar to @C-monC, I have depth images as observations and I'm getting the same problem, where the agent always chooses the same action no matter what the observations are. Btw, I'm using A2C.
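A sketch of the rescaling described in that comment (the wrapper name is hypothetical): map float observations in [0, 1] to uint8 in [0, 255], which matches what the image policies expect before their internal division by 255.

import gym
import numpy as np

class FloatToUint8(gym.ObservationWrapper):
    """Map float observations in [0, 1] to uint8 in [0, 255]."""

    def __init__(self, env):
        super(FloatToUint8, self).__init__(env)
        self.observation_space = gym.spaces.Box(low=0, high=255,
                                                shape=env.observation_space.shape,
                                                dtype=np.uint8)

    def observation(self, obs):
        return (np.clip(obs, 0.0, 1.0) * 255).astype(np.uint8)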