Image input into TD3
Hi,
I have a custom env with an image observation space and a continuous action space. After training TD3 policies, when I evaluate them there seems to be no reaction to the image observation (I manually drag objects in front of the camera to see what happens).
import os
import gym

from stable_baselines import TD3
from stable_baselines.bench import Monitor
from stable_baselines.td3.policies import CnnPolicy as td3CnnPolicy

log_dir = "tmp/"  # directory for Monitor logs and best-model checkpoints
os.makedirs(log_dir, exist_ok=True)

env = gym.make('GripperEnv-v0')  # custom env with an image observation space
env = Monitor(env, log_dir)

ExperimentName = "TD3_test"
policy_kwargs = dict(layers=[64, 64])  # two 64-unit layers after the CNN extractor
model = TD3(td3CnnPolicy, env, verbose=1, policy_kwargs=policy_kwargs, tensorboard_log="tmp/",
            buffer_size=15000, batch_size=2200, train_freq=2200, learning_starts=10000,
            learning_rate=1e-3)

# SaveOnBestTrainingRewardCallback is the custom callback from the stable-baselines docs
callback = SaveOnBestTrainingRewardCallback(check_freq=1100, log_dir=log_dir)
time_steps = 50000
model.learn(total_timesteps=int(time_steps), callback=callback)
model.save("128128/" + ExperimentName)
I can view the observation using OpenCV and it is the correct image (single channel, pixel values between 0 and 1).
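As a quick sanity check (a minimal sketch, reusing the env id from the script above), it is worth printing the dtype and value range of a raw observation. As far as I can tell, CNN policies in stable-baselines scale observations by 1/255 internally, which matters for float images already in [0, 1]:

import gym

env = gym.make('GripperEnv-v0')
obs = env.reset()
print(obs.shape, obs.dtype, obs.min(), obs.max())
# For CnnPolicy, stable-baselines expects uint8 images in [0, 255]; it divides
# by 255 internally, so float observations already in [0, 1] shrink to near
# zero after scaling -- consistent with "zero pixels" behavior.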
My understanding is that the CNN is three Conv2D layers that connect to two 64-wide layers. Is it possible that I somehow disconnected these two parts, or could my hyperparameters just be that bad? The behavior learned by the policies is similar to what I would get by feeding all-zero pixels into the network.
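For reference, a sketch of how the two parts connect: TD3's CnnPolicy runs the observation through a feature extractor (by default nature_cnn: three conv layers plus a 512-unit dense layer) and applies the layers=[64, 64] MLP on top of its output. A custom extractor with the same structure can be passed via policy_kwargs to make the wiring explicit; the function name below is hypothetical.

import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import conv, linear, conv_to_fc

# Sketch of a custom extractor mirroring the default nature_cnn.
# TD3 applies the [64, 64] MLP layers on top of whatever this returns.
def custom_cnn(scaled_images, **kwargs):
    activ = tf.nn.relu
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4,
                         init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'c2', n_filters=64, filter_size=4, stride=2,
                         init_scale=np.sqrt(2), **kwargs))
    layer_3 = activ(conv(layer_2, 'c3', n_filters=64, filter_size=3, stride=1,
                         init_scale=np.sqrt(2), **kwargs))
    layer_3 = conv_to_fc(layer_3)
    return activ(linear(layer_3, 'fc1', n_hidden=512, init_scale=np.sqrt(2)))

policy_kwargs = dict(cnn_extractor=custom_cnn, layers=[64, 64])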
Top GitHub Comments
SAC/TD3 are very slow with images; I recommend doing something like here or here, where you decouple policy learning from feature extraction.
This doesn't completely answer the question, but I don't have much time for this right now.
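To illustrate that decoupling idea, here is a minimal sketch (the wrapper and encoder names are hypothetical, not from the issue): train an encoder separately on images collected from the env, then wrap the env so the agent only sees a small latent vector and can learn with a plain MlpPolicy.

import gym
import numpy as np
from stable_baselines import TD3
from stable_baselines.td3.policies import MlpPolicy

class EncodedObs(gym.ObservationWrapper):
    """Replace image observations with latent vectors from a pretrained encoder."""

    def __init__(self, env, encoder, latent_dim):
        super(EncodedObs, self).__init__(env)
        self.encoder = encoder  # assumed: callable mapping an image to a (latent_dim,) array
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf,
                                                shape=(latent_dim,), dtype=np.float32)

    def observation(self, obs):
        return np.asarray(self.encoder(obs), dtype=np.float32)

# Usage (encoder trained beforehand, e.g. the encoder half of an autoencoder):
# env = EncodedObs(gym.make('GripperEnv-v0'), encoder=my_encoder, latent_dim=32)
# model = TD3(MlpPolicy, env, verbose=1)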
Aren't we supposed to give image observations as values between 0 and 255? I am using 2-channel images as observations and map them from values between 0 and 1 to the 0-255 range. Similar to @C-monC, I have depth images as observations and I'm getting the same problem, where the agent always chooses the same action no matter what the observations are. Btw, I'm using A2C.
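A sketch of the rescaling described in that comment (the wrapper name is hypothetical): map float observations in [0, 1] to uint8 in [0, 255], which matches what the image policies expect before their internal division by 255.

import gym
import numpy as np

class FloatToUint8(gym.ObservationWrapper):
    """Map float observations in [0, 1] to uint8 in [0, 255]."""

    def __init__(self, env):
        super(FloatToUint8, self).__init__(env)
        self.observation_space = gym.spaces.Box(low=0, high=255,
                                                shape=env.observation_space.shape,
                                                dtype=np.uint8)

    def observation(self, obs):
        return (np.clip(obs, 0.0, 1.0) * 255).astype(np.uint8)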