[Question] Can PPO train a pick-and-place task?
Question
Hi, at first I used TQC+HER to train FetchPickAndPlace-v1 and got a good result. Then I considered adding image info to the observation. But because the FetchPickAndPlace env is based on gym.GoalEnv and TQC+HER relies on HerReplayBuffer, I couldn't find a way to include both image info and robot state info in the observation at the same time. So I tried PPO on FetchPickAndPlace-v1 instead; however, after 5e6 timesteps the reward doesn't improve. Can PPO train the pick-and-place task?
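One possible direction (a minimal sketch, untested) is to keep the three Dict keys that HerReplayBuffer expects ("observation", "achieved_goal", "desired_goal") and fold the rendered pixels into the "observation" entry via an observation wrapper. The wrapper name, image size, and render arguments below are assumptions for illustration, not gym or SB3 API guarantees:

import gym
import numpy as np

class PixelStateGoalWrapper(gym.ObservationWrapper):
    """Append flattened camera pixels to the state vector of a GoalEnv.

    Keeps the Dict layout that SB3's HerReplayBuffer relies on; only the
    "observation" entry grows. Purely illustrative.
    """
    def __init__(self, env, width=64, height=64):
        super().__init__(env)
        self.width, self.height = width, height
        state_space = env.observation_space["observation"]
        n_pixels = width * height * 3
        low = np.concatenate([state_space.low, np.zeros(n_pixels)])
        high = np.concatenate([state_space.high, 255 * np.ones(n_pixels)])
        self.observation_space = gym.spaces.Dict({
            "observation": gym.spaces.Box(low=low, high=high, dtype=state_space.dtype),
            "achieved_goal": env.observation_space["achieved_goal"],
            "desired_goal": env.observation_space["desired_goal"],
        })

    def observation(self, obs):
        # render signature varies across gym/mujoco-py versions; treat as an assumption
        img = self.env.render(mode="rgb_array", width=self.width, height=self.height)
        obs["observation"] = np.concatenate([obs["observation"], img.ravel()])
        return obs

Note that flattening pixels into the state vector means the policy's MLP processes them, which is crude; keeping the image as a separate Dict key (so a CNN extractor could handle it) is exactly what seemed not to be supported together with HerReplayBuffer, hence the question.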
My training code
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback

env_id = 'FetchPickAndPlace-v1'
num_cpu = 4  # number of parallel environments
vec_env = make_vec_env(env_id, n_envs=num_cpu)

log_dir = './tensorboard/' + env_id
# save_freq counts calls to the callback, i.e. a checkpoint is saved every
# save_freq * num_cpu environment steps
checkpoint_callback = CheckpointCallback(save_freq=25000,
                                         save_path='model_checkpoints/' + env_id,
                                         name_prefix=env_id)

total_timesteps = 5000000
# PPO with a Dict-observation policy (the Fetch envs return Dict observations)
model = PPO(policy="MultiInputPolicy", env=vec_env, verbose=1,
            normalize_advantage=True, tensorboard_log=log_dir)
model.learn(total_timesteps=total_timesteps, callback=checkpoint_callback)
model.save('./trained/' + env_id + '/' + env_id + model.__class__.__name__)
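For comparison, the TQC+HER baseline mentioned at the top would look roughly like this (a sketch assuming sb3-contrib is installed; the hyperparameters are illustrative, not necessarily the ones that produced the good result):

import gym
from sb3_contrib import TQC
from stable_baselines3 import HerReplayBuffer

env = gym.make('FetchPickAndPlace-v1')
model = TQC(
    'MultiInputPolicy',
    env,
    replay_buffer_class=HerReplayBuffer,
    # HER relabels stored transitions with achieved goals, which is what
    # makes learning feasible under this env's sparse reward
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy='future'),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)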
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Top GitHub Comments
AFAIK, on this env, the reward is way too sparse for PPO to converge.
TQC+HER converges mainly because of HER.
You should try with the dense reward setting. “FetchPickAndPlaceDense-v1” if I remember correctly.
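Following this suggestion, the only change to the training script above would be the env id (a minimal sketch; 'FetchPickAndPlaceDense-v1' is quoted from the comment, so double-check it is registered in your gym version):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Dense reward: distance-based shaping instead of the sparse -1/0 signal,
# which gives PPO a usable learning signal without HER
vec_env = make_vec_env('FetchPickAndPlaceDense-v1', n_envs=4)
model = PPO('MultiInputPolicy', vec_env, verbose=1, normalize_advantage=True)
model.learn(total_timesteps=5_000_000)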
Okay, I will try, thank you for your reply!