
[Question] PPO train pick and place task

See original GitHub issue

Question

Hi, at first I used TQC+HER to train FetchPickAndPlace-v1 and got a good result. Then I considered adding image info to the observation. But because the FetchPickAndPlace env is based on gym.GoalEnv and TQC+HER relies on HerReplayBuffer, I couldn't seem to add image info and robot state info to the observation at the same time. So I tried to use PPO to train FetchPickAndPlace-v1 instead; however, after 5e6 timesteps its reward doesn't improve. Can PPO train the pick-and-place task?
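
For context: outside of HerReplayBuffer, SB3's MultiInputPolicy accepts arbitrary Dict observation spaces, so one way to combine image and state info for PPO is an observation wrapper along the lines of the sketch below. This is a hypothetical illustration, not SB3 or gym API: ImageStateWrapper is an invented name, and it assumes a gym robotics env whose render() supports mode='rgb_array' with width/height arguments.

import gym
import numpy as np

class ImageStateWrapper(gym.ObservationWrapper):
    """Hypothetical wrapper: appends a rendered camera image to the env's
    Dict observation, next to the robot state and goal keys. Usable with
    PPO's MultiInputPolicy, but not with HerReplayBuffer, which expects
    only the observation/achieved_goal/desired_goal keys."""

    def __init__(self, env, width=84, height=84):
        super().__init__(env)
        self._width, self._height = width, height
        spaces = dict(env.observation_space.spaces)
        spaces["image"] = gym.spaces.Box(
            low=0, high=255, shape=(height, width, 3), dtype=np.uint8
        )
        self.observation_space = gym.spaces.Dict(spaces)

    def observation(self, obs):
        obs = dict(obs)
        # gym's mujoco robotics envs accept width/height in render()
        obs["image"] = self.env.render(
            mode="rgb_array", width=self._width, height=self._height
        )
        return obs

Such a wrapper could then be passed to make_vec_env via its wrapper_class argument.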

My training code:

import gym

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback

env_id = 'FetchPickAndPlace-v1'

# Vectorized training with 4 parallel environments
num_cpu = 4
vec_env = make_vec_env(env_id, n_envs=num_cpu)

log_dir = './tensorboard/' + env_id

# Periodically save model checkpoints during training
checkpoint_callback = CheckpointCallback(save_freq=25000,
                                         save_path='model_checkpoints/' + env_id,
                                         name_prefix=env_id)

total_timesteps = 5000000

# MultiInputPolicy is required because GoalEnv observations are dicts
model = PPO(policy="MultiInputPolicy", env=vec_env, verbose=1,
            normalize_advantage=True, tensorboard_log=log_dir)

model.learn(total_timesteps=total_timesteps, callback=checkpoint_callback)

model.save('./trained/' + env_id + '/' + env_id + model.__class__.__name__)

Checklist

  • I have read the documentation (required)
  • I have checked that there is no similar issue in the repo (required)

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
qgallouedec commented, Jun 17, 2022

AFAIK, on this env the reward is way too sparse for PPO to converge.
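
For context, the Fetch envs compute the reward roughly as below (paraphrased from gym.envs.robotics.fetch_env; details may vary between gym versions). With the sparse setting the reward is -1 on almost every step, so an on-policy method like PPO rarely sees a learning signal:

import numpy as np

def compute_reward(achieved_goal, goal, reward_type, distance_threshold=0.05):
    # Euclidean distance between the achieved and desired goal positions
    d = np.linalg.norm(achieved_goal - goal, axis=-1)
    if reward_type == 'sparse':
        # -1 until the object is within ~5 cm of the goal, 0 afterwards
        return -(d > distance_threshold).astype(np.float32)
    else:
        # 'dense': negative distance to the goal at every step
        return -d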

TQC+HER converges mainly because of HER.

You should try the dense reward setting: “FetchPickAndPlaceDense-v1”, if I remember correctly.
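
A minimal sketch of that suggestion, reusing the script from the question with only the env id changed (assuming the dense variant is registered under that name):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Same setup as above, but the dense-reward variant gives PPO a
# learning signal (negative goal distance) at every step
env_id = 'FetchPickAndPlaceDense-v1'
vec_env = make_vec_env(env_id, n_envs=4)

model = PPO(policy="MultiInputPolicy", env=vec_env, verbose=1)
model.learn(total_timesteps=5_000_000)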

0 reactions
Rancho-zhao commented, Jun 18, 2022

Okay, I will try, thank you for your reply!


Top Results From Across the Web

Reinforcement Learning for Pick and Place Operations ... - MDPI
Section 9 includes a brief discussion on open problems. The goal for completing a pick-and-place operation without task-specific programming ...

Reward Engineering for Object Pick and Place Training - arXiv
Reinforcement learning is the field of study where an agent learns a policy to execute an action by exploring and exploiting rewards from ...

The 37 Implementation Details of Proximal Policy Optimization
Although being a different robotics simulator, Brax follows this idea and can train a viable agent in similar tasks with PPO using a ...

Reinforcement Learning for Contact-Rich Tasks: Robotic Peg ...
Significance: This paper uses proximal policy optimization (PPO) to learn robotic peg insertion strategies, uses PyBullet library to construct a simulation ...

How to train your robot with deep reinforcement learning
In this section, we discuss one particular case study of scalable multi-task learning of vision-based manipulation skills, with a focus on tasks ...
