[Question] Can PPO train a pick-and-place task?
Question
Hi, at first I used TQC+HER to train FetchPickAndPlace-v1 and got a good result. Then I considered adding image info to the observation. But because the FetchPickAndPlace env is based on gym.GoalEnv and TQC+HER relies on HerReplayBuffer, I couldn't find a way to include both image info and robot state info in the observation at the same time. So I tried PPO on FetchPickAndPlace-v1 instead; however, after 5e6 timesteps the reward doesn't improve. Can PPO train the pick-and-place task?
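One possible direction (a minimal sketch, untested) is to keep the three Dict keys that HerReplayBuffer expects ("observation", "achieved_goal", "desired_goal") and fold the rendered pixels into the "observation" entry via an observation wrapper. The wrapper name, image size, and render arguments below are assumptions for illustration, not gym or SB3 API guarantees:

import gym
import numpy as np

class PixelStateGoalWrapper(gym.ObservationWrapper):
    """Append flattened camera pixels to the state vector of a GoalEnv.

    Keeps the Dict layout that SB3's HerReplayBuffer relies on; only the
    "observation" entry grows. Purely illustrative.
    """
    def __init__(self, env, width=64, height=64):
        super().__init__(env)
        self.width, self.height = width, height
        state_space = env.observation_space["observation"]
        n_pixels = width * height * 3
        low = np.concatenate([state_space.low, np.zeros(n_pixels)])
        high = np.concatenate([state_space.high, 255 * np.ones(n_pixels)])
        self.observation_space = gym.spaces.Dict({
            "observation": gym.spaces.Box(low=low, high=high, dtype=state_space.dtype),
            "achieved_goal": env.observation_space["achieved_goal"],
            "desired_goal": env.observation_space["desired_goal"],
        })

    def observation(self, obs):
        # render signature varies across gym/mujoco-py versions; treat as an assumption
        img = self.env.render(mode="rgb_array", width=self.width, height=self.height)
        obs["observation"] = np.concatenate([obs["observation"], img.ravel()])
        return obs

Note that flattening pixels into the state vector means the policy's MLP processes them, which is crude; keeping the image as a separate Dict key (so a CNN extractor could handle it) is exactly what seemed not to be supported together with HerReplayBuffer, hence the question.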
My training code
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import CheckpointCallback

env_id = 'FetchPickAndPlace-v1'
num_cpu = 4  # number of parallel environments
vec_env = make_vec_env(env_id, n_envs=num_cpu)

log_dir = './tensorboard/' + env_id
# save_freq counts calls to the callback, i.e. a checkpoint is saved every
# save_freq * num_cpu environment steps
checkpoint_callback = CheckpointCallback(save_freq=25000,
                                         save_path='model_checkpoints/' + env_id,
                                         name_prefix=env_id)

total_timesteps = 5000000
# PPO with a Dict-observation policy (the Fetch envs return Dict observations)
model = PPO(policy="MultiInputPolicy", env=vec_env, verbose=1,
            normalize_advantage=True, tensorboard_log=log_dir)
model.learn(total_timesteps=total_timesteps, callback=checkpoint_callback)
model.save('./trained/' + env_id + '/' + env_id + model.__class__.__name__)
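For comparison, the TQC+HER baseline mentioned at the top would look roughly like this (a sketch assuming sb3-contrib is installed; the hyperparameters are illustrative, not necessarily the ones that produced the good result):

import gym
from sb3_contrib import TQC
from stable_baselines3 import HerReplayBuffer

env = gym.make('FetchPickAndPlace-v1')
model = TQC(
    'MultiInputPolicy',
    env,
    replay_buffer_class=HerReplayBuffer,
    # HER relabels stored transitions with achieved goals, which is what
    # makes learning feasible under this env's sparse reward
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy='future'),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)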
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Top GitHub Comments
AFAIK, on this env, the reward is way too sparse for PPO to converge.
TQC+HER converges mainly because of HER.
You should try with the dense reward setting. “FetchPickAndPlaceDense-v1” if I remember correctly.
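Following this suggestion, the only change to the training script above would be the env id (a minimal sketch; 'FetchPickAndPlaceDense-v1' is quoted from the comment, so double-check it is registered in your gym version):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Dense reward: distance-based shaping instead of the sparse -1/0 signal,
# which gives PPO a usable learning signal without HER
vec_env = make_vec_env('FetchPickAndPlaceDense-v1', n_envs=4)
model = PPO('MultiInputPolicy', vec_env, verbose=1, normalize_advantage=True)
model.learn(total_timesteps=5_000_000)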
Okay, I will try, thank you for your reply!