Reproduce the results of the DDPPO paper
Hi! First of all thank you for the amazing project you’re carrying out!
I’m trying to reproduce the results from the DDPPO paper. I installed the latest versions of habitat-sim and habitat-api, downloaded the pre-trained models, downloaded the “Gibson dataset for Habitat” (`Gibson_dataset_trainval`), and downloaded the corresponding Gibson task dataset from here (the `pointnav_gibson_v1.zip` file).
Then I slightly modified the `habitat_baselines/config/pointnav/ddppo_pointnav.yaml` config file to use the correct sensor (RGB or Depth) and to load the correct pretrained checkpoint. If I run `habitat_baselines/rl/ddppo/single_node.sh` with the `eval` flag, the process freezes; I don’t know the exact reason, but maybe the eval procedure expects a checkpoint that includes the config parameters and does not find one. For this reason I launched the training process for a few seconds, just long enough for `ckpt0` to be created, and then launched the eval process again (994 eval episodes).
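(A quick way to check that hypothesis is to inspect what the downloaded `.pth` file actually contains. A minimal sketch, assuming the checkpoint is a plain Python dict the way trainer-saved checkpoints are; the `state_dict`/`config` key names are my assumption, not something the released files are guaranteed to use:)

```python
import torch

# Load the released weights on CPU just to inspect their structure.
ckpt = torch.load(
    "data/ddppo-models/gibson-2plus-mp3d-train-val-test-se-resneXt50-rgb.pth",
    map_location="cpu",
)

print(type(ckpt))
if isinstance(ckpt, dict):
    # Trainer-saved checkpoints bundle the config next to the weights;
    # if "config" is missing, eval has nothing to restore settings from.
    print(list(ckpt.keys()))
    print("config" in ckpt)
```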
The Depth-based model reaches the expected performance (SPL ~0.95), but unfortunately the RGB-based one does not: it reports an SPL/SR of about 0.35/0.50 with the `gibson-2plus-mp3d-train-val-test-se-resneXt50-rgb.pth` checkpoint.
Is there something I’m missing? Here is the config file I’m using:
```yaml
BASE_TASK_CONFIG_PATH: "configs/tasks/pointnav_gibson.yaml"
TRAINER_NAME: "ddppo"
ENV_NAME: "NavRLEnv"
SIMULATOR_GPU_ID: 0
TORCH_GPU_ID: 0
VIDEO_OPTION: []
TENSORBOARD_DIR: "tb"
VIDEO_DIR: "video_dir"
TEST_EPISODE_COUNT: -1
EVAL_CKPT_PATH_DIR: "data/new_checkpoints"
NUM_PROCESSES: 4
SENSORS: ["RGB_SENSOR"]
CHECKPOINT_FOLDER: "data/new_checkpoints"
NUM_UPDATES: 10000
LOG_INTERVAL: 10
CHECKPOINT_INTERVAL: 250

RL:
  SUCCESS_REWARD: 2.5
  POLICY:
    name: "PointNavResNetPolicy"
  PPO:
    # ppo params
    clip_param: 0.2
    ppo_epoch: 2
    num_mini_batch: 2
    value_loss_coef: 0.5
    entropy_coef: 0.01
    lr: 2.5e-4
    eps: 1e-5
    max_grad_norm: 0.2
    num_steps: 128
    use_gae: True
    gamma: 0.99
    tau: 0.95
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    reward_window_size: 50
    use_normalized_advantage: False
    hidden_size: 512
  DDPPO:
    sync_frac: 0.6
    # The PyTorch distributed backend to use
    distrib_backend: GLOO
    # Visual encoder backbone
    pretrained_weights: data/ddppo-models/gibson-2plus-mp3d-train-val-test-se-resneXt50-rgb.pth  # data/ddppo-models/gibson-2plus-resnet50.pth
    # Initialize with pretrained weights
    pretrained: True
    # Initialize just the visual encoder backbone with pretrained weights
    pretrained_encoder: False
    # Whether or not the visual encoder backbone will be trained
    train_encoder: False
    # Whether or not to reset the critic linear layer
    reset_critic: False
    # Model parameters
    backbone: se_resneXt50  # resnet50
    rnn_type: LSTM
    num_recurrent_layers: 2
```
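(A note on the `pretrained`/`pretrained_encoder` flags above: a minimal sketch of how this kind of initialization typically works, assuming the checkpoint stores its parameters under a `state_dict` key with an `actor_critic.` prefix on every key; both key names are my assumption, based on how trainer-saved checkpoints look, not a guarantee about the released files:)

```python
import torch
from torch import nn


def load_pretrained(actor_critic: nn.Module, path: str) -> None:
    """Load DDPPO-style released weights into a policy.

    Assumes (illustrative assumption) that parameters live under
    ckpt["state_dict"] with an "actor_critic." prefix on every key.
    """
    ckpt = torch.load(path, map_location="cpu")
    prefix = "actor_critic."
    actor_critic.load_state_dict(
        {k[len(prefix):]: v for k, v in ckpt["state_dict"].items()}
    )
```

If the key layout of the released file differs, `load_state_dict` will raise a missing/unexpected-keys error, which is itself a useful signal that the weights were never actually applied.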
Here are the per-episode averages for the Depth model and the RGB model:

| Metric | Depth | RGB |
| --- | --- | --- |
| Average episode reward | 7.6915 | 0.3978 |
| Average episode distance_to_goal | 0.0944 | 3.5889 |
| Average episode success | 0.9960 | 0.5000 |
| Average episode spl | 0.9514 | 0.3489 |
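(For context on how to read the `spl` row: SPL is Success weighted by Path Length, from Anderson et al., 2018. A minimal sketch of the computation; the function and argument names are mine, not habitat’s:)

```python
def spl(successes, shortest_dists, actual_dists):
    """Success weighted by Path Length (Anderson et al., 2018).

    successes[i]      -- 1.0 if episode i reached the goal, else 0.0
    shortest_dists[i] -- geodesic shortest-path length for episode i
    actual_dists[i]   -- path length the agent actually traveled
    (argument names are illustrative, not from habitat_baselines)
    """
    n = len(successes)
    return sum(
        s * l / max(p, l)
        for s, l, p in zip(successes, shortest_dists, actual_dists)
    ) / n
```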
Thank you!
Top GitHub Comments
Yep
Thank you @erikwijmans for your support! Just a quick question about the running mean and var step: can you give more details about what it is supposed to do? Does it compute the mean and variance of the input images over the whole training set, so that they are saved in the checkpoint file and loaded when I resume training? Thank you!
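(For readers with the same question: the idea behind a running mean/var normalizer is to maintain streaming estimates of the input statistics as training progresses, rather than doing a fixed pass over the whole training set. A minimal sketch of the idea, using PyTorch buffers so that the statistics land in the checkpoint; this is illustrative, not the exact `RunningMeanAndVar` from habitat_baselines:)

```python
import torch
from torch import nn


class RunningMeanAndVarSketch(nn.Module):
    """Running per-channel normalizer for image batches (NCHW).

    Statistics are registered as buffers, so they appear in state_dict()
    and are saved/restored with checkpoints. Illustrative only.
    """

    def __init__(self, n_channels: int) -> None:
        super().__init__()
        self.register_buffer("_mean", torch.zeros(1, n_channels, 1, 1))
        self.register_buffer("_var", torch.ones(1, n_channels, 1, 1))
        self.register_buffer("_count", torch.zeros(()))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Update running statistics from the incoming batch
            batch_mean = x.mean(dim=(0, 2, 3), keepdim=True)
            batch_var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
            batch_count = x.size(0)

            new_count = self._count + batch_count
            delta = batch_mean - self._mean
            self._mean += delta * batch_count / new_count
            # Chan et al.'s parallel variance combination
            m_a = self._var * self._count
            m_b = batch_var * batch_count
            self._var = (
                m_a + m_b + delta.pow(2) * self._count * batch_count / new_count
            ) / new_count
            self._count = new_count

        # Normalize with the current running statistics
        return (x - self._mean) / torch.sqrt(self._var + 1e-8)
```

Because `_mean`, `_var`, and `_count` are registered buffers, they are part of `state_dict()`, so saving and resuming a checkpoint carries the statistics along with the model weights.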