
Reproduce the results of the DDPPO paper

See original GitHub issue

❓ Questions and Help

Hi! First of all thank you for the amazing project you’re carrying out!

I’m trying to reproduce the results reported in the DD-PPO paper. I installed the latest versions of habitat-sim and habitat-api, downloaded the pre-trained models, the “Gibson dataset for Habitat” (Gibson_dataset_trainval), and the corresponding Gibson PointNav task dataset from here (the pointnav_gibson_v1.zip file). I then slightly modified habitat_baselines/config/pointnav/ddppo_pointnav.yaml to use the correct sensor (RGB or Depth) and to load the corresponding pre-trained checkpoint. If I run habitat_baselines/rl/ddppo/single_node.sh with the eval flag, the process freezes; I don’t know the exact reason, but my guess is that the eval procedure expects a checkpoint that includes the config parameters and doesn’t find one (see the inspection snippet after the config below). To work around this, I launched the training process for a few seconds so that ckpt0 would be created, then launched the eval process again (994 eval episodes). The Depth model gives the expected performance (SPL ~0.95), but unfortunately the RGB model does not: it reports an SPL/SR of about 0.35/0.50 using the gibson-2plus-mp3d-train-val-test-se-resneXt50-rgb.pth checkpoint. Is there something I’m missing? Here is the config file I’m using:

BASE_TASK_CONFIG_PATH: "configs/tasks/pointnav_gibson.yaml"
TRAINER_NAME: "ddppo"
ENV_NAME: "NavRLEnv"
SIMULATOR_GPU_ID: 0
TORCH_GPU_ID: 0
VIDEO_OPTION: []
TENSORBOARD_DIR: "tb"
VIDEO_DIR: "video_dir"
TEST_EPISODE_COUNT: -1
EVAL_CKPT_PATH_DIR: "data/new_checkpoints"
NUM_PROCESSES: 4
SENSORS: ["RGB_SENSOR"]
CHECKPOINT_FOLDER: "data/new_checkpoints"
NUM_UPDATES: 10000
LOG_INTERVAL: 10
CHECKPOINT_INTERVAL: 250

RL:
  SUCCESS_REWARD: 2.5

  POLICY:
    name: "PointNavResNetPolicy"

  PPO:
    # ppo params
    clip_param: 0.2
    ppo_epoch: 2
    num_mini_batch: 2
    value_loss_coef: 0.5
    entropy_coef: 0.01
    lr: 2.5e-4
    eps: 1e-5
    max_grad_norm: 0.2
    num_steps: 128
    use_gae: True
    gamma: 0.99
    tau: 0.95
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    reward_window_size: 50

    use_normalized_advantage: False

    hidden_size: 512

  DDPPO:
    sync_frac: 0.6
    # The PyTorch distributed backend to use
    distrib_backend: GLOO
    # Visual encoder backbone
    pretrained_weights: data/ddppo-models/gibson-2plus-mp3d-train-val-test-se-resneXt50-rgb.pth #data/ddppo-models/gibson-2plus-resnet50.pth
    # Initialize with pretrained weights
    pretrained: True
    # Initialize just the visual encoder backbone with pretrained weights
    pretrained_encoder: False
    # Whether or not the visual encoder backbone will be trained.
    train_encoder: False
    # Whether or not to reset the critic linear layer
    reset_critic: False

    # Model parameters
    backbone: se_resneXt50 #resnet50
    rnn_type: LSTM
    num_recurrent_layers: 2
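
As a sanity check on the “missing config” hypothesis, this is the snippet I used to compare the released weights with a checkpoint written by the trainer (just my own debugging sketch; I’m assuming both files are plain torch-saved dicts and that ckpt.0.pth is the file the brief training run produced):

import torch

# As far as I can tell, habitat_baselines checkpoints store the training
# config alongside the weights; my guess is that the released ddppo-models
# files contain only the state dict, which would explain the eval freeze.
released = torch.load(
    "data/ddppo-models/gibson-2plus-mp3d-train-val-test-se-resneXt50-rgb.pth",
    map_location="cpu",
)
own = torch.load("data/new_checkpoints/ckpt.0.pth", map_location="cpu")
print(sorted(released.keys()))
print(sorted(own.keys()))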

Here are the results for the Depth model:

Average episode reward: 7.6915
Average episode distance_to_goal: 0.0944
Average episode success: 0.9960
Average episode spl: 0.9514

and the results for the RGB model:

Average episode reward: 0.3978
Average episode distance_to_goal: 3.5889
Average episode success: 0.5000
Average episode spl: 0.3489

Thank you!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
erikwijmans commented, Jan 27, 2021

Yep

0 reactions
rosanom commented, Jan 27, 2021

Thank you @erikwijmans for your support! Just a quick question about the running mean and var: can you give more details about what it is supposed to do? Does it compute the mean and variance of the input images over the whole training set, so that they are saved in the checkpoint file and loaded when I resume training? Thank you!
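
For context, a layer in this style usually doesn’t make a separate pass over the whole training set: it updates per-channel statistics online from the batches it sees during training and keeps them in registered buffers, so they land in the state_dict and are saved and restored with the checkpoint automatically. Below is a minimal PyTorch sketch of that general idea (an illustration only, not necessarily habitat’s exact RunningMeanAndVar implementation):

import torch
import torch.nn as nn

class RunningMeanVar(nn.Module):
    """Normalizes image batches with per-channel stats accumulated online."""

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Buffers (not parameters) are saved in state_dict, so the
        # statistics are checkpointed and restored along with the model.
        self.register_buffer("mean", torch.zeros(1, num_channels, 1, 1))
        self.register_buffer("var", torch.ones(1, num_channels, 1, 1))
        self.register_buffer("count", torch.zeros(()))

    @torch.no_grad()
    def _update(self, x: torch.Tensor) -> None:
        # Merge this batch's stats into the running stats
        # (Chan et al.'s parallel mean/variance update).
        batch_mean = x.mean(dim=(0, 2, 3), keepdim=True)
        batch_var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        batch_count = x.numel() / x.size(1)
        total = self.count + batch_count
        delta = batch_mean - self.mean
        m2 = (self.var * self.count + batch_var * batch_count
              + delta.pow(2) * self.count * batch_count / total)
        self.mean.add_(delta * (batch_count / total))
        self.var.copy_(m2 / total)
        self.count.copy_(total)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            self._update(x)
        return (x - self.mean) / torch.sqrt(self.var + self.eps)

If habitat’s version follows this pattern, resuming training simply restores these buffers from the checkpoint and continues updating them; no dataset-wide statistics pass is involved.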

Read more comments on GitHub >

Top Results From Across the Web

  • DD-PPO: Learning Near-Perfect PointGoal Navigators ... - arXiv
    Abstract: We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in ...
  • How to Reproduce Papers : r/learnmachinelearning - Reddit
    I'm wondering what the best ways are to reproduce results in a paper. And the best papers to start with in order to...
  • DD-PPO: LEARNING NEAR-PERFECT POINTGOAL
    Published as a conference paper at ICLR 2020 ... 2 for results of the best DD-PPO agent for Blind, RGB, and RGB-D and...
  • ObjectNav performance · Issue #9 · allenai/embodied-clip · GitHub
    Hi Apoorv, I have rerun the training of Habitat-ObjNav with the default configurations, trying to reproduce your results. I found that for the...
  • Algorithms — Ray 2.2.0 - the Ray documentation
    This method should be viewed as for research purposes, and for reproducing the results of the paper introducing it. MADDPG-specific configs (see also...
