
PPO model training with habitat 2020 challenge config

See original GitHub issue

@mathfac @dhruvbatra Hi! Another issue I am struggling with is training the Habitat baseline PPO model with the 2020 challenge configuration for the PointNav task, using habitat-api.

For the PPO agent configuration I use the following file, ppo_pointnav.yaml:

BASE_TASK_CONFIG_PATH: "configs/tasks/pointnav_gib_rgbd_2020.yaml"
TRAINER_NAME: "ppo"
ENV_NAME: "NavRLEnv"
SIMULATOR_GPU_ID: 1
TORCH_GPU_ID: 1

VIDEO_OPTION: ["disk", "tensorboard"]
TENSORBOARD_DIR: "tb"
VIDEO_DIR: "video_dir"
TEST_EPISODE_COUNT: 994
EVAL_CKPT_PATH_DIR: "data/ppo_2020_checkpoints"

NUM_PROCESSES: 4

SENSORS: ["DEPTH_SENSOR"]
CHECKPOINT_FOLDER: "data/ppo_2020_checkpoints"
NUM_UPDATES: 270000
LOG_INTERVAL: 25
CHECKPOINT_INTERVAL: 2000

RL:
  PPO:
    clip_param: 0.1
    ppo_epoch: 4
    num_mini_batch: 2
    value_loss_coef: 0.5
    entropy_coef: 0.01
    lr: 2.5e-4
    eps: 1e-5
    max_grad_norm: 0.5
    num_steps: 128
    hidden_size: 512
    use_gae: True
    gamma: 0.99
    tau: 0.95
    use_linear_clip_decay: True
    use_linear_lr_decay: True
    reward_window_size: 50
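
As a side note (my addition, not from the original issue), here is a minimal sketch of what run.py does with this experiment file, assuming the habitat_baselines layout of habitat-api at the time: it loads the experiment config, resolves TRAINER_NAME against the baseline registry, and calls train().

# Minimal sketch, assuming habitat_baselines from habitat-api (circa 2020).
from habitat_baselines.common.baseline_registry import baseline_registry
from habitat_baselines.config.default import get_config

config = get_config("habitat_baselines/config/pointnav/ppo_pointnav.yaml")
trainer_cls = baseline_registry.get_trainer(config.TRAINER_NAME)  # "ppo" -> PPO trainer
trainer = trainer_cls(config)
trainer.train()  # equivalent to passing --run-type train on the command line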

For the task configuration I used the same parameters as in the challenge_pointnav2020.local.rgbd.yaml file:

ENVIRONMENT:
  MAX_EPISODE_STEPS: 500
SIMULATOR:
  AGENT_0:
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
    HEIGHT: 0.88
    RADIUS: 0.18
  HABITAT_SIM_V0:
    GPU_DEVICE_ID: 0
    ALLOW_SLIDING: False
  RGB_SENSOR:
    WIDTH: 640
    HEIGHT: 360
    HFOV: 70
    POSITION: [0, 0.88, 0]
    NOISE_MODEL: "GaussianNoiseModel"
    NOISE_MODEL_KWARGS:
      intensity_constant: 0.1

  DEPTH_SENSOR:
    WIDTH: 640
    HEIGHT: 360
    HFOV: 70
    MIN_DEPTH: 0.1
    MAX_DEPTH: 10.0
    POSITION: [0, 0.88, 0]
    NOISE_MODEL: "RedwoodDepthNoiseModel"

  ACTION_SPACE_CONFIG: 'pyrobotnoisy'
  NOISE_MODEL:
    ROBOT: "LoCoBot"
    CONTROLLER: 'Proportional'
    NOISE_MULTIPLIER: 0.5

TASK:
  TYPE: Nav-v0
  SUCCESS_DISTANCE: 0.36
  SENSORS: ['POINTGOAL_SENSOR']
  POINTGOAL_SENSOR:
    GOAL_FORMAT: POLAR
    DIMENSIONALITY: 2
  GOAL_SENSOR_UUID: pointgoal
  MEASUREMENTS: ['DISTANCE_TO_GOAL', "SUCCESS", 'SPL']
  SUCCESS:
    SUCCESS_DISTANCE: 0.36

I only changed the path to the train dataset (Habitat Challenge Data for Gibson, 1.5 GB):

DATASET:
  TYPE: PointNav-v1
  SPLIT: train
  DATA_PATH: data/datasets/pointnav/gibson/v1/{split}/{split}.json.gz
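
As a sanity check (my addition, not part of the original report), the merged task configuration can be loaded directly with habitat.get_config to confirm that the noise models, sensor resolutions, and dataset path were picked up as intended:

import habitat

# Sanity-check sketch; the path matches BASE_TASK_CONFIG_PATH above.
task_config = habitat.get_config("configs/tasks/pointnav_gib_rgbd_2020.yaml")
print(task_config.SIMULATOR.DEPTH_SENSOR.NOISE_MODEL)  # expect RedwoodDepthNoiseModel
print(task_config.SIMULATOR.RGB_SENSOR.NOISE_MODEL)    # expect GaussianNoiseModel
print(task_config.SIMULATOR.ACTION_SPACE_CONFIG)       # expect pyrobotnoisy
print(task_config.DATASET.DATA_PATH)                   # expect the Gibson train split path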

After running the command python -u habitat_baselines/run.py --exp-config habitat_baselines/config/pointnav/ppo_pointnav.yaml --run-type I got the following error:

---
 The active scene does not contain semantic annotations. 
---
I0325 20:17:08.559100 8915 simulator.py:143] Loaded navmesh data/scene_datasets/gibson/Monson.navmesh
I0325 20:17:08.559392 8915 simulator.py:155] Recomputing navmesh for agent's height 0.88 and radius 0.18.
I0325 20:17:08.567361  8915 PathFinder.cpp:338] Building navmesh with 275x112 cells
I0325 20:17:08.655342  8915 PathFinder.cpp:606] Created navmesh with 137 vertices 61 polygons
I0325 20:17:08.655371  8915 Simulator.cpp:403] reconstruct navmesh successful
2020-03-25 20:17:08,720 Initializing task Nav-v0
2020-03-25 20:17:11,725 agent number of parameters: 52694149
/home/pryhoda/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/optim/lr_scheduler.py:122: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [3,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [3,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [3,0,0], thread: [2,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [3,0,0], thread: [3,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [2,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [0,0,0], thread: [3,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [2,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [2,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [2,0,0], thread: [2,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [2,0,0], thread: [3,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [1,0,0], thread: [0,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [1,0,0], thread: [1,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [1,0,0], thread: [2,0,0] Assertion `val >= zero` failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:243: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [1,0,0], thread: [3,0,0] Assertion `val >= zero` failed.
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 70, in <module>
    main()
  File "habitat_baselines/run.py", line 40, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 64, in run_exp
    trainer.train()
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/rl/ppo/ppo_trainer.py", line 346, in train
    rollouts, current_episode_reward, running_episode_stats
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/rl/ppo/ppo_trainer.py", line 181, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/rl/ppo/ppo_trainer.py", line 181, in <listcomp>
    outputs = self.envs.step([a[0].item() for a in actions])
RuntimeError: CUDA error: device-side assert triggered
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7f8dfa79ea58>>
Traceback (most recent call last):
  File "/home/pryhoda/HabitatProject/habitat-api/habitat/core/vector_env.py", line 468, in __del__
    self.close()
  File "/home/pryhoda/HabitatProject/habitat-api/habitat/core/vector_env.py", line 350, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/pryhoda/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/pryhoda/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/pryhoda/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

I am wondering whether the PPO agent is simply not adapted to training with the 2020 challenge config (it ran fine for me with the 2019 challenge config, pointnav_gibson_rgbd.yaml), or whether this is an issue on my side. Thanks in advance!
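
For context (my own note, not from the thread): the Assertion `val >= zero` failed in MultinomialKernel.cu means the categorical action sampler received negative or NaN probabilities, which usually points to non-finite values in the observations or the policy logits rather than to the config itself. A hedged diagnostic sketch, assuming a habitat_baselines-style rollout where batch is the dict of observation tensors and dist is the action distribution:

import torch

def assert_finite(batch, dist):
    # Diagnostic sketch (hypothetical helper): call just before actions are
    # sampled in _collect_rollout_step to locate NaN/Inf values early.
    for name, obs in batch.items():
        if torch.is_tensor(obs) and not torch.isfinite(obs.float()).all():
            raise ValueError(f"non-finite values in observation '{name}'")
    if not torch.isfinite(dist.logits).all():
        raise ValueError("non-finite values in action logits")

Running with CUDA_LAUNCH_BLOCKING=1 (or temporarily on CPU) also makes the stack trace point at the real failing call instead of the later envs.step line.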

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 20 (10 by maintainers)

Top GitHub Comments

1 reaction
ghost commented, May 9, 2020

@erikwijmans As you suggested, I trained a DD-PPO model with a resnet18 backbone.

When I tried to evaluate it, I got the following error:

Traceback (most recent call last):
  File "agent.py", line 165, in <module>
    main()
  File "agent.py", line 155, in main
    agent = DDPPOAgent(config)
  File "agent.py", line 88, in __init__
    for k, v in ckpt["state_dict"].items()
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
	Missing key(s) in state_dict: "net.visual_encoder.backbone.layer1.0.convs.6.weight", "net.visual_encoder.backbone.layer1.0.convs.7.weight", "net.visual_encoder.backbone.layer1.0.convs.7.bias", "net.visual_encoder.backbone.layer1.0.downsample.0.weight", "net.visual_encoder.backbone.layer1.0.downsample.1.weight", "net.visual_encoder.backbone.layer1.0.downsample.1.bias", "net.visual_encoder.backbone.layer1.1.convs.6.weight", "net.visual_encoder.backbone.layer1.1.convs.7.weight", "net.visual_encoder.backbone.layer1.1.convs.7.bias", "net.visual_encoder.backbone.layer1.2.convs.0.weight", "net.visual_encoder.backbone.layer1.2.convs.1.weight", "net.visual_encoder.backbone.layer1.2.convs.1.bias", "net.visual_encoder.backbone.layer1.2.convs.3.weight", "net.visual_encoder.backbone.layer1.2.convs.4.weight", "net.visual_encoder.backbone.layer1.2.convs.4.bias", "net.visual_encoder.backbone.layer1.2.convs.6.weight", "net.visual_encoder.backbone.layer1.2.convs.7.weight", "net.visual_encoder.backbone.layer1.2.convs.7.bias", "net.visual_encoder.backbone.layer2.0.convs.6.weight", "net.visual_encoder.backbone.layer2.0.convs.7.weight", "net.visual_encoder.backbone.layer2.0.convs.7.bias", "net.visual_encoder.backbone.layer2.1.convs.6.weight", "net.visual_encoder.backbone.layer2.1.convs.7.weight", "net.visual_encoder.backbone.layer2.1.convs.7.bias", "net.visual_encoder.backbone.layer2.2.convs.0.weight", "net.visual_encoder.backbone.layer2.2.convs.1.weight", "net.visual_encoder.backbone.layer2.2.convs.1.bias", "net.visual_encoder.backbone.layer2.2.convs.3.weight", "net.visual_encoder.backbone.layer2.2.convs.4.weight", "net.visual_encoder.backbone.layer2.2.convs.4.bias", "net.visual_encoder.backbone.layer2.2.convs.6.weight", "net.visual_encoder.backbone.layer2.2.convs.7.weight", "net.visual_encoder.backbone.layer2.2.convs.7.bias", "net.visual_encoder.backbone.layer2.3.convs.0.weight", "net.visual_encoder.backbone.layer2.3.convs.1.weight", "net.visual_encoder.backbone.layer2.3.convs.1.bias", "net.visual_encoder.backbone.layer2.3.convs.3.weight", "net.visual_encoder.backbone.layer2.3.convs.4.weight", "net.visual_encoder.backbone.layer2.3.convs.4.bias", "net.visual_encoder.backbone.layer2.3.convs.6.weight", "net.visual_encoder.backbone.layer2.3.convs.7.weight", "net.visual_encoder.backbone.layer2.3.convs.7.bias", "net.visual_encoder.backbone.layer3.0.convs.6.weight", "net.visual_encoder.backbone.layer3.0.convs.7.weight", "net.visual_encoder.backbone.layer3.0.convs.7.bias", "net.visual_encoder.backbone.layer3.1.convs.6.weight", "net.visual_encoder.backbone.layer3.1.convs.7.weight", "net.visual_encoder.backbone.layer3.1.convs.7.bias", "net.visual_encoder.backbone.layer3.2.convs.0.weight", "net.visual_encoder.backbone.layer3.2.convs.1.weight", "net.visual_encoder.backbone.layer3.2.convs.1.bias", "net.visual_encoder.backbone.layer3.2.convs.3.weight", "net.visual_encoder.backbone.layer3.2.convs.4.weight", "net.visual_encoder.backbone.layer3.2.convs.4.bias", "net.visual_encoder.backbone.layer3.2.convs.6.weight", "net.visual_encoder.backbone.layer3.2.convs.7.weight", "net.visual_encoder.backbone.layer3.2.convs.7.bias", "net.visual_encoder.backbone.layer3.3.convs.0.weight", "net.visual_encoder.backbone.layer3.3.convs.1.weight", "net.visual_encoder.backbone.layer3.3.convs.1.bias", "net.visual_encoder.backbone.layer3.3.convs.3.weight", "net.visual_encoder.backbone.layer3.3.convs.4.weight", "net.visual_encoder.backbone.layer3.3.convs.4.bias", "net.visual_encoder.backbone.layer3.3.convs.6.weight", 
"net.visual_encoder.backbone.layer3.3.convs.7.weight", "net.visual_encoder.backbone.layer3.3.convs.7.bias", "net.visual_encoder.backbone.layer3.4.convs.0.weight", "net.visual_encoder.backbone.layer3.4.convs.1.weight", "net.visual_encoder.backbone.layer3.4.convs.1.bias", "net.visual_encoder.backbone.layer3.4.convs.3.weight", "net.visual_encoder.backbone.layer3.4.convs.4.weight", "net.visual_encoder.backbone.layer3.4.convs.4.bias", "net.visual_encoder.backbone.layer3.4.convs.6.weight", "net.visual_encoder.backbone.layer3.4.convs.7.weight", "net.visual_encoder.backbone.layer3.4.convs.7.bias", "net.visual_encoder.backbone.layer3.5.convs.0.weight", "net.visual_encoder.backbone.layer3.5.convs.1.weight", "net.visual_encoder.backbone.layer3.5.convs.1.bias", "net.visual_encoder.backbone.layer3.5.convs.3.weight", "net.visual_encoder.backbone.layer3.5.convs.4.weight", "net.visual_encoder.backbone.layer3.5.convs.4.bias", "net.visual_encoder.backbone.layer3.5.convs.6.weight", "net.visual_encoder.backbone.layer3.5.convs.7.weight", "net.visual_encoder.backbone.layer3.5.convs.7.bias", "net.visual_encoder.backbone.layer4.0.convs.6.weight", "net.visual_encoder.backbone.layer4.0.convs.7.weight", "net.visual_encoder.backbone.layer4.0.convs.7.bias", "net.visual_encoder.backbone.layer4.1.convs.6.weight", "net.visual_encoder.backbone.layer4.1.convs.7.weight", "net.visual_encoder.backbone.layer4.1.convs.7.bias", "net.visual_encoder.backbone.layer4.2.convs.0.weight", "net.visual_encoder.backbone.layer4.2.convs.1.weight", "net.visual_encoder.backbone.layer4.2.convs.1.bias", "net.visual_encoder.backbone.layer4.2.convs.3.weight", "net.visual_encoder.backbone.layer4.2.convs.4.weight", "net.visual_encoder.backbone.layer4.2.convs.4.bias", "net.visual_encoder.backbone.layer4.2.convs.6.weight", "net.visual_encoder.backbone.layer4.2.convs.7.weight", "net.visual_encoder.backbone.layer4.2.convs.7.bias". 
	size mismatch for net.visual_encoder.backbone.layer1.0.convs.0.weight: copying a param with shape torch.Size([32, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer1.1.convs.0.weight: copying a param with shape torch.Size([32, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 128, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer2.0.convs.0.weight: copying a param with shape torch.Size([64, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer2.0.downsample.0.weight: copying a param with shape torch.Size([64, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer2.0.downsample.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for net.visual_encoder.backbone.layer2.0.downsample.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for net.visual_encoder.backbone.layer2.1.convs.0.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer3.0.convs.0.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer3.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer3.0.downsample.1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for net.visual_encoder.backbone.layer3.0.downsample.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for net.visual_encoder.backbone.layer3.1.convs.0.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer4.0.convs.0.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer4.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 512, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer4.0.downsample.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for net.visual_encoder.backbone.layer4.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for net.visual_encoder.backbone.layer4.1.convs.0.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
	size mismatch for net.visual_encoder.compression.0.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 1024, 3, 3]).

It looks like it is loading the configuration for a model with a resnet50 backbone, but here is my config file:

TRAINER_NAME: "ddppo"
ENV_NAME: "NavRLEnv"
SIMULATOR_GPU_ID: 0
TORCH_GPU_ID: 0
VIDEO_OPTION: []
TENSORBOARD_DIR: "tb"
VIDEO_DIR: "video_dir"
TEST_EPISODE_COUNT: -1
EVAL_CKPT_PATH_DIR: "data/new_checkpoints"
NUM_PROCESSES: 8
SENSORS: ["RGB_SENSOR" , "DEPTH_SENSOR"]
CHECKPOINT_FOLDER: "data/new_checkpoints"
NUM_UPDATES: 1000000
LOG_INTERVAL: 10
CHECKPOINT_INTERVAL: 50

RL:
  SLACK_REWARD: -0.001
  SUCCESS_REWARD: 2.5
  PPO:
    # ppo params
    clip_param: 0.2
    ppo_epoch: 2
    num_mini_batch: 2
    value_loss_coef: 0.5
    entropy_coef: 0.01
    lr: 2.5e-4
    eps: 1e-5
    max_grad_norm: 0.2
    num_steps: 64
    use_gae: True
    gamma: 0.99
    tau: 0.95
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    reward_window_size: 50
    use_normalized_advantage: False

    hidden_size: 512

  DDPPO:
    sync_frac: 0.6
    # The PyTorch distributed backend to use
    distrib_backend: GLOO
    # Visual encoder backbone
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    # Initialize with pretrained weights
    pretrained: False
    # Initialize just the visual encoder backbone with pretrained weights
    pretrained_encoder: False
    # Whether or not the visual encoder backbone will be trained.
    train_encoder: True
    # Whether or not to reset the critic linear layer
    reset_critic: True

    # Model parameters
    backbone: resnet18
    rnn_type: LSTM
    num_recurrent_layers: 2

I am wondering where the problem could be.
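
One hedged way to narrow this down (my suggestion, not from the thread) is to compare the config stored inside the checkpoint with the one agent.py builds for evaluation; checkpoints written by the habitat_baselines trainer typically carry the training config alongside the weights:

import torch

# Hypothetical checkpoint path; substitute the .pth file actually being evaluated.
ckpt = torch.load("data/new_checkpoints/ckpt.10.pth", map_location="cpu")

print(sorted(ckpt.keys()))               # typically includes 'config' and 'state_dict'
print(ckpt["config"].RL.DDPPO.backbone)  # should print resnet18 for this training run

If the checkpoint reports resnet18 while the evaluation side still instantiates a resnet50 PointNavResNetPolicy, the backbone setting in the config used by agent.py is the place to fix.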

1 reaction
erikwijmans commented, May 5, 2020

You can change resnet50 to resnet18 in the config; that will improve training speed.

Read more comments on GitHub >

