
[rllib] APEX DQN performance regression?

See original GitHub issue

What is the problem?

It says in the pong_apex.yaml tuned config:

# This can be expected to reach 20.8 reward within an hour when using a V100 GPU
# (e.g. p3.2xl instance on AWS, and m4.4xl workers). It also can reach ~21 reward
# within an hour with fewer workers (e.g. 4-8) but less reliably.
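For reference, tuned configs like this one are normally launched with the rllib CLI (e.g. rllib train -f pong-apex.yaml); here is a minimal programmatic sketch of the same thing. The yaml path is an assumption and varies across Ray checkouts:

import yaml
from ray import tune

# Assumed path; adjust to wherever pong-apex.yaml lives in your Ray version.
with open("rllib/tuned_examples/pong-apex.yaml") as f:
    experiments = yaml.safe_load(f)

# The tuned yaml is already in Tune's experiment-dict format;
# its "run: APEX" entry resolves to the RLlib Ape-X trainer.
tune.run_experiments(experiments)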

I trained this example on an AWS p3.2xlarge instance (4 workers, 8 vectorized envs per worker) but could not replicate that claim: it took 4.5 hours of training, and 10M timesteps sampled and trained on, to reach a mean episode reward of 19.

But maybe this is just the expected behavior with fewer rollout workers? I don’t quite know what the expected number of samples to convergence is here.

For some comparison, training curves for Rainbow in Dopamine show good performance within 10 × 250k = 2.5M timesteps, although the algorithm and hyperparameters certainly aren’t directly comparable.

Here’s a full record of the run: https://app.wandb.ai/zplizzi/test/runs/2dthszrq?workspace=user-zplizzi

[Screenshot: W&B training curve for this run]

Ray version and other system information (Python version, TensorFlow version, OS):

  • Ray nightly wheels as of earlier today
  • TensorFlow 1.14.0
  • Ubuntu 16.04

Reproduction

Here’s the exact script used for training. All parameters are directly from the tuned example:

from ray import tune
from ray.rllib.agents import dqn
from ray.rllib.agents.dqn import ApexTrainer
from wandb.ray import WandbLogger

# Start from the default Ape-X DQN config and apply the parameters
# from the tuned pong-apex example.
config = dqn.apex.APEX_DEFAULT_CONFIG.copy()

config["env"] = "PongNoFrameskip-v4"
config["monitor"] = True  # record gym monitor stats/videos
config["env_config"]["wandb"] = {"project": "test", "monitor_gym": True}

config["target_network_update_freq"] = 50000
config["num_workers"] = 4  # fewer rollout workers than the tuned setup
config["num_envs_per_worker"] = 8
config["gamma"] = 0.99
config["lr"] = 0.0001

tune.run(
    ApexTrainer,
    loggers=[WandbLogger],
    config=config,
)
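Side note: to benchmark directly against the quoted 20.8-in-an-hour claim, Tune can stop on the target reward and a wall-clock budget. A sketch, not part of the original run:

tune.run(
    ApexTrainer,
    loggers=[WandbLogger],
    config=config,
    # stop when the claimed reward is reached, or after one hour
    stop={"episode_reward_mean": 20.8, "time_total_s": 3600},
)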

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:7 (7 by maintainers)

Top GitHub Comments

1 reaction
zplizzi commented, Dec 12, 2019

Got it, thanks!

For what it’s worth, I re-ran that test with these modified hyperparams (all else the same):

config["target_network_update_freq"] = 20000
config["lr"] = .00005
config["train_batch_size"] = 64

and it’s performing much better (almost done at 5M training steps / 1.5M env steps / 40 minutes on the same machine). But I could imagine the original hyperparams give better wall-clock performance in the 32-worker case they’re designed for.
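Put together, a minimal sketch of the reproduction script with these overrides applied (values taken from this comment; W&B logging omitted for brevity):

from ray import tune
from ray.rllib.agents import dqn
from ray.rllib.agents.dqn import ApexTrainer

config = dqn.apex.APEX_DEFAULT_CONFIG.copy()
config["env"] = "PongNoFrameskip-v4"
config["num_workers"] = 4
config["num_envs_per_worker"] = 8
config["gamma"] = 0.99

# Modified hyperparams: more frequent target-network updates,
# lower learning rate, smaller train batch.
config["target_network_update_freq"] = 20000
config["lr"] = 0.00005
config["train_batch_size"] = 64

tune.run(ApexTrainer, config=config)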

0 reactions
zplizzi commented, Dec 12, 2019
Read more comments on GitHub >
