Some questions about changing policies and observations

Hi, I tried running the “highway-v0” environment and made some changes to it (e.g. no overtaking on the right, a safety distance, and more). I now have a question about training. At the moment the model is defined as follows:

model = DQN('MlpPolicy', env, gamma=0.8, learning_rate=5e-4, buffer_size=50000, exploration_fraction=0.1,
            exploration_final_eps=0.5, exploration_initial_eps=1.0, batch_size=32, double_q=True,
            target_network_update_freq=50, prioritized_replay=True, verbose=1, tensorboard_log="./dqn_two_lane_tensorboard/")

and the observation type is Kinematics. After training sessions of 300000 steps, the results still fluctuate, and adding layers to the MLP (64, 64, 64, 32, 20) does not seem to improve on the standard MLP. So I tried Grayscale observations with CnnPolicy to see whether performance would improve. Here is the code:

model = DQN('CnnPolicy', env, gamma=0.8, learning_rate=5e-4, buffer_size=50000, exploration_fraction=0.1,
            exploration_final_eps=0.5, exploration_initial_eps=1.0, batch_size=32, double_q=True,
            target_network_update_freq=50, prioritized_replay=True, verbose=1, tensorboard_log="./dqn_two_lane_tensorboard/")

"offscreen_rendering": True,
"observation": {
    "type": "GrayscaleObservation",
    "weights": [0.2989, 0.5870, 0.1140],  # weights for RGB conversion
    "stack_size": 4,
    "observation_shape": (screen_width, screen_height)
},
"screen_width": screen_width,
"screen_height": screen_height,
"scaling": 1.75,
"policy_frequency": 2,

The training starts with no errors, but after some steps (around 4000) it crashes because all the RAM is used up. I tried reducing the batch size (down to 16) and the screen width and height (down to 84x84, which is really small), but it doesn’t change anything.

My PC specs are the following:
GPU model: NVIDIA Quadro RTX 4000
CUDA version: 10.1
RAM: 32 GB

My question is whether I’m missing something that causes the RAM saturation and, above all, whether using Cnn + Grayscale observation would actually improve performance or whether it’s a waste of time. Thanks in advance for your help.
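
For the RAM question, a back-of-the-envelope estimate of how much memory the stored image observations could take may help separate "the buffer is simply too big" from "something is leaking". The sketch below is only an approximation under stated assumptions: the helper name `replay_buffer_bytes` is made up for illustration, each pixel is assumed to be stored as a 4-byte float, and each transition is assumed to keep two full observations (current and next). The actual storage layout of the DQN implementation may differ.

```python
def replay_buffer_bytes(num_transitions, stack_size, width, height,
                        bytes_per_value=4, copies=2):
    """Rough size in bytes of the image observations held in a replay buffer.

    Assumptions (not taken from the library): values are stored as 4-byte
    floats, and each transition keeps `copies` full observations
    (observation and next observation).
    """
    per_observation = stack_size * width * height * bytes_per_value
    return num_transitions * copies * per_observation


# Settings from the question: buffer_size=50000, stack_size=4, 84x84 frames.
full_buffer = replay_buffer_bytes(50_000, 4, 84, 84)
# Only about 4000 transitions had been collected when the crash happened.
at_crash = replay_buffer_bytes(4_000, 4, 84, 84)

print(f"full buffer:       {full_buffer / 2**30:.2f} GiB")
print(f"after 4000 steps:  {at_crash / 2**30:.2f} GiB")
```

Under these assumptions, 4000 transitions of 84x84 frames come to less than 1 GiB, so the observations alone would not be expected to fill 32 GB that early in training.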

Issue Analytics

State:
Created 3 years ago
Reactions: 1
Comments: 13 (7 by maintainers)

Top GitHub Comments

lucalazzaroni commented, Mar 21, 2021 (3 reactions)

It works with no memory leak! I am really grateful for the help you have given me. You have been very helpful, thank you so much. The issue can be closed.

eleurent commented, Mar 21, 2021 (2 reactions)

Yes indeed, I ran into that issue as well (the channel convention was WxHxC instead of CxWxH as required by sb3) and fixed it. I will push very soon; I’m finishing the last changes (having two separate viewers for env rendering and image observation).
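
To illustrate the channel-convention mismatch mentioned in this comment, here is a minimal sketch of converting an observation from WxHxC to CxWxH with NumPy. It is not the fix that was pushed to the environment; the function name `to_channel_first` and the 84x84x4 example shape are assumptions chosen to match the settings in the question.

```python
import numpy as np


def to_channel_first(obs):
    """Move the last axis (channels or stacked frames) to the front.

    (W, H, C) -> (C, W, H); the pixel values themselves are unchanged.
    """
    return np.moveaxis(obs, -1, 0)


# Hypothetical observation: 84x84 frames with a stack of 4 in the last axis.
obs_whc = np.zeros((84, 84, 4), dtype=np.uint8)
obs_cwh = to_channel_first(obs_whc)
print(obs_whc.shape, "->", obs_cwh.shape)
```

`np.moveaxis` returns a view where possible, so this kind of conversion does not by itself duplicate the observation in memory.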
