Some questions about changing policies and observations
Hi, I tried to run and make some changes to the “highway-v0” environment (e.g. no overtaking on the right, a safety distance, and more). I now have a question about training. At the moment the model is defined as follows:
model = DQN('MlpPolicy', env, gamma=0.8, learning_rate=5e-4, buffer_size=50000, exploration_fraction=0.1,
exploration_final_eps=0.5, exploration_initial_eps=1.0, batch_size=32, double_q=True,
target_network_update_freq=50, prioritized_replay=True, verbose=1, tensorboard_log="./dqn_two_lane_tensorboard/")
and the observation type is Kinematics. After training sessions of 300000 steps, the results still fluctuate. Adding layers to the MLP (64, 64, 64, 32, 20) does not seem to improve on the standard MLP either. So I tried a Grayscale observation with CnnPolicy to see whether performance would improve. Here is the model and the relevant part of the environment configuration:
model = DQN('CnnPolicy', env, gamma=0.8, learning_rate=5e-4, buffer_size=50000, exploration_fraction=0.1,
exploration_final_eps=0.5, exploration_initial_eps=1.0, batch_size=32, double_q=True,
target_network_update_freq=50, prioritized_replay=True, verbose=1, tensorboard_log="./dqn_two_lane_tensorboard/")
"offscreen_rendering": True,
"observation": {
"type": "GrayscaleObservation",
"weights": [0.2989, 0.5870, 0.1140], # weights for RGB conversion
"stack_size": 4,
"observation_shape": (screen_width, screen_height)
},
"screen_width": screen_width,
"screen_height": screen_height,
"scaling": 1.75,
"policy_frequency": 2,
Training starts without errors, but after some steps (around 4000) it crashes because the process uses up all the RAM. I tried reducing the batch size (down to 16) and the screen width and height (down to 84x84, which is really small), but it doesn’t change anything.
My PC specs are the following: GPU model: NVIDIA Quadro RTX 4000; CUDA version: 10.1; RAM: 32 GB.
My question is whether I’m missing something that causes the RAM saturation and, more importantly, whether using a CNN with Grayscale observation would actually improve performance or whether it’s a waste of time. Thanks in advance for your help.
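As a rough sanity check on the RAM question, the sketch below estimates how much memory the stacked image observations would need in a replay buffer. It is a back-of-envelope calculation, not a measurement: the helper `replay_buffer_bytes` is hypothetical, and it assumes each transition stores both the observation and the next observation, with either 4-byte floats or 1-byte integers per pixel. The numbers (buffer size 50000, stack size 4, 84x84 frames, crash around step 4000) come from the post above.

```python
def replay_buffer_bytes(num_transitions, stack_size, width, height, bytes_per_value):
    """Estimate the bytes needed to store image observations for some transitions.

    Assumption: every transition keeps two frames stacks (obs and next_obs).
    """
    values_per_obs = stack_size * width * height
    return num_transitions * 2 * values_per_obs * bytes_per_value


GIB = 1024 ** 3

# Full buffer (50000 transitions), pixels stored as float32 or as uint8.
full_float32 = replay_buffer_bytes(50_000, 4, 84, 84, 4)
full_uint8 = replay_buffer_bytes(50_000, 4, 84, 84, 1)

# Buffer contents around the reported crash point (about 4000 steps).
at_crash_float32 = replay_buffer_bytes(4_000, 4, 84, 84, 4)

print(f"full buffer, float32: {full_float32 / GIB:.2f} GiB")
print(f"full buffer, uint8:   {full_uint8 / GIB:.2f} GiB")
print(f"4000 steps, float32:  {at_crash_float32 / GIB:.2f} GiB")
```

Under these assumptions a full float32 buffer needs roughly 10.5 GiB, while 4000 transitions need less than 1 GiB. If that holds, the buffer alone would not fill 32 GB this early, which points toward memory accumulating somewhere else, such as in the environment or its rendering.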
Top GitHub Comments
It works with no memory leak! I am really grateful for the help you have given me. You have been very helpful, thank you so much. The issue can be closed.
Yes indeed, I ran into that issue as well (the channel convention was WxHxC instead of CxWxH as required by sb3) and fixed it. I will push very soon; I’m finishing the last changes (having two separate viewers for env rendering and image observation).
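For readers hitting the same channel-convention mismatch before the fix is available, a minimal sketch of the conversion with NumPy follows. It only illustrates moving the channel axis from last to first (WxHxC to CxWxH); it is not the change made in the library, and the helper name `to_channels_first` and the 84x84x4 example shape are assumptions for illustration.

```python
import numpy as np


def to_channels_first(obs):
    """Move the last (channel) axis to the front: (W, H, C) -> (C, W, H)."""
    return np.transpose(obs, (2, 0, 1))


# Example: a stack of 4 grayscale frames stored channels-last.
obs_whc = np.zeros((84, 84, 4), dtype=np.uint8)
obs_whc[10, 20, 3] = 255  # mark one pixel in the last frame

obs_cwh = to_channels_first(obs_whc)
print(obs_cwh.shape)
```

The marked pixel at `obs_whc[10, 20, 3]` ends up at `obs_cwh[3, 10, 20]`, so each frame keeps its spatial layout and only the axis order changes.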