Using Saved Model as Enemy Policy in Custom Environment (while training in a SubprocVecEnv)
I am currently training in an environment that has multiple agents. In this case there are multiple snakes all on the same 11x11 grid moving around and eating food. There is one “player” snake and three “enemy” snakes. Every 1 million training steps I want to save the player model and update the enemies in such a way that they use that model to make their movements.
I can (sort of) do this in a SubprocVecEnv by importing tensorflow each time I call the update-policy method of the Snake game class (which inherits from the gym environment class):
```python
def set_enemy_policy(self, policy_path):
    import tensorflow as tf
    tf.reset_default_graph()
    self.ENEMY_POLICY = A2C.load(policy_path)
```
I consider this a hackish method because it imports tensorflow into each child process (of the SubprocVecEnv) every time the enemy policy is updated.
I use this hackish approach because I cannot simply pass `model=A2C.load(policy_path)` into some sort of callback, as these models can’t be pickled.
Is there a standard solution for this sort of problem?
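One common workaround for the pickling problem is to broadcast only the checkpoint *path* (a plain string) to every worker and let each one load the model locally. Below is a minimal, self-contained sketch of that pattern; `MiniVecEnv`, `SnakeEnv`, and `FakeModel` are stand-ins for `SubprocVecEnv`, the Snake gym env, and `A2C`, and the checkpoint path is illustrative. With the real library, the equivalent call would be `vec_env.env_method("set_enemy_policy", policy_path)`.

```python
class FakeModel:
    """Stand-in for A2C.load(path): records which checkpoint it came from."""
    def __init__(self, path):
        self.path = path

class SnakeEnv:
    """Stand-in for the Snake gym environment."""
    def __init__(self):
        self.ENEMY_POLICY = None

    def set_enemy_policy(self, policy_path):
        # Only the (picklable) path string crosses the process boundary;
        # the heavyweight model is reconstructed inside the worker.
        self.ENEMY_POLICY = FakeModel(policy_path)

class MiniVecEnv:
    """Stand-in for SubprocVecEnv.env_method, which forwards a method
    call (with picklable args) to every worker environment."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def env_method(self, name, *args, **kwargs):
        return [getattr(env, name)(*args, **kwargs) for env in self.envs]

vec_env = MiniVecEnv([SnakeEnv for _ in range(4)])
vec_env.env_method("set_enemy_policy", "checkpoints/player_1M.zip")
print(all(e.ENEMY_POLICY.path == "checkpoints/player_1M.zip" for e in vec_env.envs))
```

This keeps the per-process tensorflow import, but moves the update trigger to a single call from the training loop instead of a callback that would have to carry the model itself.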
Issue Analytics
- Created: 3 years ago
- Comments: 16
Top GitHub Comments
The way we have it set up is that there are multiple players per environment. The observation and action spaces are n-tuples, where n is the number of players. `CurryVecEnv` extracts the appropriate observation tuple and feeds it into the policy, then collects actions from all the players and reconstructs a full action tuple. So each environment is still independent; we just use multiple environments to speed up training.
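The currying idea described above can be sketched as a wrapper that holds a frozen policy for the other players, exposes only one player's slice of the observation tuple, and reassembles the full action tuple on `step()`. All names here (`CurriedEnv`, `FixedPolicy`, `MultiPlayerEnv`) are illustrative stand-ins, not the actual `CurryVecEnv` implementation.

```python
class FixedPolicy:
    """Frozen enemy policy; always picks action 0 in this toy example."""
    def predict(self, obs):
        return 0

class MultiPlayerEnv:
    """Toy 2-player env: observation and action are 2-tuples."""
    def reset(self):
        return (10, 20)                       # (player obs, enemy obs)
    def step(self, actions):
        player_action, enemy_action = actions
        return (10, 20), player_action, False, {}

class CurriedEnv:
    """Wraps a multi-player env so it looks single-player."""
    def __init__(self, env, enemy_policy, player_index=0):
        self.env, self.enemy_policy, self.i = env, enemy_policy, player_index
        self._last_obs = None

    def reset(self):
        self._last_obs = self.env.reset()
        return self._last_obs[self.i]          # expose only our player's obs

    def step(self, player_action):
        # Fill every slot from the frozen policy, then overwrite our own.
        actions = [self.enemy_policy.predict(o) for o in self._last_obs]
        actions[self.i] = player_action
        obs, reward, done, info = self.env.step(tuple(actions))
        self._last_obs = obs
        return obs[self.i], reward, done, info

env = CurriedEnv(MultiPlayerEnv(), FixedPolicy())
obs = env.reset()
print(obs, env.step(3))
```

The wrapped environment then plugs into any single-agent trainer, and swapping in a newly saved checkpoint only means replacing `enemy_policy`.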
@lukepolson
Yeah, my solution is not very optimized, as there are separate agents for each env, each running at their own pace. The `predict` is not slow compared to training, but it does slow down gathering samples when an agent is involved (I needed this for my own experiments, though). And indeed, it runs out of VRAM quickly if you use CNNs. I recommend going Adam’s way here and trying to create one agent that `predict`s for a bunch of environments at once (should have done this myself, too 😂).
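The batched-predict suggestion amounts to keeping a single enemy model in the main process and calling predict once on a stacked batch of observations, one row per environment, instead of one model per worker. Here is a minimal shape-bookkeeping sketch with a NumPy stand-in (`BatchedPolicy` and the fake action rule are assumptions, not library code); stable-baselines models likewise accept batched observations in `predict()`.

```python
import numpy as np

class BatchedPolicy:
    """Stand-in for a model whose predict() handles (n_envs, obs_dim) input."""
    def predict(self, obs_batch):
        # One forward pass for every environment at once; the arithmetic
        # below just fabricates a discrete action per row for illustration.
        return obs_batch.sum(axis=1).astype(int) % 4

n_envs, obs_dim = 8, 11 * 11          # eight envs, flattened 11x11 grid
obs = np.zeros((n_envs, obs_dim))     # stacked observations, one row per env
actions = BatchedPolicy().predict(obs)
print(actions.shape)                  # one action per environment: (8,)
```

Besides using far less VRAM than one model per process, this amortizes the forward pass across all environments, which is exactly where the per-env agents were losing time during sample gathering.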