
[question] How to use a trained agent in a production setting using a custom environment?


For example, say I created a custom environment for tic-tac-toe and trained an agent on it. How do I actually use the trained agent in a live setting? My current workflow is:

  1. Load the custom environment with the current observation
  2. obs = env.reset()
  3. Get an action from model.predict(obs), optionally passing state=state for recurrent policies
  4. Manually perform the action, observe the resulting state, and save the latest observation (so there is no env.step(), since there are no further observations)
  5. Create a new environment with the latest observation and repeat this process

Is there a better way to actually use the agent to perform a task without continuously redefining a new environment?

Optionally, is there a way to incorporate online learning into this process, such that I can calculate a reward and use it to train the agent for additional steps based on the live feedback?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
Miffyli commented, Sep 18, 2019

I do not quite understand the “create a new environment” part. If your environment follows the Gym API, you can do the following (it works for recurrent policies too):

import gym
from stable_baselines import PPO2

env = gym.make("your_env_here")
agent = PPO2.load(path_to_model)
state = None  # hidden state for recurrent policies
done = False
obs = env.reset()

# Play some limited number of steps
for i in range(1000):
    action, state = agent.predict(obs, state=state, mask=done)
    obs, reward, done, info = env.step(action)
    if done:
        # Game over, reset env
        # and agent's hidden state
        state = None
        obs = env.reset()
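
If the deployed setting cannot drive a Gym loop at all (the situation described in the question), note that predict only needs an observation that matches the environment's observation space, so you can also query the trained policy directly on an observation you build by hand. A minimal sketch (the model path and board encoding here are assumptions for illustration, not from the thread):

import numpy as np
from stable_baselines import PPO2

# Hypothetical setup: observations are a flattened 3x3 board with
# 0 = empty, 1 = agent, -1 = opponent (must match the custom env's space)
agent = PPO2.load("tictactoe_ppo2")  # assumed path to the saved model

def choose_move(board):
    # Build the observation by hand and query the trained policy directly
    obs = np.asarray(board, dtype=np.float32)
    action, _state = agent.predict(obs, deterministic=True)
    return int(action)

# Usage: the board comes from the live game, not from env.step()
print(choose_move([0, 1, 0, -1, 0, 0, 0, 0, 0]))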

Online learning: as also discussed in #466, stable-baselines does not support individual update steps and there are no plans to include them. However, in your case you could try using the learn function to achieve this, like so:

  1. Load/initialize agent, create your tic-tac-toe environment
  2. Start agent learning with learn()
  3. On every call to the environment’s step, the environment executes the agent’s action, then asks the human (or other) player for their action and executes that as well.

This way the other players are “part of the environment” the agent learns on. This self-play environment uses the same approach; note how the player2 actions are handled in the step and reset functions.
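
As a rough, hypothetical sketch of this pattern (the board encoding, reward logic, and the ask_opponent callback are illustrative assumptions, not from the thread), the other player's move can be executed inside step() and the agent trained directly with learn():

import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2

class LiveTicTacToeEnv(gym.Env):
    # Hypothetical env: the agent is player 1, the live opponent is part of the env
    def __init__(self, ask_opponent):
        super().__init__()
        # Flattened 3x3 board: 0 = empty, 1 = agent, -1 = opponent (assumed encoding)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(9,), dtype=np.float32)
        self.action_space = spaces.Discrete(9)
        self.ask_opponent = ask_opponent  # callback returning the opponent's move
        self.board = np.zeros(9, dtype=np.float32)

    def reset(self):
        # If the opponent moves first, reset() would also call ask_opponent here
        self.board = np.zeros(9, dtype=np.float32)
        return self.board.copy()

    def step(self, action):
        # 1) Execute the agent's action (illegal moves end the episode with a penalty)
        if self.board[action] != 0:
            return self.board.copy(), -1.0, True, {}
        self.board[action] = 1
        reward, done = self._check_result()
        if done:
            return self.board.copy(), reward, True, {}
        # 2) Ask the human (or other) player for their move and execute it as well
        opp_action = self.ask_opponent(self.board.copy())
        self.board[opp_action] = -1
        reward, done = self._check_result()
        return self.board.copy(), reward, done, {}

    def _check_result(self):
        # Placeholder: a real implementation would check rows, columns and diagonals
        # and return +1 for a win, -1 for a loss, 0 for a draw or an ongoing game
        done = not (self.board == 0).any()
        return 0.0, done

def ask_opponent(board):
    # In production this could read a move from a UI or a network request; random here
    empty = np.flatnonzero(board == 0)
    return int(np.random.choice(empty))

env = LiveTicTacToeEnv(ask_opponent)
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)  # learning is driven by the live interaction

Because the opponent is queried inside step(), learn() collects the live feedback as ordinary environment transitions, which gives the online-learning behaviour asked about above.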

0 reactions
Miffyli commented, Sep 18, 2019

Ah, alright, so the final environment is different from tic-tac-toe. I cannot give you definitive answers here on what would work best, as this is the “research” part of RL: you have to try things out yourself and see what works best, unless you find references for this (I do not know of any).

On resetting: it depends on what your environment is like. I recommend reading up on “trajectories” and “terminal states”, e.g. in the Spinning Up tutorials.


