[question] How to use a trained agent in a production setting using a custom environment?
For example, if I created a custom environment for tic-tac-toe and trained an agent on it, how do I actually use the trained agent in a live setting? My current workflow is:
- Load the custom environment with the current observation: `obs = env.reset()`
- Get an action from `model.predict(obs)`, optionally including `state=state` for recurrent policies
- Manually perform the action, observe the next step and save the latest state (so there would not be an `env.step()`, since there are no further observations)
- Create a new environment with the latest observation and repeat this process
Is there a better way to actually use the agent to perform a task without continuously redefining a new environment?
Optionally, is there a way to incorporate online learning into this process, so that I can calculate a reward and use it to train the agent for additional steps based on live feedback?
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I do not quite understand the “create a new environment”. If your environment follows Gym API, you can do the following (works for recurrent policies too):
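A minimal sketch of such an inference loop, assuming a stable-baselines model and a Gym-style environment; the algorithm class, the saved-model name, and the `deterministic` flag are assumptions for illustration:

```python
from stable_baselines import PPO2  # or whichever algorithm was used for training

env = TicTacToeEnv()                   # hypothetical custom Gym env from your project
model = PPO2.load("tictactoe_agent")   # hypothetical saved model name

obs = env.reset()
state = None          # hidden state, only used by recurrent policies
done = False
while not done:
    # predict() returns (action, next_hidden_state); state stays None for
    # non-recurrent policies
    action, state = model.predict(obs, state=state, deterministic=True)
    obs, reward, done, info = env.step(action)
```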
Online learning: Also discussed in #466, stable-baselines does not support individual update steps and there are no plans on including it. However, in your case you could try using the `learn()` function to achieve this, like so:
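A rough sketch of that idea, assuming the live game is already wrapped as a Gym env; the environment class, file name and chunk size below are placeholders:

```python
from stable_baselines import PPO2

env = TicTacToeEnv()                            # hypothetical live Gym env
model = PPO2.load("tictactoe_agent", env=env)   # continue from the trained weights

while True:
    # Each call collects a small batch of live experience and runs updates on it.
    model.learn(total_timesteps=1024, reset_num_timesteps=False)
    model.save("tictactoe_agent")
```

Each `learn()` call still performs full rollouts internally, so this gives coarse-grained online learning rather than per-step updates.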
In `step`, the environment executes the agent’s action and then asks the human (or other) player for their action, and executes that. This way the other players are “part of the environment” on which the agent learns. This self-play environment uses the same approach; note how `player2` actions are done in the `step` and `reset` functions.
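A sketch of what that environment structure could look like; `TicTacToeEnv`, the board encoding and the opponent-query helper are all hypothetical placeholders for your own code:

```python
import gym
import numpy as np


class TicTacToeEnv(gym.Env):
    """Sketch of a tic-tac-toe env where the opponent is part of the environment."""

    observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(9,), dtype=np.float32)
    action_space = gym.spaces.Discrete(9)

    def reset(self):
        self.board = np.zeros(9, dtype=np.float32)
        # If the opponent moves first, apply their move here, inside reset().
        return self.board.copy()

    def step(self, action):
        self.board[action] = 1.0              # the agent's move
        reward, done = self._score()          # placeholder win/draw check
        if not done:
            opponent_move = self._get_opponent_move()   # ask the human/other player
            self.board[opponent_move] = -1.0
            reward, done = self._score()
        return self.board.copy(), reward, done, {}

    def _get_opponent_move(self):
        # Placeholder: read the live opponent's move (e.g. from a UI, API or queue).
        raise NotImplementedError

    def _score(self):
        # Placeholder: return (reward, done) after checking for a win or draw.
        return 0.0, False
```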
Ah alright, so the final environment is different from tic-tac-toe. I cannot give you the right answers here on what would work best, as this is the “research” part of RL: you have to try things out yourself and see what works best, unless you find references for this (I do not know of any).
On resetting: it depends on what your environment is like. I recommend reading up on “trajectories” and “terminal states”, e.g. in the Spinning Up tutorials.