[Question] Thoughts on using Ax for stochastic trials
This is a fairly open and vague question, apologies, but I'd like to see if anyone has thoughts on using the approach implemented by Ax for "stochastic trials", or, stated another way, "policy learning".
What I mean by this is that instead of running a trial with a fixed set of parameters, it is common in reinforcement learning to deploy a stochastic policy (for example, a policy defined to output a value drawn from a parameterized distribution). This allows for applying algorithms like REINFORCE, as sketched below.
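(For reference, REINFORCE estimates the policy gradient by sampling actions from the policy itself,

$$\nabla_\theta J(\theta) = \mathbb{E}_{a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a)\, R(a)\big],$$

which is only possible because the policy $\pi_\theta$ is stochastic.)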
I am trying to understand whether there is a useful bridge between these two approaches. One way of modelling this would be to have the Trial define the parameters of a stochastic policy. This seems okay, but applying the rest of the BO toolbox to this kind of data seems tricky.
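To make the idea concrete, here is a minimal sketch of what I have in mind, using Ax's Service API. Everything problem-specific here is hypothetical: `rollout_reward` stands in for the real environment, the Gaussian policy parameterization (`mu`, `log_sigma`) is just one example, and the exact `create_experiment` signature may differ across Ax versions (newer releases take an `objectives` argument instead of `objective_name`):

```python
import numpy as np
from ax.service.ax_client import AxClient

# Hypothetical black-box environment: the reward for one action.
# Stands in for a single rollout of the deployed policy.
def rollout_reward(action: float) -> float:
    return -(action - 0.3) ** 2 + np.random.normal(scale=0.1)

# Each Ax trial fixes the *policy parameters* (a Gaussian's mean and
# log-std in this sketch), not the action itself; actions are sampled
# per rollout, which is what makes the trial "stochastic".
def evaluate_policy(params, n_rollouts: int = 32):
    mu, log_sigma = params["mu"], params["log_sigma"]
    actions = np.random.normal(mu, np.exp(log_sigma), size=n_rollouts)
    rewards = np.array([rollout_reward(a) for a in actions])
    # Report mean and SEM so the surrogate model sees the observation
    # noise induced by both the environment and the stochastic policy.
    sem = rewards.std(ddof=1) / np.sqrt(n_rollouts)
    return {"reward": (rewards.mean(), sem)}

ax_client = AxClient()
ax_client.create_experiment(
    name="stochastic_policy_bo",
    parameters=[
        {"name": "mu", "type": "range", "bounds": [-1.0, 1.0]},
        {"name": "log_sigma", "type": "range", "bounds": [-3.0, 0.0]},
    ],
    objective_name="reward",
    minimize=False,
)

for _ in range(20):
    params, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate_policy(params))
```

Under this framing the GP models expected return as a function of the policy parameters, with the per-trial SEM capturing the extra noise from the stochastic policy; whether the rest of the BO toolbox behaves well on top of that is exactly the open question.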
Any thoughts from people using Ax?
Top GitHub Comments
Cool. Closing this out for now then; feel free to reopen if needed.
You’re completely right. Evaluation is cheap, but potentially correlated across evaluations (which is why BO was the initial thought).
Interesting thought on reward shaping. I’ll need to think a bit more about it. It sounds very sensible but perhaps not immediately necessary.