ML1 Tasks for Constant Goals
Currently, we are trying to use specific environments in ML1 to keep the goal constant per task in a MAML setting (meaning that `env.reset()` changes the initial positions but the goal stays constant).
However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems that calling `self.set_task` will update `self.goal`. However, when the environment is reset, `self._state_goal` is initially set to `self.goal` but is then assigned a randomly generated goal, concatenated with the initial reacher arm positions, which also appear to be random. When `self.random_init` is `False`, the goal works as intended, but the starting states are constant.
We are wondering if there is a way to define a task using the Metaworld API such that, for a given task, the goal position is held constant but the initial observation changes when `env.reset()` is called.
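A minimal sketch of the setup we are describing, assuming the benchmarks API at the time of writing (`'reach-v1'` is just an example task name):

```python
from metaworld.benchmarks import ML1

env = ML1.get_train_tasks('reach-v1')  # example task name
task = env.sample_tasks(1)[0]
env.set_task(task)   # updates env.goal

# What we would like: the goal from the task stays fixed across resets,
# while the initial arm position is re-randomized on each reset.
obs_a = env.reset()
obs_b = env.reset()
```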
Top GitHub Comments
Initial conditions are not randomized in ML1. The important snippet is here: https://github.com/rlworkgroup/metaworld/blob/dfdbc7cf495678ee96b360d1e6e199acc141b36c/metaworld/benchmarks/ml1.py#L22, which sets the constructor arg `random_init` to `False` for all environments in the ML1 benchmark. ML1 varies the goal position, but the diversity your meta-learner is exposed to during meta-training is controlled (it only gets to see 50 unique goals).
Though ML1 measures performance on intra-task variation and ML10/ML45 measure meta-learning performance on inter-task variation, the interfaces are the same. The `set_task` interface is designed to allow for efficient vectorized sampling: if you want to transmit a meta-batch to remote or vectorized samplers, you can construct the environments once and transmit only the task information to each environment. You can see this in the ML1 example in the README, when we call `set_task`.
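That README example follows roughly this pattern (a sketch of the old benchmarks API; `'reach-v1'` is an example task name):

```python
from metaworld.benchmarks import ML1

env = ML1.get_train_tasks('reach-v1')  # construct the benchmark env
tasks = env.sample_tasks(1)            # sample a task (a goal variation in ML1)
env.set_task(tasks[0])                 # transmit only the task to the env

obs = env.reset()
action = env.action_space.sample()     # random action as a policy stand-in
obs, reward, done, info = env.step(action)
```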
Pseudocode for a naive parallelization of meta-batch sampling might look something like the sketch below. My example assumes your meta-batch size and your parallelization height (number of environments) are the same.
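A minimal single-process sketch of that pattern, assuming the old benchmarks API and with random actions standing in for a real policy:

```python
from metaworld.benchmarks import ML1

META_BATCH_SIZE = 4  # assumed equal to the number of environments


def rollout(env, max_steps=150):
    """Collect one trajectory; random actions stand in for a policy here."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, done))
        if done:
            break
    return trajectory


# Construct the environments once; reuse them for every meta-batch.
sampler = ML1.get_train_tasks('reach-v1')  # tasks are sampled here, and only here
envs = [ML1.get_train_tasks('reach-v1') for _ in range(META_BATCH_SIZE)]


def sample_meta_batch():
    # One task per environment, all drawn from the single sampler instance,
    # so the learner only ever sees the sampler's 50 train goals.
    tasks = sampler.sample_tasks(META_BATCH_SIZE)
    batch = []
    for env, task in zip(envs, tasks):
        env.set_task(task)  # transmit only the task, not a new environment
        batch.append(rollout(env))
    return batch
```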
Per-step vectorization would be similar, but there's a lot more bookkeeping to deal with different vectors terminating at different steps. Meta-test evaluations of ML1 look similar, but with `get_test_tasks` instead.

ML1 selects a new set of train/test goals each time you construct it by calling `ML1.get_train_tasks()` or `ML1.get_test_tasks()`. This can present a problem for multi-process or multi-machine sampling, in which many workers might construct ML1 instances in separate processes, giving them different sets of train/test goals. Doing this wrong could accidentally expose your meta-learner to far more meta-train configurations than the benchmark allows. I agree that we should probably rethink the API to make this harder to mess up. I recommend that you configure your sampler to stuff the value of `task` into `env_info`, and then verify in your optimization process that your samples don't come from more than 50 unique tasks, as in the sketch below.
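A minimal sketch of that check, assuming each sample carries an `env_info` dict into which your sampler stuffed the task (the field layout here is hypothetical):

```python
def assert_task_budget(samples, max_tasks=50):
    """Fail fast if a batch draws from more unique tasks than ML1 allows.

    Assumes each sample's `env_info` holds the task under the key 'task',
    and that tasks carry an array-like 'goal' entry (hypothetical layout).
    """
    unique_goals = {tuple(s['env_info']['task']['goal']) for s in samples}
    if len(unique_goals) > max_tasks:
        raise ValueError(
            f'Batch contains {len(unique_goals)} unique goals; '
            f'ML1 allows at most {max_tasks}.')
```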
If you're going to use remote (multi-process or multi-machine) sampling for ML1, you have two options:
1. Sample the tasks (`ML1.sample_tasks()`) from a single process and transmit the tasks to your workers, which then call `env.set_task()` (see the sketch after this list).
2. Construct a single ML1 instance (`ML1.get_train_tasks()`) on a main process and transmit it to worker processes by pickling/unpickling. This preserves the set of train tasks used across machines, which are otherwise chosen anew every time the benchmark is constructed. The same logic applies to `ML1.get_test_tasks()`. You may then sample tasks locally, because each worker is sampling among the same pre-chosen set anyway.

Edit: I realized solution (2) doesn't work with our current pickling implementation (which reconstructs the object anew during unpickling).
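A sketch of option (1) with `multiprocessing`, assuming the old benchmarks API and reusing the `rollout()` helper defined in the earlier sketch:

```python
import multiprocessing as mp

from metaworld.benchmarks import ML1


def worker(task_queue, result_queue):
    # Each worker constructs its own env, but never samples tasks itself:
    # tasks arrive from the main process, so every worker shares one task set.
    env = ML1.get_train_tasks('reach-v1')
    for task in iter(task_queue.get, None):  # None is the shutdown sentinel
        env.set_task(task)
        result_queue.put(rollout(env))


if __name__ == '__main__':
    meta_batch_size = 4
    task_queue, result_queue = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(task_queue, result_queue))
               for _ in range(meta_batch_size)]
    for w in workers:
        w.start()

    # Only the main process samples tasks; workers just receive them.
    sampler = ML1.get_train_tasks('reach-v1')
    for task in sampler.sample_tasks(meta_batch_size):
        task_queue.put(task)
    meta_batch = [result_queue.get() for _ in range(meta_batch_size)]

    for _ in workers:
        task_queue.put(None)  # tell each worker to exit
    for w in workers:
        w.join()
```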
@michaelzhiluo, I believe that we’ve fixed this in our most recent update! We’ve updated the Metaworld API, so for any future projects please make sure to use this new API and update any ongoing projects 😃
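For reference, task setup under the updated API follows roughly this pattern (adapted from the current README; `'reach-v1'` is an example task name):

```python
import random

import metaworld

ml1 = metaworld.ML1('reach-v1')         # construct the benchmark
env = ml1.train_classes['reach-v1']()   # instantiate the environment once
task = random.choice(ml1.train_tasks)   # pick one of the 50 train tasks
env.set_task(task)                      # the task pins the goal
obs = env.reset()
```

Since each task in `ml1.train_tasks` pins a single goal, sampling a task and calling `set_task` should give the constant-goal-per-task semantics asked about above.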