
ML1 Tasks for Constant Goals


Currently, we are trying to use specific environments in ML1 to hold the goal constant per task in a MAML setting (i.e., env.reset() changes the initial positions but the goal stays constant).

However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems that calling self.set_task updates self.goal. However, when the environment is reset, self._state_goal initially equals self.goal but is then reassigned a randomly generated goal concatenated with the initial reacher arm positions, which also appear to be random. When self.random_init is False it works as intended, but then the starting states are constant.

We are wondering if there is a way to define a task using the Metaworld API such that, for a given task, the goal position is held constant but the initial observation changes when env.reset() is called.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
ryanjulian commented, Jan 22, 2020

Initial conditions are not randomized in ML1. The important snippet is here: https://github.com/rlworkgroup/metaworld/blob/dfdbc7cf495678ee96b360d1e6e199acc141b36c/metaworld/benchmarks/ml1.py#L22, which sets the constructor arg random_init to False for all environments in the ML1 benchmark.
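
For intuition, here is a rough sketch (not the actual metaworld source; class and attribute names are illustrative) of what a random_init flag typically gates inside an environment's reset():

import numpy as np

class SawyerEnvSketch:
    """Illustrative only: shows how a random_init flag gates
    randomization of the starting configuration on reset()."""

    def __init__(self, random_init=False):
        self.random_init = random_init
        self.init_pos = np.zeros(3)   # fixed default starting position
        self.goal = np.zeros(3)       # overwritten by set_task()

    def set_task(self, task):
        # the benchmark transmits only the goal; the env object is reused
        self.goal = np.asarray(task['goal'])

    def reset(self):
        if self.random_init:
            start = self.init_pos + np.random.uniform(-0.1, 0.1, size=3)
        else:
            start = self.init_pos  # ML1 path: initial conditions stay fixed
        # the goal stays whatever set_task() assigned, so it is constant
        # across resets within a task
        return np.concatenate([start, self.goal])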

ML1 varies the goal position, but the diversity your meta-learner is exposed to during meta-training is controlled (it only gets to see 50 unique goals).

Though ML1 measures performance on intra-task variation and ML10/ML45 measure meta-learning performance on inter-task variation, the interfaces are the same. The set_task interface is designed to allow for efficient vectorized sampling: if you want to transmit a meta-batch to remote or vectorized samplers, you can construct the environments once and only transmit the task information to each environment.

You can see this in the ML1 example in the README, where we call:

from metaworld.benchmarks import ML1  # old (pre-2020) benchmarks API

# outer loop, meta-batch sampling
env = ML1.get_train_tasks('pick-place-v1')
# sample a meta-batch
tasks = env.sample_tasks(1)
# configure a single environment to represent a single element of the meta-batch
env.set_task(tasks[0])

# inner loop, single-task sampling
obs = env.reset()
a = env.action_space.sample()
obs, reward, done, info = env.step(a)  # step the environment with the sampled random action

Pseudocode for a naive parallelization of meta-batch sampling might look something like this. My example assumes your meta-batch size and your parallelization height (number of environments) are the same.

import pickle

from metaworld.benchmarks import ML1

# setup (policy, num_meta_itrs, and max_path_length come from your own algorithm)
env = ML1.get_train_tasks('pick-place-v1')
meta_batch_size = 10
envs = [pickle.loads(pickle.dumps(env)) for _ in range(meta_batch_size)]

for i in range(num_meta_itrs):
    # outer loop, meta-batch sampling
    tasks = env.sample_tasks(meta_batch_size)
    for e, t in zip(envs, tasks):  # parallel-for
        e.set_task(t)

    # inner loop, single-task sampling
    for e in envs:  # parallel-for
        path_length = 0
        done = False
        obs = e.reset()
        while not done and path_length <= max_path_length:
            a = policy.sample(obs)
            obs, reward, done, info = e.step(a)
            path_length += 1

Per-step vectorization would be similar, but there’s a lot more bookkeeping to deal with different vectors terminating at different steps. Meta-test evaluations of ML1 look similar, but with get_test_tasks instead.
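
For example, a meta-test evaluation might look roughly like this (a sketch under the same old benchmarks API; meta_learner.adapt, meta_batch_size, and max_path_length are placeholders for your own algorithm's pieces):

from metaworld.benchmarks import ML1

# meta-test: same structure as meta-training, but tasks come from the test set
test_env = ML1.get_test_tasks('pick-place-v1')
test_tasks = test_env.sample_tasks(meta_batch_size)

for t in test_tasks:
    test_env.set_task(t)
    adapted_policy = meta_learner.adapt(test_env)  # inner-loop adaptation
    obs = test_env.reset()
    done, path_length = False, 0
    while not done and path_length <= max_path_length:
        a = adapted_policy.sample(obs)
        obs, reward, done, info = test_env.step(a)
        path_length += 1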

ML1 selects a new set of train/test goals each time you construct it by calling ML1.get_train_tasks() or ML1.get_test_tasks(). This can present a problem for multi-process or multi-machine sampling, in which many workers might construct ML1 instances in separate processes, giving them different sets of train/test goals.

Doing this wrong could accidentally expose your meta-learner to far more meta-train configurations than the benchmark allows. I agree that we should probably rethink the API to make this harder to mess up. In the meantime, I recommend that you configure your sampler to stuff the value of task into env_info, and then verify in your optimization process that your samples don’t come from more than 50 unique tasks.
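
One way to do that is a thin wrapper that copies the active task into every env_info, plus a check on the collected paths. This is just a sketch, not part of metaworld; TaskInfoWrapper, assert_task_budget, and the assumed path layout are names I'm making up here:

class TaskInfoWrapper:
    """Sketch: records the active task in env_info so the optimizer
    can audit how many unique tasks its samples came from."""

    def __init__(self, env):
        self.env = env
        self._task = None

    def set_task(self, task):
        self._task = task
        self.env.set_task(task)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info['task'] = self._task  # stuff the task into env_info
        return obs, reward, done, info


def assert_task_budget(paths, max_unique_tasks=50):
    # each path's env_infos carries the task it was sampled under
    seen = {repr(p['env_infos']['task']) for p in paths}
    assert len(seen) <= max_unique_tasks, (
        'samples span %d tasks, more than the %d allowed by the ML1 meta-train set'
        % (len(seen), max_unique_tasks))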

If you’re going to use remote (multi-process or multi-machine) sampling for ML1, you have two options:

  1. Construct as many instances of ML1 in as many remote processes as you like, but always sample the meta-batch (call ML1.sample_tasks()) from a single process and transmit the tasks to your workers, which then call env.set_task(). (See the sketch below.)
  2. Construct a single ML1 instance (ML1.get_train_tasks()) on a main process and transmit it to worker processes by pickling/unpickling. This preserves the set of train tasks across machines, which are otherwise chosen anew every time the benchmark is constructed. The same logic applies to ML1.get_test_tasks(). You may then sample tasks locally, because each worker is sampling from the same pre-chosen set anyway.

Edit: I realized solution (2) doesn’t work with our current pickling implementation (which reconstructs the object anew during unpickling).
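
A minimal sketch of option (1), using Python's multiprocessing to stand in for remote workers (the queue plumbing and the worker function are illustrative, not part of metaworld):

import multiprocessing as mp

from metaworld.benchmarks import ML1


def worker(task_queue, result_queue, env_name):
    # Each worker constructs its own ML1 instance; that is fine here because
    # it never samples tasks itself -- it only applies tasks sent from the
    # main process.
    env = ML1.get_train_tasks(env_name)
    while True:
        task = task_queue.get()
        if task is None:  # shutdown signal
            break
        env.set_task(task)
        obs = env.reset()
        # ... roll out your policy here and put the paths on result_queue ...
        result_queue.put(obs)


if __name__ == '__main__':
    env_name = 'pick-place-v1'
    meta_batch_size = 4

    # Tasks are sampled exactly once, in the main process.
    main_env = ML1.get_train_tasks(env_name)
    tasks = main_env.sample_tasks(meta_batch_size)

    task_q, result_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(task_q, result_q, env_name))
               for _ in range(meta_batch_size)]
    for w in workers:
        w.start()
    for t in tasks:
        task_q.put(t)  # transmit only the task, not the environment
    results = [result_q.get() for _ in tasks]
    for _ in workers:
        task_q.put(None)
    for w in workers:
        w.join()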

0 reactions
avnishn commented, Jul 13, 2020

@michaelzhiluo, I believe that we’ve fixed this in our most recent update! We’ve updated the Metaworld API, so for any future projects please make sure to use this new API and update any ongoing projects 😃
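
For reference, usage with the updated API looks roughly like this (adapted from the updated README; if your installed version differs, names like train_classes and train_tasks may not match exactly):

import random

import metaworld

ml1 = metaworld.ML1('pick-place-v1')        # construct the benchmark; goals are pre-sampled here

env = ml1.train_classes['pick-place-v1']()  # create the single ML1 environment
task = random.choice(ml1.train_tasks)       # pick one of the pre-sampled train tasks
env.set_task(task)                          # the goal is now fixed for this task

obs = env.reset()                           # reset; the goal set by the task does not change
a = env.action_space.sample()
obs, reward, done, info = env.step(a)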

