Action Discrete(5) and reward in "simple_tag" env

See original GitHub issue
  1. The action space for each agent is Discrete(5). However, in practice the action is handled as a Box(5) within (-1, 1). The code here agent.action.u[0] += action[0][1] - action[0][2] and agent.action.u[1] += action[0][3] - action[0][4] is used to get p_force and then p_vel, so what does action[0][0] do?

  2. The reward of the adversary agents at each step is based on is_collision, which turns out to be the same reward for every adversary agent, even if we consider the shaping penalty in the case shape = True. How is this different from self.shared_reward = True in environment.py?

I don’t mean to complain, I’m just wondering how it works. I’d appreciate it if you could answer me.
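
To make question 1 concrete, here is a minimal sketch of how the 5-slot action appears to be decomposed, based only on the force-update lines quoted above; treating index 0 as a no-op slot is an assumption, not something confirmed in this thread.

import numpy as np

# Hypothetical decomposition of one agent's 5-slot action vector,
# mirroring the force update quoted in question 1.
# Assumed layout: [no-op, +x, -x, +y, -y]
def decompose_action(action_vec):
    u = np.zeros(2)
    u[0] = action_vec[1] - action_vec[2]  # net force along x
    u[1] = action_vec[3] - action_vec[4]  # net force along y
    return u

# The one-hot vector selecting slot 1 produces a force of (1, 0),
# while selecting slot 0 produces no force at all.
print(decompose_action(np.array([0, 1, 0, 0, 0])))  # [1. 0.]
print(decompose_action(np.array([1, 0, 0, 0, 0])))  # [0. 0.]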

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

7 reactions
tebba-von-mathenstein commented, Jan 15, 2018

@NorthernWolf, I’m not a maintainer/author, but I was playing around with it this morning and I think I have a simple example you can use to give all agents in the environment a random action in any of these environments. Just replace make_env('simple_push') with the name of the scenario you want to watch:

from make_env import make_env
import numpy as np

env = make_env('simple_push')

for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        agent_actions = []
        for i, agent in enumerate(env.world.agents):
            # This is a Discrete
            # https://github.com/openai/gym/blob/master/gym/spaces/discrete.py
            agent_action_space = env.action_space[i]

            # Sample returns an int from 0 to agent_action_space.n - 1
            action = agent_action_space.sample()

            # Environment expects a vector with length == agent_action_space.n
            # containing 0 or 1 for each action, 1 meaning take this action
            action_vec = np.zeros(agent_action_space.n)
            action_vec[action] = 1
            agent_actions.append(action_vec)

        # Each of these is a vector parallel to env.world.agents, as is agent_actions
        observation, reward, done, info = env.step(agent_actions)
        print(observation)
        print(reward)
        print(done)
        print(info)
        print()

Hope it helps!
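
A side note that touches question 2: the reward printed by the example above is a per-agent list. The sketch below illustrates the distinction the question asks about, under the assumption (not confirmed in this thread) that shared_reward in environment.py simply replaces each agent’s reward with the sum over all agents.

# Hypothetical illustration of the shared_reward distinction from question 2.
# Each agent first gets its own scenario reward (for simple_tag adversaries this is
# driven by is_collision, so adversaries often end up with identical values anyway);
# shared_reward=True would then replace every entry with the team-wide sum.
def collect_rewards(per_agent_rewards, shared_reward):
    if shared_reward:
        total = sum(per_agent_rewards)
        return [total] * len(per_agent_rewards)
    return list(per_agent_rewards)

print(collect_rewards([10.0, 10.0, -5.0], shared_reward=False))  # [10.0, 10.0, -5.0]
print(collect_rewards([10.0, 10.0, -5.0], shared_reward=True))   # [15.0, 15.0, 15.0]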

3 reactions
Haoxiang-Wang commented, Nov 13, 2017

Agreed. If you OpenAI guys could release a simple example of random agents in all the environments, it would be a great relief. I hope there will also be an explanation of the action space and of how to take actions in the different environments, since it’s quite confusing. Thank you.

Read more comments on GitHub >

