How to compute custom metrics during training?
Hello,
In many of your papers you compute various metrics during training, e.g. equality, sustainability, etc. I am trying to compute these metrics for one of your substrates using RLlib. According to this official tutorial, you can use custom callbacks to achieve that.
Theoretically, the on_episode_step callback has a base_env parameter that stores the environment information, and you can access the environment data with obs, rewards, dones, infos, off_policy_actions = base_env.poll().
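For concreteness, here is a minimal sketch of the kind of callbacks class I have in mind. This is only my own sketch: it assumes RLlib 1.x, where DefaultCallbacks lives in ray.rllib.agents.callbacks (newer releases moved it to ray.rllib.algorithms.callbacks), and the "player_0" and "equality" names are just placeholders for whatever the wrapper actually exposes.

```python
import numpy as np

# In newer RLlib versions: from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.agents.callbacks import DefaultCallbacks


class SocialOutcomeCallbacks(DefaultCallbacks):
    """Sketch: accumulate per-agent data and report custom metrics."""

    def on_episode_step(self, *, worker, base_env, episode, env_index=None, **kwargs):
        # Info dict emitted by the env for this agent on the latest step;
        # this avoids calling base_env.poll() directly.
        info = episode.last_info_for("player_0")
        if info:
            episode.user_data.setdefault("infos", []).append(info)

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index=None, **kwargs):
        # episode.agent_rewards maps (agent_id, policy_id) -> accumulated return.
        returns = np.array(list(episode.agent_rewards.values()), dtype=np.float64)
        # Equality = 1 - Gini coefficient of per-agent returns
        # (assumes non-negative returns).
        pairwise_diffs = np.abs(returns[:, None] - returns[None, :]).sum()
        denom = 2.0 * len(returns) * returns.sum()
        episode.custom_metrics["equality"] = (
            1.0 - pairwise_diffs / denom if denom > 0 else 1.0)
```

The class would then be passed to the trainer config as config={"callbacks": SocialOutcomeCallbacks}, and anything written to episode.custom_metrics shows up under custom_metrics in the training results.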
In the case of allelopathic_harvest, some useful information is stored in the WORLD observations, for example WORLD.WHO_ZAPPED_WHO. However, since the WORLD observations are deleted in timestep_to_observations, I cannot access this data from the RLlib callback.
I tried to send data using the info variable in this script, as shown below.
```python
def step(self, action):
    """See base class."""
    actions = [action[agent_id] for agent_id in self._ordered_agent_ids]
    timestep = self._env.step(actions)
    rewards = {
        agent_id: timestep.reward[index]
        for index, agent_id in enumerate(self._ordered_agent_ids)
    }
    done = {'__all__': timestep.last()}
    # Attempt to pass custom data to the callbacks through the info dict.
    info = {"player_0": {"__common__": "test"}}
    observations = _timestep_to_observations(timestep)
    return observations, rewards, done, info
```
But on the callback side I just get the following output:

(RolloutWorker pid=21172) {0: {}}

Additionally, I noticed that the rewards obtained on the callback side via base_env.poll() are also empty.
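What I would like to end up with is a step() that forwards the WORLD observations through per-agent info dicts, so that a callback can read them with episode.last_info_for(agent_id). Below is a rough sketch of what I mean; it assumes timestep.observation is a per-player sequence of dicts, as in the Melting Pot wrappers, and the who_zapped_who key is just a name I chose.

```python
def step(self, action):
    """See base class."""
    actions = [action[agent_id] for agent_id in self._ordered_agent_ids]
    timestep = self._env.step(actions)
    rewards = {
        agent_id: timestep.reward[index]
        for index, agent_id in enumerate(self._ordered_agent_ids)
    }
    done = {'__all__': timestep.last()}
    # WORLD.* observations are global, so read them once (from player 0)
    # and forward them in every agent's info dict.
    who_zapped_who = timestep.observation[0].get('WORLD.WHO_ZAPPED_WHO')
    info = {
        agent_id: {'who_zapped_who': who_zapped_who}
        for agent_id in self._ordered_agent_ids
    }
    observations = _timestep_to_observations(timestep)
    return observations, rewards, done, info
```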
Do you know how we can compute these metrics? I am aware that you don't use RLlib internally at DeepMind; however, I consider this the best place to ask.
Thank you!
I think that the code I posted here does the job, so I will close the issue!
Great!!!