[rllib] Persisting Arbitrary Data Between Timesteps
I have multiple game-playing agents hooked up to a model that spits out both their next move and a vector of symbols to ‘communicate’ with their fellow agents. I plan to build out a custom policy that calculates an intrinsic reward based on the interplay between actions taken this timestep and symbols received last timestep.
What I’m struggling with is the right way to persist this bag of communication vectors; while calculating the reward for a given agent I’d need the communication vectors passed around from the last timestep.
I’ve been considering adding the symbol emission to my action space so my environment’s step function can hold all the vectors (possibly in `prev_actions`); alternatively, it seems like one could use callbacks such as `on_episode_start` to hold the required data. I’m not sure what the best practice for this kind of data-passing would be.
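For concreteness, a rough sketch of the action-space idea: the symbol vector rides along in a Dict action space, the env stashes it in `step()`, and feeds it back through the next observation. The class, space, and key names (`CommEnv`, `symbols_received`, etc.) are made up for illustration, and the reset/step signatures follow the older gym-style MultiAgentEnv API, which varies across RLlib versions:

```python
import numpy as np
from gym.spaces import Box, Dict, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv

N_SYMBOLS = 8  # length of the communication vector (made up for the sketch)
N_MOVES = 4    # number of game moves (made up for the sketch)


class CommEnv(MultiAgentEnv):
    """Each action carries a game move plus a symbol vector; each observation
    carries the symbols the other agents emitted on the previous timestep."""

    def __init__(self, config=None):
        self.agents = ["agent_0", "agent_1"]
        self.action_space = Dict({
            "move": Discrete(N_MOVES),
            "symbols": Box(0.0, 1.0, shape=(N_SYMBOLS,), dtype=np.float32),
        })
        self.observation_space = Dict({
            "board": Box(-1.0, 1.0, shape=(16,), dtype=np.float32),
            "symbols_received": Box(0.0, 1.0, shape=(N_SYMBOLS,), dtype=np.float32),
        })
        self.last_symbols = {}

    def reset(self):
        self.last_symbols = {a: np.zeros(N_SYMBOLS, np.float32) for a in self.agents}
        return {a: self._obs(a) for a in self.agents}

    def step(self, action_dict):
        # Stash this step's emissions so the *next* observation (and the
        # reward calculation) can see them.
        self.last_symbols = {a: act["symbols"] for a, act in action_dict.items()}
        obs = {a: self._obs(a) for a in self.agents}
        rewards = {a: 0.0 for a in self.agents}  # plug the actual game reward in here
        dones = {"__all__": False}
        return obs, rewards, dones, {a: {} for a in self.agents}

    def _obs(self, agent):
        # Observation includes the mean of the symbols the *other* agents sent.
        others = [s for a, s in self.last_symbols.items() if a != agent]
        received = (np.mean(others, axis=0).astype(np.float32)
                    if others else np.zeros(N_SYMBOLS, np.float32))
        return {"board": np.zeros(16, np.float32), "symbols_received": received}
```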
Top GitHub Comments
I see, I think that would probably be best recorded as a custom metric. There are a few ways to do it, but the env could return these reward breakdowns in its info dict, and the callback can retrieve them from the rollout batch in `on_postprocess_traj`: https://ray.readthedocs.io/en/latest/rllib-training.html#callbacks-and-custom-metrics
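For later readers, a minimal sketch of that pattern using the class-based callbacks API (the import path and the `intrinsic_reward` info key are assumptions; older RLlib versions take a dict of callback functions in the config instead):

```python
import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks  # ray.rllib.algorithms.callbacks in newer Ray


class CommMetricsCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id, policy_id,
                                  policies, postprocessed_batch, original_batches,
                                  **kwargs):
        # The env stuffed its per-step reward breakdown into the info dict;
        # pull it back out of the rollout batch and log it as a custom metric.
        infos = postprocessed_batch.get("infos", [])
        intrinsic = [i["intrinsic_reward"] for i in infos
                     if isinstance(i, dict) and "intrinsic_reward" in i]
        if intrinsic:
            episode.custom_metrics["intrinsic_reward_" + str(agent_id)] = float(np.mean(intrinsic))
```

The callbacks class is then registered with `config["callbacks"] = CommMetricsCallbacks` when building the trainer.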
Can the symbol vector be emitted as part of the action and included in the other agents’ observations on the next timestep? The env would have to do this internally.
For calculating the rewards, it sounds like you can do that in the env as usual if you save the last actions/symbols, or it could also be done in an `on_postprocess_traj` callback, where you have the opportunity to rewrite the entire rollout sequence if needed.
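A rough sketch of that second option, where the callback rewrites the rewards column of the postprocessed batch. The slicing of the symbol vector out of the actions and the bonus formula are placeholders, not the actual reward from the issue:

```python
import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks  # ray.rllib.algorithms.callbacks in newer Ray

N_SYMBOLS = 8     # must match the env's symbol vector length (assumption)
COMM_BONUS = 0.1  # weight of the communication term (assumption)


class IntrinsicRewardCallbacks(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id, policy_id,
                                  policies, postprocessed_batch, original_batches,
                                  **kwargs):
        # With a Dict action space the symbols end up inside the (flattened)
        # action columns; how to slice them back out is env-specific, so this
        # just assumes they are the last N_SYMBOLS entries of each action row.
        actions = np.asarray(postprocessed_batch["actions"])
        emitted = actions[..., -N_SYMBOLS:]           # symbols sent at t
        received = np.roll(emitted, shift=1, axis=0)  # symbols from t-1
        received[0] = 0.0                             # nothing received at t=0
        # Toy intrinsic term: agreement between this step's emission and the
        # previous step's. Symbols from *other* agents' trajectories are
        # available via original_batches if a cross-agent term is needed.
        bonus = COMM_BONUS * np.sum(emitted * received, axis=-1)
        postprocessed_batch["rewards"] = postprocessed_batch["rewards"] + bonus
```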