
[rllib] Persisting Arbitrary Data Between Timesteps

See original GitHub issue

I have multiple game-playing agents hooked up to a model that outputs both their next move and a vector of symbols to ‘communicate’ with their fellow agents. I plan to build a custom policy that calculates an intrinsic reward based on the interplay between the actions taken this timestep and the symbols received last timestep.

What I’m struggling with is the right way to persist this bag of communication vectors: when calculating the reward for a given agent, I need the communication vectors that were passed around on the previous timestep.

I’ve been considering adding the symbol emission to my action space so my environment’s step function can hold all the vectors (possibly in prev_actions). Alternatively, it seems like callbacks such as on_episode_start could hold the required data. I’m not sure what the best practice for this kind of data-passing would be.
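
For the callback route, something like the sketch below is what I have in mind (a rough sketch only, assuming the dict-style callbacks config from the docs; "last_symbols" and everything else here are placeholder names, not working code for my env):

    def on_episode_start(info):
        episode = info["episode"]
        # episode.user_data is a free-form dict that lives for the whole
        # episode, so later callbacks (on_episode_step, on_postprocess_traj,
        # ...) could read and update the symbols stored here.
        episode.user_data["last_symbols"] = {}

    # Registered via the trainer config, e.g.:
    # config = {"callbacks": {"on_episode_start": on_episode_start}}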

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
ericl commented, Feb 25, 2020

I see, I think that would probably be best recorded as a custom metric. There are a few ways to do it, but the env could return these reward breakdowns in its info return, and the callback can retrieve them from the rollout batch in on_postprocess_traj: https://ray.readthedocs.io/en/latest/rllib-training.html#callbacks-and-custom-metrics
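
Roughly something like this, assuming the env writes the breakdown into the per-step info dicts it returns from step(); the "intrinsic_reward" key and the exact callback arguments are illustrative and may differ between Ray versions:

    def on_postprocess_traj(info):
        episode = info["episode"]      # the episode being collected
        agent_id = info["agent_id"]
        batch = info["post_batch"]     # SampleBatch for this agent's trajectory

        # Sum up the per-step reward breakdown the env recorded in its info
        # dicts and surface it as a custom metric in the training results.
        intrinsic = sum(
            step_info.get("intrinsic_reward", 0.0)
            for step_info in batch["infos"]
            if isinstance(step_info, dict)
        )
        episode.custom_metrics["intrinsic_reward_{}".format(agent_id)] = intrinsic

    # config = {"callbacks": {"on_postprocess_traj": on_postprocess_traj}}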

1 reaction
ericl commented, Feb 25, 2020

Can it be emitted as an action and included as part of the observation of agents in the next timestep? The env would have to do this internally.

For calculating the rewards, it sounds like you can do it in the env as usual if you save the last actions/symbols, or it could also be done in an on_postprocess_traj callback, where you have the opportunity to rewrite the entire rollout sequence if needed.
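
A rough sketch of that env-side approach, using a MultiAgentEnv with the gym-style step API from that era; the spaces, agent ids, and helper methods below are placeholders standing in for the real game logic:

    import numpy as np
    import gym
    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    N_AGENTS, N_SYMBOLS, N_MOVES = 2, 4, 5   # placeholder sizes

    class CommEnv(MultiAgentEnv):
        def __init__(self, config=None):
            # Each action = (game move, symbol broadcast to the other agents).
            self.action_space = gym.spaces.Tuple(
                [gym.spaces.Discrete(N_MOVES), gym.spaces.Discrete(N_SYMBOLS)])
            # Each observation = game state + the symbols emitted last timestep.
            self.observation_space = gym.spaces.Dict({
                "game": gym.spaces.Box(-1.0, 1.0, shape=(8,)),   # placeholder
                "symbols": gym.spaces.MultiDiscrete([N_SYMBOLS] * N_AGENTS),
            })
            self.agents = ["agent_{}".format(i) for i in range(N_AGENTS)]
            self.last_symbols = {a: 0 for a in self.agents}

        def reset(self):
            self.last_symbols = {a: 0 for a in self.agents}
            return {a: self._obs() for a in self.agents}

        def step(self, action_dict):
            moves = {a: act[0] for a, act in action_dict.items()}
            symbols = {a: act[1] for a, act in action_dict.items()}
            # Intrinsic term compares this step's moves with the symbols that
            # were passed around last step (placeholder logic).
            intrinsic = {a: self._intrinsic_reward(moves[a], self.last_symbols)
                         for a in self.agents}
            rewards = {a: self._game_reward(moves[a]) + intrinsic[a]
                       for a in self.agents}
            infos = {a: {"intrinsic_reward": intrinsic[a]} for a in self.agents}
            # Persist this step's symbols; they appear in the next observations.
            self.last_symbols = symbols
            obs = {a: self._obs() for a in self.agents}
            return obs, rewards, {"__all__": False}, infos

        def _obs(self):
            return {"game": np.zeros(8, dtype=np.float32),   # placeholder state
                    "symbols": np.array(
                        [self.last_symbols[a] for a in self.agents])}

        def _game_reward(self, move):
            return 0.0   # placeholder

        def _intrinsic_reward(self, move, received_symbols):
            return 0.0   # placeholder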

Read more comments on GitHub >

Top Results From Across the Web

  • Sample Collections and Trajectory Views — Ray 2.2.0
    RLlib's default SampleCollector class is the SimpleListCollector, which appends single timestep data (e.g. actions) to lists, then builds SampleBatches from ...
  • Learning Ray
    Flexible Distributed Python for Data Science ... This work is part of a collaboration between O'Reilly and Anyscale. ... Working With RLlib Environments ...
  • ch14 Recurrent NNs.md · Scikit and Tensorflow Workbooks ...
    Use case: arbitrary-length sequence data analysis - anticipation ... A network node that preserves state across time is called a cell (memory cell) ...
  • Physics-based Deep Learning - arXiv
    Beyond standard supervised learning from data, we'll look at physical loss constraints, more tightly coupled learning algorithms with differentiable sim- ...
  • Scalable Reinforcement Learning Systems and their ...
    We synthesize the lessons learned in RLlib, a widely adopted open source library for scalable ... from which data items can be pulled ...
