Result on collaborative_cooking_impassable_0 has very large variance
Hi all, I ran experiments on `collaborative_cooking_impassable_0`, and the following is the evaluation result. It seems to have large variance. In your paper, the best result is 268 (shown below; you can find it on page 28 of the paper). What is the variance of that result? Is my result normal? Will you share the variance of the results in a future release?
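For anyone wanting to put a number on the spread discussed here, the following is a minimal sketch, assuming the per-episode returns for the scenario have already been collected into a plain Python list. The function name `summarize_returns` and the input values are hypothetical placeholders, not part of Melting Pot and not results from any experiment.

```python
from statistics import mean, stdev


def summarize_returns(episode_returns):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(episode_returns)
    if n < 2:
        raise ValueError("need at least two episodes to estimate spread")
    avg = mean(episode_returns)
    sd = stdev(episode_returns)  # sample standard deviation (n - 1 denominator)
    sem = sd / n ** 0.5          # standard error of the mean
    return avg, sd, sem


# Placeholder numbers for illustration only, not measured results.
placeholder_returns = [0.0, 100.0, 200.0, 300.0]
avg, sd, sem = summarize_returns(placeholder_returns)
print(f"mean={avg:.1f} sd={sd:.1f} sem={sem:.1f}")
```

Reporting the standard error alongside the mean makes it easier to judge whether a gap to a published score is within episode-to-episode noise.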


Issue Analytics
- State:
- Created a year ago
- Comments:5 (2 by maintainers)
@jagapiou Hey, thanks for the informative reply. After playing with `collaborative_cooking_impassable`, I noticed the credit-assignment issue too. Agents have to identify exactly which action triggered the final reward (the ready soup) after a delay of many timesteps. It is a great property for MARL research. It is cool and fun😁

For this scenario, the trained bot does a lot by itself (but uses a “one-pot” strategy). Ideally, your submitted population should learn to “help out” by running a two-pot strategy, or by “ferrying” tomatoes to near the pot, etc. (Watch videos of your agents to get an intuition for their strategy.)

However, on this scenario, our exploiters are weak and don’t learn such strategies. So I actually expect it will be possible to convincingly beat them on this (I think a two-pot strategy should achieve at least 200). The reason our exploiters are weak on this scenario is that it’s the same problem as in the non-exploiter case, but for `N-1`, which is still `>1`.

One way the `N`-player case is hard is that there’s a credit-assignment issue with shared rewards and partial observability. Consider: bot A drops food at the pass, agent B gets the shared reward, but agent B can’t see bot A. So B may falsely conclude that the reward is random and that its actions have no impact. So B may learn to do nothing (a bit like “learned helplessness”). See the Melting Pot paper on “lazy agents” for discussion of this (end of section 7).

It’s a fun problem!