Result on collaborative_cooking_impassable_0 has very large variance

Hi all, I ran experiments on collaborative_cooking_impassable_0 and the following is the evaluation result. It seems to have a large variance. In your paper, the best result is 268 (shown below; you can find it on page 28 of the paper). What is the variance of that result? Is my result normal? Will you share the variance of the results in a future release?
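
For anyone who wants to put a number on "large variance", here is a minimal sketch of summarizing per-episode returns with a mean, a sample standard deviation, and a standard error. The function name `summarize_returns` and the example values are hypothetical; they are not taken from this issue, the paper, or the Melting Pot codebase.

```python
import math
import statistics


def summarize_returns(episode_returns):
    """Return the mean, sample standard deviation, and standard error."""
    n = len(episode_returns)
    if n < 2:
        raise ValueError("need at least two episodes to estimate variance")
    mean = statistics.fmean(episode_returns)
    std = statistics.stdev(episode_returns)  # sample std (n - 1 denominator)
    sem = std / math.sqrt(n)  # standard error of the mean
    return {"n": n, "mean": mean, "std": std, "sem": sem}


# Hypothetical per-episode returns, made up purely for illustration.
hypothetical_returns = [0.0, 100.0, 200.0, 300.0]
summary = summarize_returns(hypothetical_returns)
print(summary)
```

Reporting the standard error alongside the mean makes it easier to judge whether a gap between two runs is larger than what episode-to-episode noise alone would produce.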

Top GitHub Comments

YetAnotherPolicy commented, Jul 22, 2022

@jagapiou Hey, thanks for the informative reply. After playing with collaborative_cooking_impassable, I noticed the credit-assignment issue too. Agents have to identify exactly which action triggered the final reward (the ready soup) after a delay of many timesteps. It is a great property for MARL research. It is cool and fun 😁

jagapiou commented, Jul 22, 2022

For this scenario, the trained bot does a lot by itself (but uses a “one-pot” strategy). Ideally, your submitted population should learn to “help out” by running a two-pot strategy, or by “ferrying” tomatoes to a spot near the pot, etc. (Watch videos of your agents to get an intuition for their strategy.)

However, on this scenario, our exploiters are weak and don’t learn such strategies. So I actually expect it will be possible to convincingly beat them on this (I think a two-pot strategy should achieve at least 200). The reason our exploiters are weak on this scenario is that it’s the same problem as the non-exploiter case, but for N-1 players, which is still >1.

One way the N-player case is hard is that there’s a credit-assignment issue with shared rewards and partial observability. Consider: bot A drops food at the pass, agent B gets the shared reward, but agent B can’t see bot A. So B may falsely conclude that the reward is random, and that its actions have no impact. So B may learn to do nothing (a bit like “learned helplessness”). See the Melting Pot paper on “lazy agents” for discussion of this (end of section 7).
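
To make the credit-assignment point concrete, here is a toy sketch (not Melting Pot code, and not how the exploiters are trained). It assumes a hypothetical setup in which an unobserved teammate A triggers a shared reward of 20 half of the time, while B's own "work" action adds only 1. All names and numbers are assumptions chosen for illustration.

```python
import random
import statistics


def simulate(num_steps, seed=0):
    """Toy model: B's shared reward is dominated by an unobserved teammate A."""
    rng = random.Random(seed)
    rewards_by_b_action = {"idle": [], "work": []}
    for _ in range(num_steps):
        b_action = rng.choice(["idle", "work"])
        a_delivers = rng.random() < 0.5  # hidden from B
        reward = 20.0 if a_delivers else 0.0  # shared reward caused by A
        if b_action == "work":
            reward += 1.0  # B's own (small) contribution
        rewards_by_b_action[b_action].append(reward)
    return rewards_by_b_action


samples = simulate(num_steps=200)
for action, rewards in samples.items():
    # From B's point of view: mean and spread of reward per own action.
    print(action, round(statistics.fmean(rewards), 2),
          round(statistics.stdev(rewards), 2))
```

In this toy setup the spread of the reward B observes is much larger than the difference its own action makes, so with limited experience B can easily conclude that what it does has no effect.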

It’s a fun problem!
