Why the result is not better than MPC?

Hi Hongzi,

I tried to reproduce the Pensieve results. After several attempts, I failed to get an ideal result (better performance than MPC). Here is what I did. The code was downloaded from GitHub, and the trace files were downloaded from Dropbox:

  1. Put training data (train_sim_traces) in sim/cooked_traces and testing data (test_sim_traces) in sim/cooked_test_traces;
  2. Run python multi_agent.py to train the model;
  3. Copy the generated model files to test/model, and modify the model name in test/rl_no_training.py;
  4. Run python rl_no_training.py in the test/ folder to test the model; the trace files in test_sim_traces are also used here;
  5. Run python plot_results.py to compare the results with DP method & MPC method.

I put two figures of total_reward and CDF here. We can see that Pensieve does not perform better than MPC. [figures: figure_4, figure_1-4]

Here is a TensorBoard figure. Training ran for about 160,000 steps. [figure: screenshot-2017-11-1 tensorboard]

I found that the result is not very stable after long training (more than 10,000). As a result, the trained models perform differently in testing. For example, the model at 164,500 steps got a reward of 35.2, while the model at 164,600 steps got a reward of 33.7.

Did I do something wrong, so that I couldn't get the same result as described in the paper? The pretrain_linear_reward model performs well. How did you get it? Could you give me a hand with these questions? Any answer is highly appreciated.

Thanks!

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

2 reactions
fangxiaoran commented, Nov 10, 2017

Sure. I'll try to use a placeholder and post my result if it works. The following is the current result.

  • ENTROPY_WEIGHT = 5, 1~20000 epochs [image]
  • ENTROPY_WEIGHT = 1, 20001~40000 epochs [image]
  • ENTROPY_WEIGHT = 0.5, 40001~80000 epochs [image]
  • ENTROPY_WEIGHT = 0.3, 80001~100000 epochs [image]
  • ENTROPY_WEIGHT = 0.1, 100001~120000 epochs [image]
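
The staged values in the list above amount to a step schedule over the epoch count. A minimal sketch of that lookup in plain Python follows. The names `SCHEDULE` and `entropy_weight_at` are hypothetical and are not part of the Pensieve code, and holding the last value after epoch 120000 is an assumption.

```python
import bisect

# (last_epoch_inclusive, entropy_weight) pairs copied from the list above.
SCHEDULE = [
    (20000, 5.0),
    (40000, 1.0),
    (80000, 0.5),
    (100000, 0.3),
    (120000, 0.1),
]


def entropy_weight_at(epoch):
    """Return the entropy weight for a 1-based epoch number."""
    if epoch < 1:
        raise ValueError("epoch must be >= 1")
    boundaries = [last for last, _ in SCHEDULE]
    # bisect_left finds the first stage whose last epoch is >= epoch.
    idx = bisect.bisect_left(boundaries, epoch)
    # Assumption: past the final stage, keep using the last weight.
    idx = min(idx, len(SCHEDULE) - 1)
    return SCHEDULE[idx][1]


if __name__ == "__main__":
    for e in (1, 20000, 20001, 80001, 120000):
        print(e, entropy_weight_at(e))
```

Keeping the stages in one table makes it easy to try different boundaries without touching the lookup logic.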
1 reaction
hongzimao commented, Nov 9, 2017

I'm glad you got the good performance.

As for automatically decaying the exploration factor, notice that ENTROPY_WEIGHT sets a constant in the TensorFlow computation graph (e.g., https://github.com/hongzimao/pensieve/blob/master/sim/a3c.py#L47-L52). To make it tunable during execution, you need to specify a TensorFlow placeholder and set its value each time.
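
The difference between a baked-in constant and a value supplied at run time can be illustrated without TensorFlow. The sketch below is plain Python and is not the code in a3c.py. The function name `actor_objective`, the value of `ENTROPY_EPS`, and the exact form of the objective (a policy-gradient term plus a weighted negative-entropy term) are assumptions made for illustration. It shows the entropy weight arriving as an argument on every call, which is the role a placeholder fed through `feed_dict` plays in a TensorFlow 1.x graph.

```python
import math

ENTROPY_EPS = 1e-6  # hypothetical small constant to avoid log(0)


def actor_objective(probs, action, advantage, entropy_weight):
    """Objective to minimize for one sample (illustrative form only).

    probs          -- action probabilities from the policy
    action         -- index of the action that was taken
    advantage      -- advantage estimate for that action
    entropy_weight -- supplied per call instead of fixed at definition time
    """
    log_prob = math.log(probs[action] + ENTROPY_EPS)
    # sum(p * log p) is the negative entropy; minimizing it favors exploration.
    neg_entropy = sum(p * math.log(p + ENTROPY_EPS) for p in probs)
    return -advantage * log_prob + entropy_weight * neg_entropy


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]
    # The same function is evaluated with a different weight on each call.
    for w in (5.0, 1.0, 0.1):
        print(w, actor_objective(probs, action=0, advantage=1.0, entropy_weight=w))
```

In a training loop, the value passed as `entropy_weight` would come from whatever decay schedule is in use for the current epoch.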

I think any reasonable decay function should work (e.g., linear, step function, etc.). If you manage to get that to work, could you post your result (maybe open another issue)? Although we have our internal implementation (we didn't post it because (1) it's fairly easy to implement and (2) more importantly we intentionally want others to observe this effect), we would appreciate it a lot if someone can reproduce and improve our implementation. Thanks!
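
As one example of such a decay function, here is a minimal linear schedule in plain Python. This is a sketch, not the internal implementation mentioned above. The function name `linear_entropy_weight` and the defaults (`start=5.0`, `end=0.1`, `decay_epochs=100000`) are hypothetical, with the start and end values borrowed from the range of weights reported in this thread.

```python
def linear_entropy_weight(epoch, start=5.0, end=0.1, decay_epochs=100000):
    """Linearly interpolate from `start` to `end` over `decay_epochs` epochs.

    The weight is held at `end` once `epoch` exceeds `decay_epochs`.
    """
    if decay_epochs <= 0:
        raise ValueError("decay_epochs must be positive")
    # Fraction of the decay completed, clamped to [0, 1].
    frac = min(max(epoch / decay_epochs, 0.0), 1.0)
    return (1.0 - frac) * start + frac * end


if __name__ == "__main__":
    for e in (0, 25000, 50000, 100000, 150000):
        print(e, linear_entropy_weight(e))
```

A step function is the other option named above; either way the schedule only needs to return one number per epoch.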
