
Loss function (in the policy gradient section), optimizer, and entropy

See original GitHub issue

Dear Mr. Hongzi, I was interested in your resource scheduling method. Now I'm stuck on your network class. I can't understand why you used the function below:

loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N

Did you derive a special loss function here? If not, what is this loss function called?
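For context, that expression is the standard REINFORCE (vanilla policy gradient) surrogate objective: the mean over the N sampled steps of log π(a_t | s_t), weighted by the return/advantage in values. Below is a minimal PyTorch sketch of the same expression, with an optional entropy bonus since the issue title also asks about entropy; the function name, the entropy term, and the sign handling are my own assumptions, not the repo's code.

```python
import torch

def reinforce_loss(prob_act: torch.Tensor,   # (N, num_actions) action probabilities
                   actions: torch.Tensor,    # (N,) indices of the actions taken
                   values: torch.Tensor,     # (N,) returns / advantages
                   entropy_weight: float = 0.0) -> torch.Tensor:
    N = prob_act.shape[0]
    # prob_act[arange(N), actions] picks, for each sample, the probability
    # of the action that was actually taken (same indexing as the Theano line).
    taken = prob_act[torch.arange(N), actions]
    # REINFORCE surrogate: mean of log pi(a|s) weighted by the return/advantage.
    surrogate = torch.log(taken).dot(values) / N
    # Optional entropy bonus to encourage exploration (hypothetical, not in the repo's line).
    entropy = -(prob_act * torch.log(prob_act + 1e-8)).sum(dim=1).mean()
    # Minimizing the negative performs gradient ascent on the objective.
    return -(surrogate + entropy_weight * entropy)
```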

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

1 reaction
hongzimao commented, Jun 6, 2020

You are right that rmsprop_updates is a customized function. I guess back then, standardized libraries for those optimizers were not available 😃 Things are easier nowadays. And you are right about the gradient operations in TensorFlow or PyTorch.
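As a rough illustration of "things are easier nowadays", the same surrogate loss can be driven by PyTorch's built-in RMSprop and autograd instead of a hand-written rmsprop_updates. A sketch, assuming policy_net, states, actions, and advantages already exist (none of these names come from the deeprm code), and reusing the reinforce_loss sketch above:

```python
import torch

# Hypothetical setup: policy_net maps states to (N, num_actions) action probabilities.
optimizer = torch.optim.RMSprop(policy_net.parameters(), lr=1e-3, alpha=0.99)

probs = policy_net(states)                          # forward pass
loss = reinforce_loss(probs, actions, advantages)   # surrogate from the sketch above

optimizer.zero_grad()
loss.backward()    # autograd replaces hand-derived gradient expressions
optimizer.step()   # built-in RMSprop applies the parameter update
```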

1 reaction
hongzimao commented, Jun 3, 2020

Here’s how we computed the advantage Gt, with a time-based baseline: https://github.com/hongzimao/deeprm/blob/master/pg_re.py#L193-L202.
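For readers who don't follow the link: the idea is to compute discounted returns per trajectory and subtract, at each timestep, the mean return across trajectories at that same timestep. A rough NumPy sketch of that idea (my own names and padding choices, not the linked code):

```python
import numpy as np

def discount(rewards, gamma):
    """Discounted returns G_t = r_t + gamma * G_{t+1}."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def time_based_advantages(reward_trajectories, gamma):
    """Subtract, at each timestep, the mean return across trajectories."""
    returns = [discount(r, gamma) for r in reward_trajectories]
    max_len = max(len(g) for g in returns)
    # Pad with NaN so the baseline at time t only averages trajectories that reach t.
    padded = np.full((len(returns), max_len), np.nan)
    for i, g in enumerate(returns):
        padded[i, :len(g)] = g
    baseline = np.nanmean(padded, axis=0)   # time-based baseline b_t
    return [g - baseline[:len(g)] for g in returns]
```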

IIRC, RMSProp was slightly more stable than Adam in our experiments. FWIW, the original A3C paper also used RMSProp (https://arxiv.org/pdf/1602.01783.pdf; see Optimizations in Section 4).

The last comment was about different episode termination criteria. It's their literal meaning, I think: 'no_new_jobs' ends the episode when no new jobs are arriving, and 'all_done' only terminates the episode when all jobs (including the ones still unfinished when 'no_new_jobs' is satisfied) are completed: https://github.com/hongzimao/deeprm/blob/b42eff0ab843c83c2b1b8d44e65f99440fa2a543/environment.py#L255-L265.
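Schematically, the two criteria differ only in the end-of-episode test; a hypothetical sketch (field names are assumptions, see the linked environment.py for the actual logic):

```python
def episode_done(env, criteria):
    # Hypothetical fields; see the linked environment.py for the real check.
    no_more_arrivals = env.seq_idx >= env.episode_len
    if criteria == 'no_new_jobs':
        # End the episode as soon as the arrival sequence is exhausted.
        return no_more_arrivals
    if criteria == 'all_done':
        # Additionally require every queued/running job to have finished.
        return (no_more_arrivals
                and len(env.job_queue) == 0
                and len(env.running_jobs) == 0)
    raise ValueError('unknown termination criteria: %s' % criteria)
```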


Top Results From Across the Web

  • Understanding the Impact of Entropy on Policy Optimization (arXiv): "In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function."
  • Policy-Gradient Methods: REINFORCE algorithm: "The policy gradient method will iteratively amend the policy network weights (with smooth updates) to make state-action pairs that resulted ..."
  • Chapter 10. Reinforcement learning with policy gradients: "Policy gradient methods provide a scheme for estimating which direction to shift the weights in order to make the agent better at its ..."
  • Policy Gradient Algorithms | Lil'Log: "Asynchronous Advantage Actor-Critic (Mnih et al., 2016), short for A3C, is a classic policy gradient method with a special focus on parallel ..."
