Loss function (in Policy Gradient section), optimizer and entropy
See original GitHub issue

Dear Mr. hongzi,

I was interested in your resource scheduling method. Now I am stuck in your network class: I can't understand why you used the function below:

loss = T.log(prob_act[T.arange(N), actions]).dot(values) / N

Did you design a special loss function here? If not, what is the name of this loss function?
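For context, the quoted line computes the classic REINFORCE (score-function) policy gradient objective: the batch mean of log pi(a_t | s_t) weighted by the return-like entries of values. A minimal NumPy sketch of the same computation, assuming prob_act has shape (N, num_actions) and actions and values are length-N vectors:

import numpy as np

def reinforce_objective(prob_act, actions, values):
    # log-probability of each chosen action: log pi(a_t | s_t)
    N = len(actions)
    log_probs = np.log(prob_act[np.arange(N), actions])
    # return-weighted log-likelihood, averaged over the batch
    return log_probs.dot(values) / N

Maximizing this quantity (or minimizing its negation) is exactly the REINFORCE update.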
Issue Analytics
- Created: 3 years ago
- Comments: 12 (4 by maintainers)
Top GitHub Comments
You are right that rmsprop_updates is a customized function. I guess back then standardized libraries for those optimizers were not available 😃 Things are easier nowadays. And you are right about the gradient operations in TensorFlow or PyTorch. Here's how we computed the advantage Gt, with a time-based baseline: https://github.com/hongzimao/deeprm/blob/master/pg_re.py#L193-L202.
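A hedged sketch of that idea (illustrative only, not the repo's exact code): compute each trajectory's discounted return G_t, then subtract at every timestep the average return across the trajectories still running at that timestep:

import numpy as np

def discounted_returns(rewards, gamma):
    # G_t = r_t + gamma * G_{t+1}, computed backwards over one trajectory
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def time_based_advantages(all_rewards, gamma):
    # all_rewards: list of per-trajectory reward arrays (lengths may differ)
    returns = [discounted_returns(r, gamma) for r in all_rewards]
    max_len = max(len(g) for g in returns)
    padded = np.full((len(returns), max_len), np.nan)
    for i, g in enumerate(returns):
        padded[i, :len(g)] = g
    # baseline at time t: mean of G_t over trajectories alive at t
    baseline = np.nanmean(padded, axis=0)
    return [g - baseline[:len(g)] for g in returns]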
IIRC, RMSProp was slightly more stable than Adam in our experiments. FWIW, the original A3C paper also used RMSProp (https://arxiv.org/pdf/1602.01783.pdf, see the optimization discussion in Section 4).
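For reference, a hand-rolled RMSProp step of the kind rmsprop_updates presumably implements (a minimal sketch of the standard update rule; the function name and hyperparameter defaults here are illustrative, not the repo's):

import numpy as np

def rmsprop_step(param, grad, cache, lr=1e-3, rho=0.9, eps=1e-6):
    # cache: running average of squared gradients, same shape as param
    cache = rho * cache + (1.0 - rho) * grad ** 2
    # descent step; negate the policy gradient objective to maximize it
    param = param - lr * grad / np.sqrt(cache + eps)
    return param, cache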
The last comment was about the different episode termination criteria. I think they have their literal meaning: 'no_new_jobs' ends the episode once no new jobs are arriving, while 'all_done' terminates the episode only when all jobs (including those still unfinished when 'no_new_jobs' is satisfied) have completed: https://github.com/hongzimao/deeprm/blob/b42eff0ab843c83c2b1b8d44e65f99440fa2a543/environment.py#L255-L265.
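A hedged restatement of the two criteria as code (hypothetical helper and argument names; see the linked environment.py for the actual logic):

def episode_done(end_criterion, no_more_arrivals, unfinished_jobs):
    if end_criterion == 'no_new_jobs':
        # stop as soon as the arrival sequence is exhausted,
        # even if admitted jobs are still running
        return no_more_arrivals
    if end_criterion == 'all_done':
        # stop only once arrivals are exhausted AND every job,
        # including those still running at that point, has completed
        return no_more_arrivals and unfinished_jobs == 0
    raise ValueError(end_criterion)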