
[enhancement] Polyak Averaging could be done faster

See original GitHub issue

This is rather minor, but Polyak averaging in DQN/SAC/TD3 could be done faster, with far fewer intermediate tensors, using torch.addcmul_ (https://pytorch.org/docs/stable/torch.html#torch.addcmul).
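For context, here is a minimal sketch of that in-place update; the polyak_update name and the params/target_params/tau arguments are illustrative, not the actual stable-baselines3 API:

import torch as th

def polyak_update(params, target_params, tau):
  params = list(params)
  # a constant ones tensor on the parameters' device lets addcmul_ add tau * param
  # without allocating intermediate tensors
  one = th.ones(1, device=params[0].device)
  for param, target_param in zip(params, target_params):
    # target <- (1 - tau) * target + tau * param, fully in place
    target_param.data.mul_(1 - tau)
    target_param.data.addcmul_(param.data, one, value=tau)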

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

3 reactions
PartiallyTyped commented, Jul 9, 2020

I tested these on SAC; there is a good 1.5-1.8x speedup here, more on the GPU than on the CPU because of data transfers.

import torch as th
from stable_baselines3 import SAC

def fast_polyak(agent):
  # constant ones tensor so addcmul_ can add tau * param without intermediate tensors
  one = th.ones(1, requires_grad=False).to(agent.device)
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    # target <- (1 - tau) * target + tau * param, fully in place
    target_param.data.mul_(1 - agent.tau)
    target_param.data.addcmul_(param.data, one, value=agent.tau)

def slow_polyak(agent):
  # current approach: builds intermediate tensors, then copies the result back
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    target_param.data.copy_((1 - agent.tau) * target_param.data + agent.tau * param.data)

# how OpenAI does it in their codebase
def openai_polyak(agent):
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    target_param.data.mul_(1 - agent.tau)
    target_param.data.add_(agent.tau * param.data)

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32, 32])).learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32, 32])).learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32, 32])).learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 9.61 ms per loop
# 100 loops, best of 3: 17.4 ms per loop
# 100 loops, best of 3: 12.1 ms per loop

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512, 512, 512])).learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512, 512, 512])).learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512, 512, 512])).learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 9.55 ms per loop
# 100 loops, best of 3: 17.4 ms per loop
# 100 loops, best of 3: 11.9 ms per loop

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 2.91 ms per loop
# 100 loops, best of 3: 4.58 ms per loop
# 100 loops, best of 3: 3.82 ms per loop

This is actually quite large: at 1 million Polyak updates, this shaves off 28 minutes on CPU and 2 hours 11 minutes on GPU.
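For what it's worth, a rough back-of-the-envelope check of those savings, assuming they are computed from the per-loop timings above (each %timeit loop runs 10 Polyak calls):

# Assumed: savings = (slow per-loop time - fast per-loop time) * 1e6 repetitions
gpu_saving_s = (17.4e-3 - 9.61e-3) * 1_000_000  # ~7_790 s, roughly 2 h 10 min
cpu_saving_s = (4.58e-3 - 2.91e-3) * 1_000_000  # ~1_670 s, roughly 28 min
print(gpu_saving_s / 3600, cpu_saving_s / 60)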

0 reactions
araffin commented, Jul 16, 2020

@PartiallyTyped Could you quickly try on CPU but with num_threads=1?

That’s the only case where I did not see an improvement yet.
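For reference, one way to run that single-threaded CPU check is sketched below; th.set_num_threads is the standard PyTorch call, and the rest simply mirrors the benchmark above:

import torch as th
from stable_baselines3 import SAC

th.set_num_threads(1)  # restrict PyTorch to a single CPU thread

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
%timeit for _ in range(10): slow_polyak(agent)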

