
[enhancement] Polyak Averaging could be done faster

See original GitHub issue

This is rather minor, but Polyak averaging in DQN/SAC/TD3 could be done faster, with far fewer intermediate tensors, using torch.addcmul_ (https://pytorch.org/docs/stable/torch.html#torch.addcmul).
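For context, here is a minimal sketch of that in-place update; the polyak_update name and the params/target_params/tau arguments are illustrative, not the actual stable-baselines3 API:

import torch as th

def polyak_update(params, target_params, tau):
  params = list(params)
  # a constant ones tensor on the parameters' device lets addcmul_ add tau * param
  # without allocating intermediate tensors
  one = th.ones(1, device=params[0].device)
  for param, target_param in zip(params, target_params):
    # target <- (1 - tau) * target + tau * param, fully in place
    target_param.data.mul_(1 - tau)
    target_param.data.addcmul_(param.data, one, value=tau)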

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

3 reactions
PartiallyTyped commented, Jul 9, 2020

I tested these on SAC; there is a good 1.5-1.8x speedup here, more on the GPU than on the CPU because of data transfers.

import torch as th
from stable_baselines3 import SAC

def fast_polyak(agent):
  # constant ones tensor so addcmul_ can add tau * param without intermediate tensors
  one = th.ones(1, requires_grad=False).to(agent.device)
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    # target <- (1 - tau) * target + tau * param, fully in place
    target_param.data.mul_(1 - agent.tau)
    target_param.data.addcmul_(param.data, one, value=agent.tau)

def slow_polyak(agent):
  # current approach: builds intermediate tensors, then copies the result back
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    target_param.data.copy_((1 - agent.tau) * target_param.data + agent.tau * param.data)

# how OpenAI does it in their codebase
def openai_polyak(agent):
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    target_param.data.mul_(1 - agent.tau)
    target_param.data.add_(agent.tau * param.data)

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32, 32])).learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32, 32])).learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32, 32])).learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 9.61 ms per loop
# 100 loops, best of 3: 17.4 ms per loop
# 100 loops, best of 3: 12.1 ms per loop

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512, 512, 512])).learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512, 512, 512])).learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512, 512, 512])).learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 9.55 ms per loop
# 100 loops, best of 3: 17.4 ms per loop
# 100 loops, best of 3: 11.9 ms per loop

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 2.91 ms per loop
# 100 loops, best of 3: 4.58 ms per loop
# 100 loops, best of 3: 3.82 ms per loop

This is actually quite large: at 1 million Polyak updates, this shaves off 28 minutes on CPU and 2 hours 11 minutes on GPU.
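For what it's worth, a rough back-of-the-envelope check of those savings, assuming they are computed from the per-loop timings above (each %timeit loop runs 10 Polyak calls):

# Assumed: savings = (slow per-loop time - fast per-loop time) * 1e6 repetitions
gpu_saving_s = (17.4e-3 - 9.61e-3) * 1_000_000  # ~7_790 s, roughly 2 h 10 min
cpu_saving_s = (4.58e-3 - 2.91e-3) * 1_000_000  # ~1_670 s, roughly 28 min
print(gpu_saving_s / 3600, cpu_saving_s / 60)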

0 reactions
araffin commented, Jul 16, 2020

@PartiallyTyped Could you quickly try on CPU but with num_threads=1?

That’s the only case where I did not see an improvement yet.
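For reference, one way to run that single-threaded CPU check is sketched below; th.set_num_threads is the standard PyTorch call, and the rest simply mirrors the benchmark above:

import torch as th
from stable_baselines3 import SAC

th.set_num_threads(1)  # restrict PyTorch to a single CPU thread

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32, 32]), device="cpu").learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
%timeit for _ in range(10): slow_polyak(agent)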

