Why does DifferentiableOptimizer detach parameters when track_higher_grads = False?

See original GitHub issue

Hi! Thank you for this awesome library, it helps me a lot.

I am not sure whether I’m missing something, but I’m confused about why DifferentiableOptimizer detaches parameters when track_higher_grads = False:

https://github.com/facebookresearch/higher/blob/1e20cf9696054277b2d760f64835d5d74a3115a2/higher/optim.py#L251-L257

which cuts the gradient path back to the original model parameters, even when copy_initial_weights=False. When we set copy_initial_weights=False, we want gradients to flow back to the original model parameters, but line 257 cuts that flow off.
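For reference, the branch at those lines looks roughly like this (paraphrased from the linked revision of higher/optim.py, not an exact quote):

new_params = params[:]
for group, mapping in zip(self.param_groups, self._group_to_param_list):
    for p, index in zip(group['params'], mapping):
        if self._track_higher_grads:
            new_params[index] = p
        else:
            # detach() severs the graph back to the pre-step parameters
            new_params[index] = p.detach().requires_grad_()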

In my use case, I want to implement something like FOMAML, and here is a simplified version of my code:

import higher
import torch.nn.functional as F

# Both methods below live on a trainer object that also holds self.model,
# self.in_optim (inner-loop optimizer), and self.out_optim (outer-loop optimizer).
def inner_loop(self, fmodel, diffopt, train_input, train_target):
    # ...

def outer_loop(self, task_batch):
    self.out_optim.zero_grad()

    for task_data in task_batch:
        support_input, support_target, query_input, query_target = task_data

        with higher.innerloop_ctx(
            self.model, self.in_optim, copy_initial_weights=False, track_higher_grads=False
        ) as (fmodel, diffopt):
            self.inner_loop(fmodel, diffopt, support_input, support_target)

            query_output = fmodel(query_input)
            query_loss = F.cross_entropy(query_output, query_target)
            query_loss.backward()

    for param in self.model.parameters():
        print(param.grad)  # output: None
    self.out_optim.step()

The gradients are not propagated back to the original parameters. My code works as expected after I edit higher's code to:

new_params = params[:]
for group, mapping in zip(self.param_groups, self._group_to_param_list):
    for p, index in zip(group['params'], mapping):
        new_params[index] = p

I know this problem can be solved by manually mapping the gradients, but I just wonder why detaching the parameters is necessary here. And thank you again for your nice work!
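(By "manually mapping the gradients" I mean something like the following sketch; it assumes torch is imported and simply accumulates first-order gradients onto the original model, in place of query_loss.backward() inside outer_loop:)

grads = torch.autograd.grad(query_loss, fmodel.parameters())
for p, g in zip(self.model.parameters(), grads):
    # accumulate first-order gradients onto the original model's parameters
    p.grad = g.detach() if p.grad is None else p.grad + g.detach()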

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 7

Top GitHub Comments

3 reactions
eric-mitchell commented, Jun 16, 2021

As a workaround, I think you can use diff_opt.step(loss, grad_callback=lambda grads: [g.detach() for g in grads]). This gives the same outer loop gradient as when using torch.autograd.grad to compute gradients with track_higher_grads=False, but .backward() still works. As a bonus, you also get first-order gradients for inner loop learning rates (if you’re learning those). With track_higher_grads=False, you don’t get gradients for learning rates.
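For example, inside the task loop of the outer loop above (a sketch; num_inner_steps and the support loss are placeholders, not code from this thread):

with higher.innerloop_ctx(
    self.model, self.in_optim, copy_initial_weights=False  # track_higher_grads left at its default (True)
) as (fmodel, diffopt):
    for _ in range(num_inner_steps):  # hypothetical inner step count
        support_loss = F.cross_entropy(fmodel(support_input), support_target)
        # detach the inner-loop gradients: higher-order terms are dropped,
        # but the graph back to the original (non-copied) weights survives
        diffopt.step(support_loss, grad_callback=lambda grads: [g.detach() for g in grads])

    query_loss = F.cross_entropy(fmodel(query_input), query_target)
    query_loss.backward()  # first-order grads now reach self.model.parameters()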

0 reactions
brando90 commented, Nov 1, 2022

The solution is easy: the detach is applied to the params p, not to the gradients g, of course!
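To make the distinction concrete, a toy standalone illustration (an assumption for exposition, not higher's actual code):

import torch

p = torch.randn(3, requires_grad=True)                  # stands in for an "initial" parameter
g = torch.autograd.grad((p ** 2).sum(), p, create_graph=True)[0]
lr = 0.1

detached_p = p.detach().requires_grad_()                # what track_higher_grads=False does: graph back to p is gone
first_order_new_p = p - lr * g.detach()                 # detach only the gradient: graph back to p survives

print(detached_p.grad_fn)          # None
print(first_order_new_p.grad_fn)   # <SubBackward0 ...>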

Read more comments on GitHub.

Top Results From Across the Web

  • Trainer is setting parameters with requires_grad=False to ...
    Bug: When training a model that has some parameters where requires_grad=False, the Trainer is actually setting requires_grad=True for these ...
  • python - Do all variables in the loss function have to be tensor ...
    All of the variables you want to optimise via optimizer.step() need to have gradient. In your case it would be y predicted by ...
  • Parameters with requires_grad = False are updated during ...
    I'm trying to freeze front layers during training. Before starting optimization, the optimizer is constructed by optimizer = torch.optim.SGD( ...
  • Differentiable Optimizers — higher 0.2.1 documentation
    Setting this to False allows the differentiable optimizer to be used in “test mode”, without potentially tracking higher order gradients.
  • Chapter 5: Differentiable optimization - Deep Implicit Layers
    The OptLayer layer takes five parameters, variables, which consist of a set of cvxpy variables that are optimized over, corresponding to the ...
