Why does DifferentiableOptimizer detach parameters when track_higher_grads = False?
Hi! Thank you for this awesome library, it helps me a lot.
I am not sure whether I'm missing something, but I'm confused about why DifferentiableOptimizer detaches parameters when track_higher_grads=False: this cuts the gradient path back to the original model parameters, even when copy_initial_weights=False. When we set copy_initial_weights=False, we want to allow gradients to flow back to the original model parameters, but line 257 cuts off the gradient flow.
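For reference, the logic being referred to looks roughly like the following (paraphrased from DifferentiableOptimizer.step in higher's optim.py from memory; not an exact copy, and line numbers vary between versions):

new_params = params[:]
for group, mapping in zip(self.param_groups, self._group_to_param_list):
    for p, index in zip(group['params'], mapping):
        if self._track_higher_grads:
            new_params[index] = p
        else:
            # With track_higher_grads=False the updated parameter is detached,
            # which severs the autograd path back to the pre-step parameters.
            new_params[index] = p.detach().requires_grad_()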
In my use case, I want to implement something like FOMAML; here is a simplified version of my code:
def inner_loop(self, fmodel, diffopt, train_input, train_target):
    # ...

def outer_loop(self, task_batch):
    self.out_optim.zero_grad()
    for task_data in task_batch:
        support_input, support_target, query_input, query_target = task_data
        with higher.innerloop_ctx(
            self.model, self.in_optim, copy_initial_weights=False, track_higher_grads=False
        ) as (fmodel, diffopt):
            self.inner_loop(fmodel, diffopt, support_input, support_target)
            query_output = fmodel(query_input)
            query_loss = F.cross_entropy(query_output, query_target)
            query_loss.backward()
    for param in self.model.parameters():
        print(param.grad)  # output: None
    self.out_optim.step()
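For context, a minimal sketch of what such an inner_loop might look like (hypothetical: the adaptation-step count self.num_inner_steps and the cross-entropy inner loss are assumptions, not taken from the issue):

def inner_loop(self, fmodel, diffopt, train_input, train_target):
    # Hypothetical sketch: a few differentiable optimizer steps on the support set.
    # self.num_inner_steps is an assumed attribute, not from the original code.
    for _ in range(self.num_inner_steps):
        train_output = fmodel(train_input)
        train_loss = F.cross_entropy(train_output, train_target)
        diffopt.step(train_loss)  # updates fmodel's fast weights in place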
The gradients are not propagated back to the original parameters. My code works as expected after I edit higher's code to:
new_params = params[:]
for group, mapping in zip(self.param_groups, self._group_to_param_list):
    for p, index in zip(group['params'], mapping):
        new_params[index] = p
I know this problem can be solved by manually mapping the gradients (sketched below), but I just wonder why detaching the parameters is necessary here. Thank you for your nice work again!
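A minimal sketch of that manual mapping (an assumption about what is meant, not code from the issue) would replace query_loss.backward() inside the innerloop_ctx block:

# Hypothetical sketch: compute first-order gradients w.r.t. the fast weights
# and copy them onto the original model's .grad fields by hand.
grads = torch.autograd.grad(query_loss, fmodel.parameters())
for param, grad in zip(self.model.parameters(), grads):
    if param.grad is None:
        param.grad = grad.detach()
    else:
        param.grad += grad.detach()  # accumulate across tasks in the batch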
Top GitHub Comments
As a workaround, I think you can use
diff_opt.step(loss, grad_callback=lambda grads: [g.detach() for g in grads])
This gives the same outer-loop gradient as when using torch.autograd.grad to compute gradients with track_higher_grads=False, but .backward() still works. As a bonus, you also get first-order gradients for inner-loop learning rates (if you're learning those); with track_higher_grads=False, you don't get gradients for the learning rates.

The solution is easy: they are doing the detach on the params p, not on the gradients g.
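Applied to the example above, the grad_callback workaround would only change the inner update call. A sketch, reusing the hypothetical inner_loop from earlier and assuming track_higher_grads is left at its default of True (so it is the gradients, not the parameters, that get detached):

def inner_loop(self, fmodel, diffopt, train_input, train_target):
    for _ in range(self.num_inner_steps):  # hypothetical attribute, as above
        train_output = fmodel(train_input)
        train_loss = F.cross_entropy(train_output, train_target)
        # Detach the inner-loop gradients (not the parameters): the outer
        # query_loss.backward() then yields first-order (FOMAML-style)
        # gradients that still reach self.model's parameters.
        diffopt.step(
            train_loss,
            grad_callback=lambda grads: [g.detach() for g in grads],
        )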