Hyperparameter settings of MAML and test code
For reproducing MAML, the documentation says: "Note that the original MAML paper trains with 5 fast adaptation steps, but tests with 10 steps. This implementation only provides the training code."
So does that mean the test code in the files maml_miniimagenet.py and maml_omniglot.py is not correct? I found that it uses the same function (fast_adapt) as the training code.
Also, could you please provide the specific hyperparameter settings used for the MAML reproduction table on MiniImageNet and Omniglot? I used the default settings and did not change the test code, but the results are not good. The documentation says: "Only the fast-adaptation learning rate needs a bit of tuning, and good values usually lie in a 0.5-2x range of the original value."
So do we only need to fine-tune fast_lr within a 0.5x-2x range of the original value and use the test code as it currently is in maml_miniimagenet.py and maml_omniglot.py? And does "the original value" refer to fast_lr=0.5 in the code, or to the fast-adaptation learning rate used in the original MAML paper?
I am sorry for these naive questions; I have tried many times and could not get the results in the table, which has taken too much time.
Thank you so much!
Issue Analytics
- State:
- Created 3 years ago
- Comments: 12 (7 by maintainers)
Top GitHub Comments
What I meant is that if you modify the hyper-parameter values in the relevant places in the code, you can make it behave however you want.
For evaluation, instead of using meta_batch_size as the number of tasks to evaluate on, you can simply change it to whatever value you want, e.g. 1024. The same goes for the function fast_adapt: it works just fine for meta-testing as well, but you need to change some of its arguments. For example, the argument adaptation_steps can be changed from 5 to 10 when calling the function during evaluation. The step size (or learning rate) can also easily be changed during meta-testing by writing maml.lr = 0.5 after training and before evaluating; see the sketch below.
For 5-way 5-shot I actually managed to get 61.6%, 63.7% and 64.8% across 3 different seeds (actual seed values were 1, 2 and 3) with the exact same hyper-parameters! I ran the experiments for 10,000 or 20,000 iterations. I am not sure why you are getting worse results…
I ran all the experiments on a GTX 1080 Ti; however, I ran multiple experiments in parallel on the same GPU, so there is some overhead. If I were to run one experiment at a time, I would expect slightly better times than the ones below. This is also why the times might not be very consistent.
By the way I am not a developer of this project, just a big fan and user 😃
Thanks for starting an interesting conversation @Hugo101 and for great answers @Kostis-S-Z.
I think there are two scenarios we’re discussing.
Is there an additional case I am missing? If you need help with the task transforms, I'd be happy to provide some information. (Also, making custom task transforms would be a very useful tutorial.)
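For reference, here is a minimal sketch of a standard task-transform pipeline for building meta-test tasks, assuming the NWays/KShots/LoadData/RemapLabels/ConsecutiveLabels transforms used in the bundled examples; the dataset root and num_tasks value are placeholders, not recommended settings.

```python
# Hedged sketch: building meta-test tasks with learn2learn task transforms.
# The dataset root and num_tasks are placeholders; adjust to your setup.
import learn2learn as l2l
from learn2learn.data.transforms import (
    NWays, KShots, LoadData, RemapLabels, ConsecutiveLabels)

ways, shots = 5, 5
test_dataset = l2l.vision.datasets.MiniImagenet(root='~/data', mode='test')
test_dataset = l2l.data.MetaDataset(test_dataset)
test_transforms = [
    NWays(test_dataset, ways),            # sample `ways` classes per task
    KShots(test_dataset, 2 * shots),      # shots for adaptation + evaluation
    LoadData(test_dataset),               # load the actual images
    RemapLabels(test_dataset),            # remap class labels to 0..ways-1
    ConsecutiveLabels(test_dataset),      # keep samples of each class contiguous
]
test_tasks = l2l.data.TaskDataset(test_dataset,
                                  task_transforms=test_transforms,
                                  num_tasks=1024)
batch = test_tasks.sample()               # one (data, labels) meta-test task
```

A custom task transform would then slot into the same test_transforms list, which is presumably what a tutorial on the topic would walk through.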