(Bug) Using 'conv_like' attention causes loss to nosedive too quickly
See original GitHub issue.

Edit: here are all four runs: https://wandb.ai/afiaka87/dalle_coco_train/reports/conv_like-sanity-check--Vmlldzo1MzY3MTg
‘conv_like’ seems to still have issues at the moment.
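For context: in an autoregressive model like the one DALLE-pytorch trains, a loss that nosedives implausibly fast is a classic symptom of an attention mask leaking future tokens, since the model can then "predict" tokens it has effectively already seen. The snippet below is a conceptual sketch, not the repository's actual implementation, of the kind of causal, convolution-shaped mask that 'conv_like' attention is meant to apply over the image-token grid; the function name, kernel size, and raster-order layout are all assumptions made for illustration.

```python
import torch

def conv_like_causal_mask(side: int, kernel: int = 3) -> torch.Tensor:
    # Hypothetical sketch: boolean [n, n] mask over a side x side grid of
    # image tokens. Query q = (qi, qj) may attend to key k = (ki, kj) only
    # if k falls inside a kernel x kernel window around q AND k comes at
    # or before q in raster order (the causal constraint).
    n = side * side
    half = kernel // 2
    mask = torch.zeros(n, n, dtype = torch.bool)
    for q in range(n):
        qi, qj = divmod(q, side)
        for ki in range(max(qi - half, 0), min(qi + half + 1, side)):
            for kj in range(max(qj - half, 0), min(qj + half + 1, side)):
                k = ki * side + kj
                if k <= q:  # drop positions the model hasn't generated yet
                    mask[q, k] = True
    return mask

# If the causal check above were missing, entries with k > q would be True,
# the model could peek at future tokens, and the training loss would
# nosedive exactly as described in this issue.
mask = conv_like_causal_mask(side = 4)
assert not mask.triu(diagonal = 1).any()  # sanity check: strictly causal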
Original post:
@lucidrains I don’t think that was the fix for conv_like. I’m still seeing the behavior from before when I turn it on.
That’s a report of my current run. I’m going to do a re-run with just full_conv to make sure it’s not due to the combination somehow.
Edit: okay, doing a simple run (depth 16, 8 heads, batch size 8) with just ‘conv_like’ attention. Here’s that run: https://wandb.ai/afiaka87/dalle_coco_train/runs/36s8nsn9
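A minimal sketch of how a run like that might be set up with the DALLE-pytorch API, assuming the interface from around the time of this issue. Only `depth = 16`, `heads = 8`, the batch size of 8, and `attn_types = ('conv_like',)` come from the thread; the VAE hyperparameters, `dim`, and the token counts below are placeholders (the actual runs trained on COCO with a trained VAE).

```python
import torch
from dalle_pytorch import DiscreteVAE, DALLE

# Placeholder VAE; the runs in this thread used one trained on COCO.
vae = DiscreteVAE(
    image_size = 256,
    num_layers = 3,
    num_tokens = 8192,
    codebook_dim = 512,
    hidden_dim = 64
)

dalle = DALLE(
    dim = 512,                    # width not stated in the thread; a guess
    vae = vae,
    num_text_tokens = 10000,      # placeholder vocab size
    text_seq_len = 256,
    depth = 16,                   # as in the run above
    heads = 8,                    # as in the run above
    attn_types = ('conv_like',)   # 'conv_like' attention on every layer
)

text = torch.randint(0, 10000, (8, 256))   # batch size 8, as in the run
images = torch.randn(8, 3, 256, 256)

loss = dalle(text, images, return_loss = True)
loss.backward()
```

Swapping in `attn_types = ('full',)` gives the dense causal attention control run for comparison.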
Top GitHub Comments
@lucidrains Feel free to check that report again. Seems your latest fix did the trick. Nicely done.
@afiaka87 thanks yet again! this really helps speed up debugging - I think I found another 🐛, by no means could it be the last one though lol https://github.com/lucidrains/DALLE-pytorch/commit/f68cb213a3a687d7b2f0cc6c7f58148bbb89d6c4