Bug report
- There’s a bug when using the attention layer. In this line:
https://github.com/facebookresearch/ParlAI/blob/55fcf6127309f3c0e2f15c1fe6eae1fd71afcbcb/parlai/agents/seq2seq/modules.py#L80
new `hidden` states are returned, but they are never used to compute the next prediction. This is why the attention model performs so badly. Here are the results after just 30 minutes of training:
TEXT: I get to read the articles of extradition acordind to the European Court of human rights .
PREDICTION: i was just a little bit of a lot of people .
~
TEXT: Yes , you are the very monster I created
PREDICTION: i will be a good thing
~
TEXT: Hello , detective Spooner .
PREDICTION: i don' t know .
~
TEXT: I' m a tiger .
PREDICTION: i don' t know .
~
TEXT: What' ve you got ?
PREDICTION: i don' t know .
~
TEXT: We are going to change the way we see the road .
PREDICTION: i don' t know what you are .
What’s more, the attention model (using `local` attention for Twitter and `general` attention for OpenSubtitles) really can lower the loss.
- The default value of `lookuptable` (https://github.com/facebookresearch/ParlAI/blob/55fcf6127309f3c0e2f15c1fe6eae1fd71afcbcb/parlai/agents/seq2seq/seq2seq.py#L107) causes much higher memory usage, but I couldn’t find out why. The old value, `all`, works fine.
- In this line of the `vectorize()` function, https://github.com/facebookresearch/ParlAI/blob/55fcf6127309f3c0e2f15c1fe6eae1fd71afcbcb/parlai/agents/seq2seq/seq2seq.py#L403, it returns only 6 values, but the newer version expects 7.
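The attention bug in the first item comes down to a simple pattern: an updated state is computed and then dropped. Below is a minimal pure-Python sketch of that pattern, not ParlAI’s actual code. The names `attend`, `project`, and `decoder_step` are hypothetical, and plain dot-product attention with a `tanh` combination is an assumption chosen for brevity.

```python
import math

# Hypothetical toy decoder step; NOT ParlAI's implementation.
# It only illustrates "new hidden state returned but never used".


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden, enc_outputs):
    """Dot-product attention (assumed variant); returns an updated hidden state."""
    scores = [sum(h * e for h, e in zip(hidden, enc)) for enc in enc_outputs]
    weights = softmax(scores)
    # Weighted sum of encoder outputs, one component at a time.
    context = [sum(w * enc[i] for w, enc in zip(weights, enc_outputs))
               for i in range(len(hidden))]
    # Mix the attended context into the decoder state.
    return [math.tanh(h + c) for h, c in zip(hidden, context)]


def project(hidden, out_weights):
    """Score each vocabulary item from the hidden state (a bias-free Linear)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]


def decoder_step(hidden, enc_outputs, out_weights, use_attention, buggy):
    if use_attention:
        new_hidden = attend(hidden, enc_outputs)
        if not buggy:
            hidden = new_hidden  # fixed: predict from the attended state
        # buggy: new_hidden is computed, then silently discarded
    return project(hidden, out_weights)


enc_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [0.2, -0.4]
out_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

no_attn = decoder_step(hidden, enc_outputs, out_weights, False, False)
buggy = decoder_step(hidden, enc_outputs, out_weights, True, True)
fixed = decoder_step(hidden, enc_outputs, out_weights, True, False)
print(buggy == no_attn)  # True: attention has no effect on the prediction
print(fixed == no_attn)  # False: the attended state now changes the scores
```

In the buggy path the attention output never reaches the output layer, so the step behaves exactly like a model without attention, and any attention parameters receive no training signal from the prediction.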
Issue Analytics
- Created 6 years ago
- Comments: 6 (6 by maintainers)
wow, thanks. that was some copypasta from the line above it but was really hurting training with attention. thanks for the catch.
`unique` uses 3x more memory than `all`, intentionally. `all` shares the same tensor for the `weight` of the encoder Embedding layer, the decoder Embedding layer, and the final Linear layer producing an output token. `unique` keeps them separate, and `enc_dec` and `dec_out` share the mentioned pairs.
fixing, thanks.
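The explanation above reduces to simple arithmetic. The sketch below counts the weight parameters stored for the encoder embedding, decoder embedding, and output Linear layer under each `lookuptable` mode. The `SHARING` table and `lookup_params` helper are hypothetical; the sketch assumes all three weights are V x d matrices (which tying requires) and ignores the Linear bias. In PyTorch, such tying is commonly done by assigning the same parameter object, e.g. `out.weight = embedding.weight`.

```python
# Hypothetical model of the lookuptable modes described above.
# Each inner tuple is one group of layers backed by a single V x d tensor.
SHARING = {
    "unique": (("enc_emb",), ("dec_emb",), ("out_linear",)),
    "enc_dec": (("enc_emb", "dec_emb"), ("out_linear",)),
    "dec_out": (("enc_emb",), ("dec_emb", "out_linear")),
    "all": (("enc_emb", "dec_emb", "out_linear"),),
}


def lookup_params(mode, vocab_size, emb_size):
    """Distinct weight parameters stored for the three layers (bias ignored)."""
    return len(SHARING[mode]) * vocab_size * emb_size


for mode in ("all", "enc_dec", "dec_out", "unique"):
    print(mode, lookup_params(mode, vocab_size=30000, emb_size=300))
```

With an illustrative 30,000-token vocabulary and 300-dimensional embeddings, `all` stores 9,000,000 parameters for these layers while `unique` stores 27,000,000, matching the 3x figure. Any per-parameter optimizer state grows proportionally.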
Hi @alexholdenmiller, that’s great! I tried that implementation before, but the memory leak bug made it less appealing to me.