Reproduce the results on CoLA
I am trying to reproduce the CoLA results reported in the BERT paper, but my numbers are far from the reported ones. My best dev MCC with BERT-large is 64.79%, and the corresponding test result is 56.9%, while the reported test result is 60.5%. The learning rate is 2e-5 and the total number of epochs is 5. For BERT-base, the result is also lower by 3-5%.
As the paper says:
for BERTLARGE we found that fine-tuning was sometimes unstable on small data sets (i.e., some runs would produce degenerate results), so we ran several random restarts and selected the model that performed best on the Dev set.
I also tried several restarts with different learning rates and random seeds, but there seems to be no improvement. I'm quite confused about this reproduction gap. Any suggestions would be greatly appreciated.
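For reference, the dev scores above are Matthews correlation coefficients (MCC), the standard CoLA metric. A minimal pure-Python sketch of the computation (variable names are illustrative, not from any particular repo):

```python
from math import sqrt

def mcc(labels, preds):
    """Matthews correlation coefficient over binary labels, in [-1, 1]."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example; CoLA scores like "64.79%" are this value times 100.
labels = [1, 1, 0, 0, 1, 0]
preds  = [1, 0, 0, 0, 1, 1]
print(round(100 * mcc(labels, preds), 2))  # MCC as a percentage
```

In practice one would use a library implementation such as `sklearn.metrics.matthews_corrcoef`, which computes the same quantity.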
Issue Analytics
- Created 5 years ago
- Comments: 6 (2 by maintainers)

CoLA is probably one of the most unstable tasks for BERT. For us it mostly boiled down to running many seeds. If all you care about is a good pre-trained model checkpoint, we have a 65 / 61 run at https://github.com/zphang/bert_on_stilts
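The many-seeds strategy described above can be sketched as a simple selection loop. Everything here is illustrative: `train_and_eval` is a hypothetical stand-in for a full fine-tuning run, and the random score it returns only makes the loop runnable.

```python
import random

def train_and_eval(seed, lr=2e-5):
    # HYPOTHETICAL placeholder: a real version would fine-tune BERT on CoLA
    # with this seed and learning rate, then return the dev-set MCC.
    rng = random.Random(seed)
    return rng.uniform(0.0, 0.65)

def best_of_n(seeds, lr=2e-5):
    """Run several random restarts and keep the seed with the best dev MCC."""
    scores = {s: train_and_eval(s, lr) for s in seeds}
    best_seed = max(scores, key=scores.get)
    return best_seed, scores[best_seed]

seed, score = best_of_n(range(10))
print(f"best seed: {seed}, dev MCC: {score:.4f}")
```

This mirrors what the BERT paper describes for small datasets: several restarts, then selecting the model that performed best on the dev set.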
Hi @cooelf, which parameters did you change to get a better result? Thanks!