Fine-tuning using ALBERT
I have gone through older issues and @nreimers has pointed out many times that ALBERT models do not perform particularly well with sentence-transformers. I am absolutely fine with ~5-10 points less performance than BERT, but after training ALBERT for 1 epoch on the AllNLI dataset I got awful results.
ALBERT-large-V1

```
2020-06-08 18:20:28 - Cosine-Similarity :       Pearson: 0.1973 Spearman: 0.2404
2020-06-08 18:20:28 - Manhattan-Distance:       Pearson: 0.2318 Spearman: 0.2411
2020-06-08 18:20:28 - Euclidean-Distance:       Pearson: 0.2313 Spearman: 0.2408
2020-06-08 18:20:28 - Dot-Product-Similarity:   Pearson: 0.1437 Spearman: 0.1551
```

ALBERT-large-V2

```
2020-06-09 03:58:27 - Cosine-Similarity :       Pearson: 0.0722 Spearman: 0.0633
2020-06-09 03:58:27 - Manhattan-Distance:       Pearson: 0.1236 Spearman: 0.1089
2020-06-09 03:58:27 - Euclidean-Distance:       Pearson: 0.1237 Spearman: 0.1090
2020-06-09 03:58:27 - Dot-Product-Similarity:   Pearson: 0.1047 Spearman: 0.0900
```
I am using all the default parameters from the training script:

```
python /content/sentence-transformers/examples/training_transformers/training_nli.py 'albert-large-v1'
```
I checked the `similarity_evaluation_results` file after fine-tuning. For ALBERT-large-V2, all values for `cosine_pearson` are `nan`, and for ALBERT-large-V1, after an initial increase to 0.24, the value stagnates.
It takes ~8 hrs on Google Colab to fine-tune ALBERT on the AllNLI dataset. Any pointers to get at least respectable results? Am I doing anything wrong here?
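On the `nan` values: the evaluator's `cosine_pearson` is a Pearson correlation between predicted cosine similarities and gold labels, and Pearson propagates `nan`. So a run that reports `nan` across the board typically means the model's embeddings themselves went `nan` (e.g., a diverged loss), poisoning every pair's score. A minimal stdlib-only sketch of that propagation (the `pearson` helper here is illustrative, not the library's actual implementation):

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation, as reported in the evaluator logs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Healthy similarity scores correlate with the gold labels as expected.
print(pearson([0.1, 0.5, 0.9], [0.2, 0.4, 0.8]))  # strong positive correlation

# A single nan similarity (e.g., from a nan embedding) makes the
# whole metric nan -- matching the ALBERT-large-V2 log above.
print(pearson([0.1, float("nan"), 0.9], [0.2, 0.4, 0.8]))  # nan
```

If that is what is happening, inspecting a few raw embeddings for `nan` after training would confirm it before looking at hyperparameters.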
Just FYI: [2101.10642v1] Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks. According to the paper, a CNN-based structure instead of average pooling gives better performance with ALBERT.
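To make the contrast concrete, here is a toy, stdlib-only sketch of the two pooling ideas over a sequence of token embeddings: average pooling (the default in these runs) versus a CNN-style pooling that slides learned filters over token windows and max-pools the activations. The sizes and random weights are purely illustrative, not the paper's architecture:

```python
import random

random.seed(0)

DIM, KERNEL = 4, 3  # toy sizes; real token embeddings are e.g. 768-dim

def avg_pool(tokens):
    """Average pooling: elementwise mean over all token embeddings."""
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(DIM)]

def cnn_pool(tokens, filters):
    """CNN-style pooling: slide each KERNEL x DIM filter over token
    windows, then max-pool, yielding one value per filter."""
    out = []
    for f in filters:
        acts = [
            sum(f[k][d] * tokens[i + k][d]
                for k in range(KERNEL) for d in range(DIM))
            for i in range(len(tokens) - KERNEL + 1)
        ]
        out.append(max(acts))
    return out

tokens = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]
filters = [[[random.uniform(-1, 1) for _ in range(DIM)]
            for _ in range(KERNEL)] for _ in range(2)]

print(avg_pool(tokens))           # DIM values, one per embedding dimension
print(cnn_pool(tokens, filters))  # one value per filter
```

The intuition is that the convolution can pick up local token patterns that a plain mean washes out, which is the property the paper credits for the improvement with ALBERT.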
ALBERT-base-V2, fine-tuned on STSb for 4 epochs:

```
2020-06-09 15:15:07 - Cosine-Similarity :       Pearson: 0.7880 Spearman: 0.7861
2020-06-09 15:15:07 - Manhattan-Distance:       Pearson: 0.7558 Spearman: 0.7592
2020-06-09 15:15:07 - Euclidean-Distance:       Pearson: 0.7634 Spearman: 0.7657
2020-06-09 15:15:07 - Dot-Product-Similarity:   Pearson: 0.7393 Spearman: 0.7338
```
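For reference, the Cosine-Similarity row above scores each sentence pair by the cosine of its two embeddings and then correlates those scores with the gold STSb labels. A minimal stdlib-only sketch of that similarity function (illustrative, not the library's code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```

Because cosine ignores vector magnitude, it can rank pairs sensibly even when dot-product scores (which do depend on magnitude) lag behind, as in the log above.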