[ALBERT] albert-xlarge V2 seems to have a different behavior than the other models
Hi, this issue is related to ALBERT and especially the V2 models, specifically the xlarge version 2.
TL;DR: The ALBERT-xlarge V2 model seems to behave differently from the other V1/V2 models.
The models are accessible through the HUB; in order to inspect them, I save the checkpoints, which I then load into the `modeling.AlbertModel` available in the `modeling.py` file. I use this script to save the checkpoint to a file.
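For reference, a minimal sketch of what that saving step might look like, assuming the TF1-style `tensorflow_hub` API; the HUB handle and output path here are placeholders, not the ones from the actual script:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Placeholder handle and path, for illustration only.
ALBERT_HUB_URL = "https://tfhub.dev/google/albert_xlarge/2"
CKPT_PATH = "/tmp/albert_xlarge_v2/model.ckpt"

with tf.Graph().as_default():
    # Instantiating the module creates its variables in the current graph
    # under the "module/" name scope.
    hub.Module(ALBERT_HUB_URL, trainable=False)
    with tf.Session() as sess:
        # Initialization restores the module's pre-trained weights.
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        # Write all variables out as a standard TF checkpoint.
        tf.train.Saver().save(sess, CKPT_PATH)
```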
In a different script, I load the checkpoint into a model from `modeling.py` (a line has to be added so that the modeling scope begins with `module`, the same as the HUB module); I do this in this script. In that same script I also load a HUB module, and I compare the outputs of both models given the same input values.
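The checkpoint side of the comparison might look roughly like the following sketch, assuming the TF1 API of the ALBERT repository's `modeling.py`; the config and checkpoint paths are hypothetical:

```python
import tensorflow as tf
import modeling  # modeling.py from the ALBERT repository

# Hypothetical paths, for illustration only.
CKPT_PATH = "/tmp/albert_xlarge_v2/model.ckpt"
CONFIG_PATH = "/tmp/albert_xlarge_v2/albert_config.json"

input_ids = tf.placeholder(tf.int32, shape=[None, None])
input_mask = tf.placeholder(tf.int32, shape=[None, None])
segment_ids = tf.placeholder(tf.int32, shape=[None, None])

albert_config = modeling.AlbertConfig.from_json_file(CONFIG_PATH)
model = modeling.AlbertModel(
    config=albert_config,
    is_training=False,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids)

with tf.Session() as sess:
    # With the added "module" scope, the variable names line up with the
    # checkpoint saved from the HUB module, so a plain restore works.
    tf.train.Saver().restore(sess, CKPT_PATH)
    # model.get_pooled_output() / model.get_sequence_output() can then be
    # evaluated and compared against the HUB module's outputs.
```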
For every model, I check that the difference is near zero by checking the maximum difference between tensor values (the check is included at the bottom of the second script). Here are the results:
| Model | Max difference (pooled) | Max difference (full transformer) |
|---|---|---|
| ALBERT-BASE-V1 | 8.009374e-06 | 2.3543835e-06 |
| ALBERT-LARGE-V1 | 2.5719404e-05 | 1.8417835e-05 |
| ALBERT-XLARGE-V1 | 0.0006218478 | 0.0 |
| ALBERT-XXLARGE-V1 | 0.0 | 1.0311604e-05 |
| ALBERT-BASE-V2 | 2.3335218e-05 | 4.9591064e-05 |
| ALBERT-LARGE-V2 | 0.00015488267 | 0.00010347366 |
| ALBERT-XLARGE-V2 | 1.9535216 | 5.152705 |
| ALBERT-XXLARGE-V2 | 1.7762184e-05 | 2.592802e-06 |
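The check at the bottom of the second script presumably amounts to something like this small helper (a sketch; `max_abs_diff` is my name for it, not necessarily the script's):

```python
import numpy as np

def max_abs_diff(a, b):
    """Maximum elementwise absolute difference between two output tensors."""
    return np.max(np.abs(np.asarray(a) - np.asarray(b)))

# Hypothetical usage, where hub_pooled/ckpt_pooled and hub_seq/ckpt_seq are
# the numpy outputs of the HUB module and the checkpoint-loaded model on the
# same inputs:
#   print("pooled", max_abs_diff(hub_pooled, ckpt_pooled))
#   print("full transformer", max_abs_diff(hub_seq, ckpt_seq))
```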
Is there an issue with this model in particular? Does it have an architecture change that makes it different from the others? I have had no problems replicating the SQuAD results on all of the V1 models, but I could not do so on the V2 models apart from the base one. Is this related? Thank you for your time.
Hi @insop, to add the `module` scope, I added the following line at line 194 of `modeling.py`, which results in the `__init__` method of `AlbertModel` beginning with these few lines:
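A sketch of that change, assuming the TF1-style scoping used in the ALBERT repository's `modeling.py` (the exact surrounding lines may differ):

```python
# Added line: wrap the model in a "module" variable scope so that variable
# names match the HUB module's (e.g. "module/bert/embeddings/...").
with tf.variable_scope("module"):
  # Pre-existing scopes from modeling.py's AlbertModel.__init__; `scope` is
  # the constructor argument.
  with tf.variable_scope(scope, default_name="bert"):
    with tf.variable_scope("embeddings"):
      ...
```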
Hi @LysandreJik, I have run your script (`compare_albert.py`, with a different input_string, see below) for the V2 models. My run of the large model shows more of a difference, though not as large as your data for xlarge; for xlarge, the difference seems okay. I have a question about running SQuAD, but I will post it in another open issue link that I saw you were on. (I thought I saw you on another post, but I was mistaken.) Were you able to run `run_squad_sp.py`? (In order to avoid digressing, I could find another way to communicate in case you were able to run `run_squad_sp.py` without any issue.) Thank you,