
[ALBERT] albert-xlarge V2 seems to have a different behavior than the other models


Hi, this issue concerns the ALBERT V2 models, and specifically the xlarge version 2.

TL;DR: The ALBERT-xlarge V2 model seems to behave differently from all the other V1/V2 models.

The models are accessible through TF-Hub; in order to inspect them, I save each module's weights to a checkpoint, which I then load into the modeling.AlbertModel class available in modeling.py. I use this script to save the checkpoint to a file.
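The linked script itself is not reproduced in the thread. As a minimal sketch, assuming TF 1.15 and tensorflow_hub, with a placeholder Hub handle and output path: instantiating hub.Module adds the module's variables (named with a module/ prefix) to the default graph, where a plain tf.train.Saver picks them up.

import tensorflow as tf  # assumed TF 1.15
import tensorflow_hub as hub

# Instantiating the module adds its variables ("module/bert/...") to the graph.
albert = hub.Module("https://tfhub.dev/google/albert_xlarge/2")  # placeholder handle

with tf.Session() as sess:
    # The variable initializers restore the module's pretrained values.
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # Save all graph variables, keeping the "module/..." name prefix.
    tf.train.Saver().save(sess, "/tmp/albert_xlarge_v2/model.ckpt")  # placeholder path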

In a different script, I load the checkpoint into a model from modeling.py (a line has to be added so that the modeling variable scope begins with module, matching the variable names of the HUB module). I load the checkpoint in this script. In that same script I also load the HUB module directly, and I compare the outputs of both models given the same input values.
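That comparison script is also only linked, but a rough sketch of the idea, assuming TF 1.15, the modeling.py from google-research/albert, the "tokens" signature of the ALBERT Hub modules, and placeholder paths and token ids, could look like this:

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import modeling  # from google-research/albert

input_ids = np.array([[2, 1289, 25, 2073, 3]], dtype=np.int32)  # placeholder ids
input_mask = np.ones_like(input_ids)
segment_ids = np.zeros_like(input_ids)

# 1) Outputs of the TF-Hub module.
with tf.Graph().as_default():
    module = hub.Module("https://tfhub.dev/google/albert_xlarge/2")  # placeholder
    outputs = module(dict(input_ids=input_ids, input_mask=input_mask,
                          segment_ids=segment_ids),
                     signature="tokens", as_dict=True)
    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        hub_pooled, hub_seq = sess.run([outputs["pooled_output"],
                                        outputs["sequence_output"]])

# 2) Outputs of modeling.AlbertModel restored from the saved checkpoint
#    (modeling.py patched so its variables also live under the "module" scope,
#    as described in the comment below).
with tf.Graph().as_default():
    config = modeling.AlbertConfig.from_json_file("albert_config.json")  # placeholder
    model = modeling.AlbertModel(config=config, is_training=False,
                                 input_ids=tf.constant(input_ids),
                                 input_mask=tf.constant(input_mask),
                                 token_type_ids=tf.constant(segment_ids))
    with tf.Session() as sess:
        tf.train.Saver().restore(sess, "/tmp/albert_xlarge_v2/model.ckpt")
        tf1_pooled, tf1_seq = sess.run([model.get_pooled_output(),
                                        model.get_sequence_output()])

print("pooled max difference:          ", np.abs(hub_pooled - tf1_pooled).max())
print("full transformer max difference:", np.abs(hub_seq - tf1_seq).max())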

For every model, I check that the difference is near zero by taking the maximum absolute difference between tensor values (the check is included at the bottom of the second script). Here are the results:

ALBERT-BASE-V1 max difference: pooled 8.009374e-06, full transformer 2.3543835e-06
ALBERT-LARGE-V1 max difference: pooled 2.5719404e-05, full transformer 1.8417835e-05
ALBERT-XLARGE-V1 max difference: pooled 0.0006218478, full transformer 0.0
ALBERT-XXLARGE-V1 max difference: pooled 0.0, full transformer 1.0311604e-05

ALBERT-BASE-V2 max difference: pooled 2.3335218e-05, full transformer 4.9591064e-05
ALBERT-LARGE-V2 max difference: pooled 0.00015488267, full transformer 0.00010347366
ALBERT-XLARGE-V2 max difference: pooled 1.9535216, full transformer 5.152705
ALBERT-XXLARGE-V2 max difference: pooled 1.7762184e-05, full transformer 2.592802e-06

Is there an issue with this model in particular, or does it have an architecture change that sets it apart from the others? I had no problems replicating the SQuAD results with all of the V1 models, but among the V2 models I could only do so with the base one. Is this related? Thank you for your time.

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 7
  • Comments: 5

Top GitHub Comments

1 reaction
LysandreJik commented, Nov 25, 2019

Hi @insop, to add the module scope, I added the following line at line 194 of modeling.py:

with tf.variable_scope("module"):

Which results in the __init__ method of AlbertModel beginning with these few lines:

[...]
    config = copy.deepcopy(config)
    if not is_training:
      config.hidden_dropout_prob = 0.0
      config.attention_probs_dropout_prob = 0.0

    input_shape = get_shape_list(input_ids, expected_rank=2)
    batch_size = input_shape[0]
    seq_length = input_shape[1]

    if input_mask is None:
      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

    if token_type_ids is None:
      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

    with tf.variable_scope("module"):
      with tf.variable_scope(scope, default_name="bert"):
        with tf.variable_scope("embeddings"):
          # Perform embedding lookup on the word ids.
          (self.word_embedding_output,
[...]
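The "module" prefix is needed because the checkpoint was saved from a Hub module, whose variables all live under that scope. A quick sanity check, as a sketch assuming TF 1.x and a placeholder checkpoint path, is to list the variables stored in the checkpoint:

import tensorflow as tf

# Print every variable name and shape stored in the checkpoint.
for name, shape in tf.train.list_variables("/tmp/albert_xlarge_v2/model.ckpt"):
    print(name, shape)
# Expect names like: module/bert/embeddings/word_embeddings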
0 reactions
insop commented, Nov 27, 2019

Hi @LysandreJik

I have run your script (compare_albert.py, with a different input_string; see below) on the V2 models. In my run, the large model shows a bigger difference, though not as large as the numbers you reported for xlarge; for xlarge, the difference seems okay.

I also have a question about running SQuAD; I was going to post it in another open issue where I thought I saw you, but I was mistaken. Were you able to run run_squad_sp.py? (To avoid digressing here, I can find another way to communicate if you were able to run run_squad_sp.py without any issue.)

Thank you,


$ python -c 'import tensorflow as tf; print(tf.__version__)'
1.15.0

# One change I've made is this:
# Create inputs
#input_sentence = "this is nice".lower()
input_sentence = "The most difficult thing is the decision to act, the rest is merely tenacity. The fears are paper tigers. You can do anything you decide to do. You can act to change and control your life; and the procedure, the process is its own reward.".lower()


model: base

Comparing the HUB and TF1 layers
-- pooled            1.5154481e-05
-- full transformer  3.1471252e-05


model: large

Comparing the HUB and TF1 layers
-- pooled            0.014360733
-- full transformer  0.014184952


model: xlarge

Comparing the HUB and TF1 layers
-- pooled            1.6540289e-06
-- full transformer  4.9889088e-05

model: xxlarge

Comparing the HUB and TF1 layers
-- pooled            2.5779009e-05
-- full transformer  1.8566847e-05

