Building custom model over the final embedding layer
BERT supposedly generates 768-dimensional embeddings for tokens. I am trying to build a multi-class classification model on top of this. My assumption is that the output of the layer Encoder-12-FeedForward-Norm, of shape (None, seq_length, 768), would give these embeddings. This is what I am trying:
```python
model = load_trained_model_from_checkpoint(config_path, checkpoint_path,
                                           training=True, seq_len=seq_len)
new_out = Bidirectional(LSTM(50, return_sequences=True,
                             dropout=0.1, recurrent_dropout=0.1))(model.layers[-9].output)
new_out = GlobalMaxPool1D()(new_out)
new_out = Dense(50, activation='relu')(new_out)
new_out = Dropout(0.1)(new_out)
new_out = Dense(6, activation='sigmoid')(new_out)
newModel = Model(model.inputs[:2], new_out)
```
I get the following error for new_out = GlobalMaxPool1D()(new_out):
TypeError: Layer global_max_pooling1d_11 does not support masking, but was passed an input_mask: Tensor("Encoder-12-FeedForward-Add/All:0", shape=(?, 128), dtype=bool)
I am not sure how masking is involved if I am just using the output of the encoder.
The paper mentions that the output corresponding to just the first [CLS] token should be used for classification. On trying this:

new_out = Lambda(lambda x: x[:,0,:])(model.layers[-9].output)
the model trains (although with poor results).
How can the pre-loaded model be used for classification?
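A [CLS]-based head along the lines the question describes might look like the sketch below. It uses plain tf.keras with an Input tensor standing in for the keras-bert encoder output (so the snippet is self-contained); the sequence length, hidden size, and class count are hypothetical placeholders.

```python
from tensorflow.keras.layers import Input, Lambda, Dropout, Dense
from tensorflow.keras.models import Model

SEQ_LEN, HIDDEN, N_CLASSES = 128, 768, 6  # hypothetical sizes

# Stand-in for the encoder's final output, shape (batch, seq_len, hidden).
encoder_out = Input(shape=(SEQ_LEN, HIDDEN))

# Keep only the [CLS] position (index 0 along the time axis).
cls = Lambda(lambda x: x[:, 0, :])(encoder_out)
cls = Dropout(0.1)(cls)
probs = Dense(N_CLASSES, activation='softmax')(cls)

head = Model(encoder_out, probs)
```

In the real setup, encoder_out would be replaced by model.layers[-9].output and the model built from model.inputs[:2], as in the question's code.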
- Created 5 years ago
- Comments: 6 (2 by maintainers)
Top GitHub Comments
I forgot to return a None mask in MaskedGlobalMaxPool1D. I’ve fixed it and made a release.
#7 Sentence Embedding
GlobalMaxPool1D doesn’t support masking. Following is a modification that suits this case:
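The modification itself is not included in this extract. A minimal sketch of a masking-aware max-pooling layer (my reconstruction in plain tf.keras, not the library's actual code) could look like this: it accepts the incoming mask, pushes masked time steps to the dtype minimum so they never win the max, and returns a None mask so downstream layers such as Dense are not passed one.

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MaskedGlobalMaxPool1D(Layer):
    """Sketch of a GlobalMaxPool1D variant that tolerates a Keras mask."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs, mask=None):
        if mask is not None:
            # Zero out masked positions, then push them to the dtype minimum
            # so they can never be selected by the max.
            m = tf.cast(mask, inputs.dtype)[:, :, tf.newaxis]
            inputs = inputs * m + (1.0 - m) * inputs.dtype.min
        return tf.reduce_max(inputs, axis=1)

    def compute_mask(self, inputs, mask=None):
        # Stop mask propagation: downstream layers see no mask.
        return None
```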
I’ve added a demo for sentence embedding with pooling:
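The linked demo is not reproduced here. The core idea of mask-aware pooling for a sentence embedding can be sketched framework-free in NumPy (my illustration, with hypothetical shapes): average the token vectors, counting only non-padding positions.

```python
import numpy as np

def masked_mean_pool(token_embeddings, mask):
    """Pool (batch, seq_len, dim) token vectors into (batch, dim) sentence
    vectors, averaging only over positions where mask is 1."""
    m = mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * m).sum(axis=1)
    counts = np.clip(m.sum(axis=1), 1, None)  # avoid division by zero
    return summed / counts
```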