Using ELMO instead of BERT
Hi; thank you for your great and well-explained work. Do you have an example of how I can use ELMo instead of BERT?
Code:

```python
from elmoformanylangs import Embedder
# imports below are assumed to follow the contextualized_topic_models package layout
from contextualized_topic_models.utils.data_preparation import TextHandler
from contextualized_topic_models.datasets.dataset import CTMDataset
from contextualized_topic_models.models.ctm import CombinedTM

handler = TextHandler(sentences=preprocessed_documents)
handler.prepare()  # create vocabulary and training data

docELMO = [i.split() for i in unpreprocessed_documents]

e = Embedder('/home/nassera/136', batch_size=64)
training_elmo = e.sents2elmo(docELMO, output_layer=0)
print("training ELMO : ", training_elmo[0])

training_dataset = CTMDataset(handler.bow, training_elmo, handler.idx2token)

ctm = CombinedTM(input_size=len(handler.vocab), bert_input_size=768, n_components=50)
ctm.fit(training_dataset)  # run the model
print('topics : ', ctm.get_topics())
```
When I run this code, I get this error:
```
2021-01-16 22:12:51,392 INFO: char embedding size: 3773
2021-01-16 22:12:52,371 INFO: word embedding size: 221272
2021-01-16 22:12:58,469 INFO: Model(
  (token_embedder): ConvTokenEmbedder(
    (word_emb_layer): EmbeddingLayer(
      (embedding): Embedding(221272, 100, padding_idx=3)
    )
    (char_emb_layer): EmbeddingLayer(
      (embedding): Embedding(3773, 50, padding_idx=3770)
    )
    (convolutions): ModuleList(
      (0): Conv1d(50, 32, kernel_size=(1,), stride=(1,))
      (1): Conv1d(50, 32, kernel_size=(2,), stride=(1,))
      (2): Conv1d(50, 64, kernel_size=(3,), stride=(1,))
      (3): Conv1d(50, 128, kernel_size=(4,), stride=(1,))
      (4): Conv1d(50, 256, kernel_size=(5,), stride=(1,))
      (5): Conv1d(50, 512, kernel_size=(6,), stride=(1,))
      (6): Conv1d(50, 1024, kernel_size=(7,), stride=(1,))
    )
    (highways): Highway(
      (_layers): ModuleList(
        (0): Linear(in_features=2048, out_features=4096, bias=True)
        (1): Linear(in_features=2048, out_features=4096, bias=True)
      )
    )
    (projection): Linear(in_features=2148, out_features=512, bias=True)
  )
  (encoder): ElmobiLm(
    (forward_layer_0): LstmCellWithProjection(
      (input_linearity): Linear(in_features=512, out_features=16384, bias=False)
      (state_linearity): Linear(in_features=512, out_features=16384, bias=True)
      (state_projection): Linear(in_features=4096, out_features=512, bias=False)
    )
    (backward_layer_0): LstmCellWithProjection(
      (input_linearity): Linear(in_features=512, out_features=16384, bias=False)
      (state_linearity): Linear(in_features=512, out_features=16384, bias=True)
      (state_projection): Linear(in_features=4096, out_features=512, bias=False)
    )
    (forward_layer_1): LstmCellWithProjection(
      (input_linearity): Linear(in_features=512, out_features=16384, bias=False)
      (state_linearity): Linear(in_features=512, out_features=16384, bias=True)
      (state_projection): Linear(in_features=4096, out_features=512, bias=False)
    )
    (backward_layer_1): LstmCellWithProjection(
      (input_linearity): Linear(in_features=512, out_features=16384, bias=False)
      (state_linearity): Linear(in_features=512, out_features=16384, bias=True)
      (state_projection): Linear(in_features=4096, out_features=512, bias=False)
    )
  )
)
2021-01-16 22:13:11,365 INFO: 2 batches, avg len: 20.9
training ELMO :  [[ 0.06318592 -0.04212857 -0.40941882 ... -0.393932    0.65597   -0.19988859]
 [ 0.0464317  -0.03159406 -0.23152797 ...  0.2573734   0.28932744 -0.21369117]
 [ 0.04215719 -0.27414545 -0.1282109  ... -0.01528776  0.15322109 -0.02998078]
 ...
 [-0.20043871  0.11804245 -0.5754699  ...  0.19337586 -0.06868231  0.11217812]
 [-0.1898424  -0.24078836 -0.1522124  ... -0.08325598 -0.5789431  -0.21831807]
 [ 0.08684797 -0.14746179 -0.2742679  ...  0.06612014  0.15257567 -0.32261848]]

Settings:
  N Components: 50
  Topic Prior Mean: 0.0
  Topic Prior Variance: 0.98
  Model Type: prodLDA
  Hidden Sizes: (100, 100)
  Activation: softplus
  Dropout: 0.2
  Learn Priors: True
  Learning Rate: 0.002
  Momentum: 0.99
  Reduce On Plateau: False
  Save Dir: None

Traceback (most recent call last):
  File "/home/nassera/PycharmProjects/MyProject/TM_FB/Test_CTM_ELMO.py", line 76, in <module>
    ctm.fit(training_dataset)  # run the model
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/contextualized_topic_models/models/ctm.py", line 227, in fit
    sp, train_loss = self._train_epoch(train_loader)
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/contextualized_topic_models/models/ctm.py", line 154, in _train_epoch
    for batch_samples in loader:
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/nassera/PycharmProjects/MyProject/venv/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [30, 1024] at entry 0 and [21, 1024] at entry 1
```
I get a list of NumPy arrays from ELMo, but the run fails at `ctm.fit(training_dataset)`.
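Checking the arrays returned by `sents2elmo` (variable names as in the snippet above) shows that each document gets a `(num_tokens, 1024)` matrix, and `num_tokens` varies with the document length, which is why `default_collate` cannot stack them:

```python
# each element of training_elmo is one document's per-token ELMo matrix,
# so the first dimension differs from document to document
for doc in training_elmo[:3]:
    print(doc.shape)  # e.g. (30, 1024), (21, 1024), ... as in the error above
```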

You could do mean pooling (averaging over the sequence), but I am not sure how well that works for the embeddings you get from ELMo.
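A minimal sketch of what that could look like, continuing from your snippet and assuming each array from `sents2elmo` has shape `(num_tokens, 1024)` as in your log (the pooling step and the switch to `bert_input_size=1024` are my suggestion, not something the library does for you):

```python
import numpy as np

from contextualized_topic_models.datasets.dataset import CTMDataset
from contextualized_topic_models.models.ctm import CombinedTM

# mean-pool over the token axis so every document becomes a single
# fixed-size 1024-dimensional vector that the DataLoader can stack
pooled_elmo = np.array([doc.mean(axis=0) for doc in training_elmo])

training_dataset = CTMDataset(handler.bow, pooled_elmo, handler.idx2token)

# bert_input_size must match the ELMo vector size (1024), not BERT's 768
ctm = CombinedTM(input_size=len(handler.vocab), bert_input_size=1024, n_components=50)
ctm.fit(training_dataset)
```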
I also saw your open issue. I’ll search a bit more.
Is there any reason why you prefer ELMo to BERT?
Thank you so much. I don’t particularly prefer ELMo; I would like to compare the two contextualized word embeddings.