Different output vectors for same sentences
Hi, I am using ELMo for Japanese. Here is my code:
```python
from elmoformanylangs import Embedder

e = Embedder('/Users/tanh/Desktop/alt/JapaneseElmo')

if __name__ == '__main__':
    sents = [
        ['今'],
        ['今'],
        ['潮水', '退']
    ]
    print(e.sents2elmo(sents))
    print(e.sents2elmo(sents))
```
And here is the console output:

```
2018-11-14 10:33:26,441 INFO: 1 batches, avg len: 3.3
[array([[-0.23187001, -0.09699917,  0.46900252, …, -0.33114347,  0.18502058, -0.27423012]], dtype=float32),
 array([[-0.23187001, -0.09699917,  0.46900252, …, -0.33114347,  0.18502058, -0.27423012]], dtype=float32),
 array([[-0.11759937, -0.04552874,  0.22546595, …,  0.21812831, -0.33964303, -0.33022305],
        [-0.26380852, -0.27671477, -0.33576807, …,  0.15142155, -0.04612424, -0.74970037]], dtype=float32)]
2018-11-14 10:33:26,734 INFO: 1 batches, avg len: 3.3
[array([[-0.25601366, -0.10413959,  0.45184097, …, -0.34171066,  0.18976462, -0.2817447 ]], dtype=float32),
 array([[-0.25601366, -0.10413959,  0.45184097, …, -0.34171066,  0.18976462, -0.2817447 ]], dtype=float32),
 array([[-0.12085894, -0.05347676,  0.18303208, …,  0.22256255, -0.37257898, -0.39672664],
        [-0.21205096, -0.31738985, -0.34304047, …,  0.24654591, -0.07900852, -0.710617  ]], dtype=float32)]
```

As you can see, the output differs between the two calls to `sents2elmo`. Is this normal or a bug? If it is normal, how can I prevent it from happening?
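For what it's worth, the discrepancy is far larger than floating-point rounding noise. A quick NumPy check on the first three printed components of the `['今']` embedding (values copied from the log above; the full 1024-dim vectors are truncated there) makes that concrete:

```python
import numpy as np

# First three components of the ['今'] embedding from each run,
# copied from the truncated log output above.
run1 = np.array([-0.23187001, -0.09699917, 0.46900252])
run2 = np.array([-0.25601366, -0.10413959, 0.45184097])

# Maximum absolute difference between the two runs
max_diff = np.max(np.abs(run1 - run2))
print(max_diff)  # ~0.024, orders of magnitude above float32 noise
```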
Issue Analytics
- Created 5 years ago
- Reactions: 7
- Comments: 9
Hello, I have this behavior as well. I guess it's related to the LSTM internal states, as described in the AllenNLP note ("Notes on statefulness and non-determinism"): https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md
Do we need to add special tokens at the beginning/end of each sentence?
Hello, I have exactly the same behavior with the French model and I was going to open an issue with the very same questions 😃
After something like 10 loops over the same sentence, the word vectors start to stabilize. It looks like the model continues to update its internal state even after `.eval()` is called.
Note that the output of the word encoder (`output_layer=0`) always gives the same result; only the outputs of the LSTMs change.
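A minimal sketch of that "loop until stable" workaround. This is not part of elmoformanylangs; `embed_fn` is a hypothetical stand-in for `e.sents2elmo`, and the tolerance/loop-count values are assumptions:

```python
import numpy as np

def warm_up(embed_fn, sents, tol=1e-4, max_loops=20):
    """Call embed_fn on the same sentences until two consecutive runs
    agree within tol (max absolute difference over all word vectors).

    embed_fn stands in for e.sents2elmo (hypothetical). Returns the
    stabilized output and the number of extra calls that were needed.
    """
    prev = embed_fn(sents)
    for i in range(1, max_loops + 1):
        cur = embed_fn(sents)
        if max(np.max(np.abs(c - p)) for c, p in zip(cur, prev)) < tol:
            return cur, i
        prev = cur
    return prev, max_loops
```

If the LSTM state really does converge after ~10 passes as observed above, calling this once at startup with a representative batch should make later `sents2elmo` calls reproducible for the LSTM layers; `output_layer=0` is deterministic either way.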