Different output vectors for same sentences

Hi, I am using ELMo for Japanese. Here is my code:

from elmoformanylangs import Embedder
e = Embedder('/Users/tanh/Desktop/alt/JapaneseElmo')

if __name__ == '__main__':
    sents = [
        ['今'],
        ['今'],
        ['潮水', '退']
    ]
    print(e.sents2elmo(sents))
    print(e.sents2elmo(sents))

And here is the console output:

2018-11-14 10:33:26,441 INFO: 1 batches, avg len: 3.3
[array([[-0.23187001, -0.09699917, 0.46900252, …, -0.33114347, 0.18502058, -0.27423012]], dtype=float32),
 array([[-0.23187001, -0.09699917, 0.46900252, …, -0.33114347, 0.18502058, -0.27423012]], dtype=float32),
 array([[-0.11759937, -0.04552874, 0.22546595, …, 0.21812831, -0.33964303, -0.33022305],
        [-0.26380852, -0.27671477, -0.33576807, …, 0.15142155, -0.04612424, -0.74970037]], dtype=float32)]

2018-11-14 10:33:26,734 INFO: 1 batches, avg len: 3.3
[array([[-0.25601366, -0.10413959, 0.45184097, …, -0.34171066, 0.18976462, -0.2817447 ]], dtype=float32),
 array([[-0.25601366, -0.10413959, 0.45184097, …, -0.34171066, 0.18976462, -0.2817447 ]], dtype=float32),
 array([[-0.12085894, -0.05347676, 0.18303208, …, 0.22256255, -0.37257898, -0.39672664],
        [-0.21205096, -0.31738985, -0.34304047, …, 0.24654591, -0.07900852, -0.710617 ]], dtype=float32)]

As you can see, the output differs between the two calls to sents2elmo, even though the input is identical. Is this normal, or is it a bug? If it is normal, how can I prevent it?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 7
  • Comments: 9

Top GitHub Comments

4 reactions
tkon3 commented, Nov 15, 2018

Hello, I see this behavior as well. I guess it is related to the LSTM internal states, as described in the AllenNLP note "Notes on statefulness and non-determinism": https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md

Do we need to specify special tokens at the beginning/end of each sentence?
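
To illustrate the kind of statefulness suspected above, here is a minimal sketch of a recurrent cell that keeps its hidden state between calls. It is a toy stand-in written for this thread, not ELMo or AllenNLP code: the class name `ToyStatefulCell`, its weights, and the `reset` method are all assumptions made for illustration. It only shows why identical input can give different output when state is carried over, and why starting from a clean state makes the output repeatable.

```python
import math


class ToyStatefulCell:
    """One-unit recurrent cell that keeps its hidden state between calls.

    Hypothetical toy model: the weights are arbitrary and this is not how
    ELMoForManyLangs is implemented internally.
    """

    def __init__(self, w_in=0.5, w_rec=0.8):
        self.w_in = w_in
        self.w_rec = w_rec
        self.h = 0.0  # hidden state left over from the previous call

    def reset(self):
        """Forget everything seen so far."""
        self.h = 0.0

    def encode(self, xs):
        """Return one output per input, updating the carried-over state."""
        outputs = []
        for x in xs:
            self.h = math.tanh(self.w_in * x + self.w_rec * self.h)
            outputs.append(self.h)
        return outputs


cell = ToyStatefulCell()
sentence = [1.0, -0.5]

first = cell.encode(sentence)
second = cell.encode(sentence)  # same input, but starts from leftover state
print(first == second)          # False

cell.reset()
after_reset = cell.encode(sentence)
print(after_reset == first)     # True
```

Whether the real embedder exposes a comparable way to clear its state is not established in this thread, so treat the reset step as a description of the idea rather than of an available API.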

3 reactions
blouargant commented, Nov 15, 2018

Hello, I have exactly the same behavior with the French model and I was going to open an issue with the very same questions 😃

After roughly 10 loops over the same sentence, the word vectors start to stabilize. It looks as if the model continues to train even after .eval() has been called.

Note that the output of the word encoder (output_layer=0) always gives the same results. Only the outputs of the LSTMs change.
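
The observation above can be checked systematically by measuring how much the output changes from one call to the next. The sketch below is one way to do that under stated assumptions: `max_abs_diff` and `run_to_run_drift` are hypothetical helper names, and the two fake embedders are toy stand-ins so the example runs without a model. The stateful one mimics a layer whose output drifts and then settles, the stateless one mimics a layer that is always identical.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested sequences."""
    try:
        pairs = list(zip(a, b))
    except TypeError:  # scalars are not iterable, so compare them directly
        return abs(float(a) - float(b))
    return max((max_abs_diff(x, y) for x, y in pairs), default=0.0)


def run_to_run_drift(embed, sents, repeats=4):
    """Call embed(sents) repeatedly; return the max change between consecutive runs."""
    previous = embed(sents)
    drifts = []
    for _ in range(repeats):
        current = embed(sents)
        drifts.append(max_abs_diff(previous, current))
        previous = current
    return drifts


class FakeStatefulEmbedder:
    """Toy embedder whose hidden state survives between calls (illustration only)."""

    def __init__(self):
        self.h = 0.0

    def __call__(self, sents):
        out = []
        for sent in sents:
            vecs = []
            for token in sent:
                self.h = math.tanh(0.5 * len(token) + 0.5 * self.h)
                vecs.append([self.h])
            out.append(vecs)
        return out


def fake_stateless_embedder(sents):
    """Toy embedder that depends only on its input (illustration only)."""
    return [[[float(len(token))] for token in sent] for sent in sents]


sents = [['今'], ['今'], ['潮水', '退']]
stateful_drifts = run_to_run_drift(FakeStatefulEmbedder(), sents)
stateless_drifts = run_to_run_drift(fake_stateless_embedder, sents)
print(stateful_drifts)   # shrinks from run to run as the state settles
print(stateless_drifts)  # all zeros
```

With the real embedder from the question, the same helper could presumably be called with `lambda s: e.sents2elmo(s, output_layer=0)` and with `lambda s: e.sents2elmo(s)` to compare the word encoder against the default output. That usage has not been run here.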

