Different embeddings with different length
I ran into an issue: when I encode the same sentence as part of lists of different lengths, I receive slightly different embeddings.
Here is some code that shows what I mean:
from laserembeddings import Laser
import numpy as np
laser = Laser()
a = laser.embed_sentences(["apple", "banana", "clementina"], lang='en')
b = laser.embed_sentences(["apple"], lang='en')
c = laser.embed_sentences(["apple", "potato", "strawberry"], lang='en')
(a[0]==b[0]).all() # check whether all elements are the same
#False
(a[0]==c[0]).all()
#True
np.linalg.norm(a[0]-b[0])
#1.3968409e-07
np.linalg.norm(a[0]-c[0])
#0.0
My goal is to get the same embedding for the one-word sentence "apple" no matter how many sentences are in the list I pass in, but that does not seem possible with the current version of laserembeddings. Is this behavior intentional, or is it a bug?
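The reported distance between a[0] and b[0] (about 1.4e-07) is on the order of float32 rounding error, so one possible workaround is to compare embeddings with a tolerance rather than with `==`. The sketch below is only an illustration: `embeddings_match` is a hypothetical helper (not part of laserembeddings), the tolerance of 1e-6 is an assumed value, and the vectors are synthetic stand-ins rather than real LASER output.

```python
import numpy as np

def embeddings_match(u, v, atol=1e-6):
    """Return True when two vectors agree within an absolute tolerance.

    Hypothetical helper; atol=1e-6 is an assumed threshold, chosen to sit
    above the ~1.4e-07 difference reported in this issue.
    """
    u = np.asarray(u, dtype=np.float32)
    v = np.asarray(v, dtype=np.float32)
    return bool(np.allclose(u, v, rtol=0.0, atol=atol))

# Synthetic stand-ins for a[0] and b[0] (not real embeddings).
rng = np.random.default_rng(0)
a0 = rng.standard_normal(1024).astype(np.float32)
# Nudge every element to the next representable float32 value,
# imitating a tiny rounding-level difference between two runs.
b0 = np.nextafter(a0, np.float32(np.inf))

exact_equal = bool((a0 == b0).all())   # bitwise comparison
close_enough = embeddings_match(a0, b0)  # tolerance-based comparison
print(exact_equal, close_enough)
```

With real embeddings, the same helper could be called as `embeddings_match(a[0], b[0])`. Whether a tolerance-based check is acceptable depends on the downstream use; it does not make the embeddings bitwise identical.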
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (3 by maintainers)

Ok, but your batch is relatively small (3 elements). If you try with more sentences, you should see that the batched version is faster:
That’s a valid point. Thanks for the example you provided and for all of your answers.