Find the closest 5 sentences of the corpus for each query sentence based on cosine similarity

Hi there,

I want to run semantic search using cosine similarity. To do so, I have prepared the following data:

Queries: <class 'list'> 179435
Corpus embeddings: <class 'numpy.ndarray'> (31257735, 128)
Corpus: <class 'list'> 31257735

The same code ran on Google Colab (with a different embedding size, 768), but on the server pytorch_cos_sim hung and then raised the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-9f8d1c8ab6d4> in <module>
      5 
      6     # We use cosine-similarity and torch.topk to find the highest 5 scores
----> 7     cos_scores = util.pytorch_cos_sim(query_embedding, corpus_embeddings)[0]
      8     top_results = torch.topk(cos_scores, k=top_k)
      9 

~/anaconda3/envs/method2/lib/python3.8/site-packages/sentence_transformers/util.py in pytorch_cos_sim(a, b)
     19     :return: Matrix with res[i][j]  = cos_sim(a[i], b[j])
     20     """
---> 21     return cos_sim(a, b)
     22 
     23 def cos_sim(a: Tensor, b: Tensor):

~/anaconda3/envs/method2/lib/python3.8/site-packages/sentence_transformers/util.py in cos_sim(a, b)
     40     a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
     41     b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
---> 42     return torch.mm(a_norm, b_norm.transpose(0, 1))
     43 
     44 

RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

Could you explain how to debug this error?
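For context, the message says that the second matrix passed to torch.mm is on the CPU while the first is on the GPU. A NumPy array always lives in host memory, so a likely cause (an assumption, since the encoding step is not shown) is that query_embedding is a CUDA tensor while corpus_embeddings is still a numpy.ndarray. The sketch below uses toy data and two hypothetical helpers, describe and align, to check where each input lives and to put both on the same device before repeating the normalize and torch.mm steps from the traceback.

```python
import numpy as np
import torch
import torch.nn.functional as F


def describe(name, x):
    """Report type and device; NumPy arrays always live in host (CPU) memory."""
    device = x.device if isinstance(x, torch.Tensor) else "cpu (numpy)"
    return f"{name}: {type(x).__name__}, device={device}"


def align(query_embedding, corpus_embeddings):
    """Return both inputs as float32 tensors on the query's device."""
    query = torch.as_tensor(query_embedding, dtype=torch.float32)
    if query.dim() == 1:
        query = query.unsqueeze(0)  # shape (1, dim), as torch.mm needs 2-D inputs
    corpus = torch.as_tensor(
        corpus_embeddings, dtype=torch.float32, device=query.device
    )
    return query, corpus


# Toy stand-ins for the real data (assumption: 1000 vectors instead of ~31M).
rng = np.random.default_rng(0)
corpus_embeddings = rng.standard_normal((1000, 128)).astype(np.float32)
device = "cuda" if torch.cuda.is_available() else "cpu"
query_embedding = torch.randn(128, device=device)

print(describe("query_embedding", query_embedding))
print(describe("corpus_embeddings", corpus_embeddings))

query, corpus = align(query_embedding, corpus_embeddings)

# Same computation as cos_sim in the traceback: normalize rows, then multiply.
cos_scores = torch.mm(
    F.normalize(query, p=2, dim=1),
    F.normalize(corpus, p=2, dim=1).transpose(0, 1),
)[0]
top_results = torch.topk(cos_scores, k=5)
print(top_results.indices.tolist())
```

Moving the corpus to the query's device is only one option. With the full corpus it may be more practical to move the query to the CPU instead (query_embedding.cpu()), because the whole corpus matrix might not fit in GPU memory.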

I should add that, because of limited memory, I applied PCA for dimensionality reduction.

Regards, Javad
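On the memory point: a (31257735, 128) matrix takes about 16 GB if stored as float32 (31257735 × 128 × 4 bytes), so copying it to a GPU in one piece may fail even after PCA. One possible workaround, sketched here under assumptions rather than taken from the library, is to keep the corpus as a NumPy array and score it in chunks, keeping only a running top-k. The function name chunked_top_k, the chunk_size value, and the toy data are all hypothetical.

```python
import numpy as np
import torch
import torch.nn.functional as F


def chunked_top_k(query, corpus_embeddings, top_k=5, chunk_size=100_000, device="cpu"):
    """Top-k cosine similarity of one query against a NumPy corpus, chunk by chunk."""
    q = torch.as_tensor(query, dtype=torch.float32, device=device).reshape(1, -1)
    q = F.normalize(q, p=2, dim=1)

    best_scores = torch.empty(0, dtype=torch.float32, device=device)
    best_ids = torch.empty(0, dtype=torch.long, device=device)

    for start in range(0, len(corpus_embeddings), chunk_size):
        # Only this slice is copied to the target device.
        chunk = torch.as_tensor(
            corpus_embeddings[start:start + chunk_size],
            dtype=torch.float32,
            device=device,
        )
        scores = torch.mm(q, F.normalize(chunk, p=2, dim=1).transpose(0, 1))[0]

        # Keep the best candidates of this chunk, with global corpus indices.
        k = min(top_k, scores.shape[0])
        chunk_scores, chunk_ids = torch.topk(scores, k=k)
        best_scores = torch.cat([best_scores, chunk_scores])
        best_ids = torch.cat([best_ids, chunk_ids + start])

        # Merge with the running top-k from earlier chunks.
        k = min(top_k, best_scores.shape[0])
        best_scores, order = torch.topk(best_scores, k=k)
        best_ids = best_ids[order]

    return best_scores.cpu(), best_ids.cpu()


# Toy data (assumption: small sizes so the sketch runs anywhere).
rng = np.random.default_rng(0)
corpus_embeddings = rng.standard_normal((1000, 16)).astype(np.float32)
query_embedding = rng.standard_normal(16).astype(np.float32)

device = "cuda" if torch.cuda.is_available() else "cpu"
scores, ids = chunked_top_k(
    query_embedding, corpus_embeddings, top_k=5, chunk_size=128, device=device
)
print(ids.tolist())
```

The returned indices refer to rows of corpus_embeddings, so they can be used directly to look up the matching sentences in the corpus list. With 179435 queries, batching several queries per pass would likely be faster than one query at a time, but that is left out to keep the sketch short.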