txtai Similarity really slow with Elasticsearch
I've noticed that when running Elasticsearch with txtai.pipeline's Similarity, the re-ranking search (ranksearch) is very slow. Searching for a single result can take up to 10 seconds.
The code I’m using is:
from txtai.pipeline import Similarity
from elasticsearch import Elasticsearch

# Connect to ES instance
es = Elasticsearch(hosts=["http://localhost:9200"], timeout=60, retry_on_timeout=True)

def ranksearch(query, limit):
    # Pull 10x the requested number of hits from Elasticsearch,
    # re-rank them with the similarity model and keep the top "limit"
    results = [text for _, text in search(query, limit * 10)]
    return [(score, results[x]) for x, score in similarity(query, results)][:limit]

def search(query, limit):
    # Standard query string search against the articles index
    query = {
        "size": limit,
        "query": {
            "query_string": {"query": query}
        }
    }

    results = []
    for result in es.search(index="articles", body=query)["hits"]["hits"]:
        source = result["_source"]
        # Normalize the BM25 score into [0, 1], capping at 18
        results.append((min(result["_score"], 18) / 18, source["title"]))

    return results

similarity = Similarity("valhalla/distilbart-mnli-12-3")

limit = 1
query = "Bad News"
print(ranksearch(query, limit))
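To see where the time goes, here is a small timing sketch using the search and similarity objects defined above; it separates the Elasticsearch round trip from the model scoring (per the discussion below, on CPU the zero-shot similarity call is the part that takes seconds):

import time

# Time the Elasticsearch retrieval on its own
start = time.perf_counter()
candidates = [text for _, text in search(query, limit * 10)]
print(f"Elasticsearch search: {time.perf_counter() - start:.2f}s")

# Time the similarity re-ranking on its own
start = time.perf_counter()
scores = similarity(query, candidates)
print(f"Similarity scoring: {time.perf_counter() - start:.2f}s")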
Issue Analytics
- Created a year ago
- Comments: 7 (4 by maintainers)
Top Results From Across the Web
- txtai ElasticSearch Similarity slow (Stack Overflow): In using txtai, I've noticed that it is abysmally slow. Requesting for one result and my response time is almost 10 seconds vs...
- Add semantic search to Elasticsearch (neuml/txtai, GitHub): txtai has a similarity function that works on lists of text. This method can be integrated with any external search service, such as...
- Similarity (txtai docs): Computes the similarity between query and list of text. Returns a list of (id, score) sorted by highest score, where id is the...
- Slow cosine similarity script (Elastic Discuss): Hi, in a query, I am executing a cosine similarity script. It takes multiple seconds, but the top command shows CPU and memory...
- Introducing txtai, AI-powered semantic search built ... (Medium): txtai builds sentence embeddings to perform similarity searches. txtai takes each text record entry, tokenizes it and builds an embeddings representation of...
No problem, glad I could help.
For reference, I have a laptop that is 5+ years old, with a quad-core CPU and an 8 GB GPU with 1,920 CUDA cores. Modest specs compared to the most recent hardware. I've run benchmarks on this hardware below just to give you an idea of what you should expect.
GPU prices have really come down lately. An RTX 3060 is ~$500 and there are RTX 3090s out there for around $1,100. A year ago those were 2.5-3x more expensive.
An RTX 3060 has 3,584 CUDA cores and 12 GB of memory; an RTX 3090 has 10,496 CUDA cores and 24 GB. The elapsed time per call would be much lower on either of those. Server-class NVIDIA GPUs are typically Quadro, V100 or A100 cards.
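Before upgrading anything, it's worth checking whether the current setup is even using a GPU. A quick sanity check with PyTorch (which txtai runs on):

import torch

# False here means the similarity model is running on CPU, which
# explains multi-second scoring times for a model of this size
print(torch.cuda.is_available())

if torch.cuda.is_available():
    # e.g. "NVIDIA GeForce RTX 3060"
    print(torch.cuda.get_device_name(0))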
That does make quite the difference. Setting the limit to ten yields results around 4.3 seconds.
I will look into setting up something with respect to GPU processing. I greatly appreciate your help.
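For anyone who lands here later, a minimal sketch of what that GPU setup might look like. txtai's Hugging Face-based pipelines accept a gpu argument (they default to using a GPU when one is available); treat the exact keyword as an assumption and confirm against the txtai docs for your version:

from txtai.pipeline import Similarity

# Assumption: the gpu flag follows txtai's HFPipeline signature; when True
# and a CUDA device is present, the model is loaded onto the GPU
similarity = Similarity("valhalla/distilbart-mnli-12-3", gpu=True)

With the model on a GPU, the per-call latency of the same ranksearch call should drop substantially.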