Rerun benchmark with elasticsearch 7.5 or above

In ES 7.5, we made some improvements to the performance of Elasticsearch dense_vector operations (https://github.com/elastic/elasticsearch/pull/46294). Although I still expect the QPS to be significantly worse than Vespa’s, it would be helpful to rerun the benchmarks against ES 7.5 to get an up-to-date comparison.

Issue Analytics

Created 4 years ago
Comments: 14 (9 by maintainers)

Top GitHub Comments

jobergum commented, Mar 23, 2020

@jtibshirani the vector is not returned with the result. If that were the case, I would have spotted it.

Sample response from ES

{"took":604,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":10000,"relation":"gte"},"max_score":0.005666477,"hits":[{"_index":"doc","_type":"_doc","_id":"669835","_score":0.005666477},{"_index":"doc","_type":"_doc","_id":"408764","_score":0.0056393184},{"_index":"doc","_type":"_doc","_id":"408462","_score":0.0054252045},{"_index":"doc","_type":"_doc","_id":"408855","_score":0.0053858217},{"_index":"doc","_type":"_doc","_id":"551661","_score":0.0053397696},{"_index":"doc","_type":"_doc","_id":"861882","_score":0.005264404},{"_index":"doc","_type":"_doc","_id":"406273","_score":0.0052393572},{"_index":"doc","_type":"_doc","_id":"406324","_score":0.0052266084},{"_index":"doc","_type":"_doc","_id":"551743","_score":0.005219447},{"_index":"doc","_type":"_doc","_id":"861530","_score":0.0052178036}]}}
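A response like this can be checked programmatically to confirm that the hits carry only IDs and scores. The following is a minimal sketch, assuming a shortened copy of the sample above (first two hits only, `_shards` omitted); the helper name `summarize_hits` is hypothetical and not part of any benchmark code.

```python
import json

# Shortened copy of the sample response above: only the first two hits are kept.
raw = (
    '{"took":604,"timed_out":false,'
    '"hits":{"total":{"value":10000,"relation":"gte"},"max_score":0.005666477,'
    '"hits":[{"_index":"doc","_type":"_doc","_id":"669835","_score":0.005666477},'
    '{"_index":"doc","_type":"_doc","_id":"408764","_score":0.0056393184}]}}'
)


def summarize_hits(response_text):
    """Return the hit IDs and whether any hit includes a `_source` payload."""
    response = json.loads(response_text)
    hits = response["hits"]["hits"]
    ids = [hit["_id"] for hit in hits]
    has_source = any("_source" in hit for hit in hits)
    return ids, has_source


ids, has_source = summarize_hits(raw)
print(ids, has_source)
```

If `has_source` were true, the stored vectors would be part of each hit and would add to the response size and latency.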

On CPU architectures: yes, it's explained by our use of AVX-512 instructions. See

I will soon update with results using our HNSW implementation for approximate nearest neighbor search. Some sample data with the GIST data set:

jtibshirani commented, Mar 23, 2020

@jobergum I’m sorry for the late reply. I’m not sure why your benchmarking results aren’t lining up with @mayya-sharipova’s. The only other difference that comes to mind is that we always omit the full document source from the results by setting _source: false in the search request body: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-request-body.html#request-body-search-source-filtering. Otherwise ES will load and return the whole stored vector for each of the top 10 results, whereas we are only interested in the document IDs.
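For illustration, a request body with source filtering disabled might look like the sketch below. The field name `vector`, the helper `build_knn_search_body`, and the scoring script are assumptions made for this example; the thread does not show the exact query either benchmark used.

```python
import json


def build_knn_search_body(query_vector, field="vector", k=10):
    """Build a brute-force vector search body that returns IDs and scores only.

    The field name and the scoring script are illustrative assumptions.
    """
    return {
        "size": k,
        # Skip loading and returning the stored document, including the vector.
        "_source": False,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Assumed scoring: inverse L2 distance to the query vector.
                    "source": f"1 / (1 + l2norm(params.query_vector, '{field}'))",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


body = build_knn_search_body([0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a `_search` request with whichever HTTP client the benchmark already uses.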

@jtibshirani I’ve updated the master branch using 7.6.

Thanks! The ‘Ivy Bridge’ numbers make sense to me, based on the previous results and the performance improvements in ES. However, the Haswell numbers are more surprising. Do you know why Vespa shows a latency improvement of ~2x between the Ivy Bridge and Haswell processors?