
About efficiency of the model


Hi,

Thanks for the great repo, I’ve enjoyed exploring it a lot! However, when I ran the code from the “Use BLINK in your codebase” section of the README, the model was quite slow (in fast=False mode). Specifically, when I execute main_dense.run, the first stage proceeds relatively slowly (~2.5 seconds per item), while the later stage (the one printing “Evaluation”) processes ~5 items per second. I also tried adding an index, as below.

config = {
    ...
    "faiss_index": "flat",
    "index_path": models_path + "faiss_flat_index.pkl",
}

However, with the index the first stage became even slower (~20 seconds per item). I’m wondering whether I’ve misconfigured something (especially the faiss index) that is causing the low speed. Are there any fixes or ways to speed this up? Thanks for your help! (I’ll post the performance logs below if needed!)
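
For reference, my understanding is that a flat index like the one requested above can be built directly with faiss, roughly as below (a minimal sketch, not BLINK’s own build script; the encoding filename is the all_entities_large.t7 file from the model download and may differ in your setup):

import faiss
import numpy as np
import torch

# Precomputed entity encodings, shape (num_entities, hidden_dim)
candidate_encoding = torch.load(models_path + "all_entities_large.t7")
xb = np.ascontiguousarray(candidate_encoding.numpy(), dtype=np.float32)

# Exact (brute-force) inner-product search, matching the bi-encoder's dot-product score
index = faiss.IndexFlatIP(xb.shape[1])
index.add(xb)
faiss.write_index(index, models_path + "faiss_flat_index.pkl")

Note that a flat index still scans every entity vector per query, so it cannot reduce the per-query work; only approximate indexes (e.g. IVF/PQ) do that.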

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 21 (1 by maintainers)

Top GitHub Comments

4 reactions
AOZMH commented, May 6, 2021

“Roughly, we can know that T(cross+bi) : T(bi) = 3:1, ignoring the other process”

  • Thanks for the theoretical analysis! That’s a bit different from the case I encountered, in which the bi-encoder phase was quite slow and adding an index turned out to be even slower.
  • Actually, reading through the code, I found that the bi-encoder phase runs entirely on CPU (both the transformer encoding of the sentence and the matrix multiplication that scores every entity), which makes it slow compared with the cross-encoder, which runs entirely on GPU.
  • I manually changed the code to run the bi-encoder phase on GPU, which caused a GPU out-of-memory error due to the large (hidden_dim × num_entities) multiplication. As far as I can tell, it needs at least 24GB of GPU memory in fp32, which exceeds the 16GB of my P100.
  • To work around that, I switched all calculations to fp16, which avoided the OOM but raised a warning that large fp16 matmuls have bugs (see this). Indeed, the fp16 bi-encoder results were completely wrong, e.g. a few incorrect entities received high scores.
  • Finally, I manually pruned the entity set to fit in 16GB at fp32 and everything was fine: the bi-encoder and cross-encoder took ~20ms and ~100ms respectively, which is reasonable for me. (A chunked-scoring sketch that would avoid both the OOM and the pruning follows my summary below.)

To wrap up, I conclude that:

  1. I suspect it was the high memory cost that led the contributors to run the bi-encoder phase on CPU (which uses RAM instead of GPU memory), but that significantly hurts performance; as we all know, transformers on CPU are far slower than on GPU.
  2. To run BLINK smoothly and fully on one GPU, I think we need more than 24GB of GPU memory, which I guess is not scarce at FAIR, but poses a difficulty for students like me 😃
  3. What I still haven’t solved is the slow execution of the faiss index, which was even slower than the pure-CPU bi-encoder. Maybe someone can offer some additional comments?
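
For anyone hitting the same out-of-memory error, here is a minimal sketch (my own workaround idea, not code from the BLINK repo) of chunked GPU scoring that avoids both the OOM and the pruning:

import torch

def score_candidates_chunked(context_emb, candidate_enc, chunk_size=500_000):
    # context_emb:   (batch, hidden_dim) tensor already on the GPU
    # candidate_enc: (num_entities, hidden_dim) fp32 tensor kept in CPU RAM
    scores = []
    for start in range(0, candidate_enc.size(0), chunk_size):
        # Move one slice of the entity table to the GPU at a time
        chunk = candidate_enc[start:start + chunk_size].to(context_emb.device)
        scores.append(context_emb @ chunk.T)   # (batch, chunk_rows)
    return torch.cat(scores, dim=1)            # (batch, num_entities)

At hidden_dim = 1024, each 500k-row fp32 chunk is about 2GB, so the full entity table never has to fit on the card; the cost is the repeated host-to-device copies.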

Thanks for all the help! I’ll be happy to follow any updates.

0 reactions
abhinavkulkarni commented, Oct 27, 2022

Hi,

You may want to make some changes to the codebase to add support for more index types; currently, the BLINK codebase only supports flat indices.

I am currently using a compressed OPQ32_768,IVF4096,PQ32x8 index built on the candidate encodings, and the speed improvement is significant.

For example, this is what my faiss_indexer.py looks like.
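
Since the file itself isn’t reproduced here, a rough equivalent using faiss directly would be the following (a sketch: the input path and the nprobe value are hypothetical, and my actual faiss_indexer.py differs in the details):

import faiss
import numpy as np

xb = np.load("candidate_encodings.npy").astype(np.float32)  # (num_entities, d)

index = faiss.index_factory(xb.shape[1], "OPQ32_768,IVF4096,PQ32x8",
                            faiss.METRIC_INNER_PRODUCT)
index.train(xb)   # IVF/PQ indexes must be trained before vectors are added
index.add(xb)

# Trade recall for speed: how many of the 4096 clusters to scan per query
faiss.extract_index_ivf(index).nprobe = 16

faiss.write_index(index, "index_opq32_768_ivf4096_pq32x8.faiss")

In the factory string, OPQ32_768 rotates and reduces the vectors to 768 dimensions for better quantization, IVF4096 partitions them into 4096 clusters so each query scans only a few, and PQ32x8 compresses every vector to 32 bytes; that is where the speedup over a flat scan comes from.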

This is how I load the models.

config = {
    "interactive": False,
    "fast": False,
    "top_k": 8,
    "biencoder_model": models_path + "biencoder_wiki_large.bin",
    "biencoder_config": models_path + "biencoder_wiki_large.json",
    "crossencoder_model": models_path + "crossencoder_wiki_large.bin",
    "crossencoder_config": models_path + "crossencoder_wiki_large.json",
    "entity_catalogue": models_path + "entities_aliases_with_ids.jsonl",
    "entity_encoding": models_path + "all_entities_aliases.t7",
    "faiss_index": "OPQ32_768,IVF4096,PQ32x8",
    "index_path": models_path + "index_opq32_768_ivf4096_pq32x8.faiss",
    "output_path": "logs/",  # logging directory
}

import argparse
import blink.main_dense as main_dense

self.args = argparse.Namespace(**config)  # main_dense expects an argparse-style namespace

logger.info("Loading BLINK model...")
self.models = main_dense.load_models(self.args, logger=logger)
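
Inference then follows the README’s “Use BLINK in your codebase” pattern (the mention and contexts below are placeholders):

data_to_link = [{
    "id": 0,
    "label": "unknown",
    "label_id": -1,
    "context_left": "".lower(),
    "mention": "Shakespeare".lower(),
    "context_right": "'s account of the Roman general".lower(),
}]

_, _, _, _, _, predictions, scores = main_dense.run(
    self.args, logger, *self.models, test_data=data_to_link
)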