How exactly can I use a custom model with nboost?
Connected to #45, #49 and #35. I am struggling to get nboost working with a custom model and I am not sure where to start. What exactly do the input and output of a model need to be? Which function is called?
I tried to use a model trained with the code from https://github.com/ThilinaRajapakse/simpletransformers#minimal-start-for-sentence-pair-classification (with regression), but no luck. I was not able to set the `--model` argument; it keeps telling me that `PtBertRerankModelPlugin` is not in `MODULE_MAP`. The model loads and nboost starts, but it raises an exception on each query:
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/nboost/proxy.py", line 123, in proxy_through
plugin.on_response(response, db_row)
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/nboost/plugins/rerank/base.py", line 34, in on_response
filter_results=response.request.filter_results
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/nboost/plugins/rerank/base.py", line 65, in rank
score = logit[1]
IndexError: index 1 is out of bounds for axis 0 with size 1
Models from simpletransformers take their input as `model.predict([[query, text]])`. Is that OK? Should it use `.forward`, or a different input? What should the output be: a single value between 0 and 1, or a tensor (with what dimensions)? Do you recommend a way to train such models (sentence-transformers, vanilla huggingface/transformers)?
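For reference, here is roughly how I call the model now; a minimal sketch, assuming a simpletransformers `ClassificationModel` trained for regression (the model path and inputs are placeholders):

```python
from simpletransformers.classification import ClassificationModel

# Placeholder path to my trained sentence-pair regression model
model = ClassificationModel(
    "bert", "outputs/", num_labels=1, args={"regression": True}, use_cuda=False
)

# predict() takes a list of [query, passage] pairs and returns the
# predictions plus the raw model outputs
predictions, raw_outputs = model.predict(
    [["what is nboost?", "nboost is a proxy that re-ranks search results"]]
)
# With regression there is a single value per pair, which is presumably
# why nboost's `score = logit[1]` above runs out of bounds.
print(predictions, raw_outputs)
```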
Top GitHub Comments
@klasocki the transformers `tokenizer.encode` function supports two arguments and automatically adds the SEP token (`add_special_tokens=True` or something like that); the first should be the query, the second the passage.
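A minimal sketch of that pair encoding, assuming a stock `BertTokenizer` (the checkpoint name is just an example):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint

query = "who invented the telephone"
passage = "Alexander Graham Bell was credited with inventing the telephone."

# Passing two text arguments encodes them as a pair:
# [CLS] query tokens [SEP] passage tokens [SEP]
input_ids = tokenizer.encode(query, passage, add_special_tokens=True)
print(tokenizer.decode(input_ids))
```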
@apohllo `BertConfig` can be anything, as long as the output size is 2 (binary classification).
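As a sketch, assuming a vanilla `BertForSequenceClassification` (the checkpoint name is illustrative): with `num_labels=2` the head emits two logits per pair, so nboost's `score = logit[1]` has something to index, whereas a single-logit regression head fails exactly as in the traceback above.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# num_labels=2 gives the classification head two logits per input, so
# nboost's `score = logit[1]` (the relevance logit) is in bounds.
config = BertConfig.from_pretrained("bert-base-uncased", num_labels=2)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)
model.eval()

dummy_ids = torch.tensor([[101, 2054, 2003, 102]])  # toy [CLS] ... [SEP] input_ids
with torch.no_grad():
    logits = model(dummy_ids)[0]  # shape: (batch_size, 2)
score = logits[0][1]              # relevance score of the first pair
```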
No, I actually mean worse 😆 But that could just be my data. Truncating the docs to 512 tokens worked quite well, since for Wikipedia search the most important information is at the beginning anyway.
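A sketch of that truncation, assuming a reasonably recent transformers version (older releases spelled the keyword `truncation_strategy` instead of `truncation=True`):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Truncate the (query, passage) pair to BERT's 512-token limit; keyword
# names may differ slightly between transformers versions.
input_ids = tokenizer.encode(
    "some query",
    "a very long Wikipedia article ...",
    add_special_tokens=True,
    max_length=512,
    truncation=True,
)
```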
Yes, nboost works that way. The usual approach is to ask Elasticsearch for e.g. 100 documents; nboost then re-ranks them (based solely on the model, not weighted with the ES score) and returns e.g. the top 10 to you.
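A sketch of that flow, assuming nboost's default proxy port 8000 and a hypothetical index name:

```python
import requests

# Send the search to the nboost proxy (default port 8000) instead of
# Elasticsearch directly; nboost fetches a larger candidate set upstream,
# re-ranks it with the model, and returns the top `size` hits.
resp = requests.get(
    "http://localhost:8000/my_index/_search",  # hypothetical index name
    params={"q": "passage:telephone", "size": 10},
)
print(resp.json()["hits"]["hits"])
```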