How exactly can I use a custom model with nboost?
Connected to #45, #49 and #35. I am struggling to get nboost working with a custom model and I am not sure where to start. What exactly do the input and output of a model need to be? Which function is called?
I tried to use a model trained with the code from https://github.com/ThilinaRajapakse/simpletransformers#minimal-start-for-sentence-pair-classification (with regression), but no luck. I was not able to set the `--model` argument; it keeps telling me that `PtBertRerankModelPlugin` is not in `MODULE_MAP`. The model loads and nboost starts, but it raises an exception on each query:
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/nboost/proxy.py", line 123, in proxy_through
plugin.on_response(response, db_row)
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/nboost/plugins/rerank/base.py", line 34, in on_response
filter_results=response.request.filter_results
File "/net/scratch/people/plgklasocki/transformers-env/lib/python3.6/site-packages/nboost/plugins/rerank/base.py", line 65, in rank
score = logit[1]
IndexError: index 1 is out of bounds for axis 0 with size 1
Models from simpletransformers take their input as `model.predict([[query, text]])`. Is that OK? Should it use `.forward`, or a different input? What should the output be: a single value between 0 and 1, or a tensor (with what dimensions)? Do you recommend a way to train such models (sentence-transformers, vanilla huggingface/transformers)?
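For reference, here is roughly how I call the model now; a minimal sketch, assuming a simpletransformers `ClassificationModel` trained for regression (the model path and inputs are placeholders):

```python
from simpletransformers.classification import ClassificationModel

# Placeholder path to my trained sentence-pair regression model
model = ClassificationModel(
    "bert", "outputs/", num_labels=1, args={"regression": True}, use_cuda=False
)

# predict() takes a list of [query, passage] pairs and returns the
# predictions plus the raw model outputs
predictions, raw_outputs = model.predict(
    [["what is nboost?", "nboost is a proxy that re-ranks search results"]]
)
# With regression there is a single value per pair, which is presumably
# why nboost's `score = logit[1]` above runs out of bounds.
print(predictions, raw_outputs)
```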
Top GitHub Comments
@klasocki the transformers `tokenizer.encode` function supports two arguments and automatically adds the SEP token (`add_special_tokens=True` or something like that); the first should be the query, the second the passage.
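A minimal sketch of that pair encoding, assuming a stock `BertTokenizer` (the checkpoint name is just an example):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint

query = "who invented the telephone"
passage = "Alexander Graham Bell was credited with inventing the telephone."

# Passing two text arguments encodes them as a pair:
# [CLS] query tokens [SEP] passage tokens [SEP]
input_ids = tokenizer.encode(query, passage, add_special_tokens=True)
print(tokenizer.decode(input_ids))
```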
@apohllo `BertConfig` can be anything, as long as the output size is 2 (binary classification).
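As a sketch, assuming a vanilla `BertForSequenceClassification` (the checkpoint name is illustrative): with `num_labels=2` the head emits two logits per pair, so nboost's `score = logit[1]` has something to index, whereas a single-logit regression head fails exactly as in the traceback above.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# num_labels=2 gives the classification head two logits per input, so
# nboost's `score = logit[1]` (the relevance logit) is in bounds.
config = BertConfig.from_pretrained("bert-base-uncased", num_labels=2)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", config=config)
model.eval()

dummy_ids = torch.tensor([[101, 2054, 2003, 102]])  # toy [CLS] ... [SEP] input_ids
with torch.no_grad():
    logits = model(dummy_ids)[0]  # shape: (batch_size, 2)
score = logits[0][1]              # relevance score of the first pair
```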
No, I actually mean worse 😆 But that could just be my data. Truncating the docs to 512 tokens worked quite well, since for Wikipedia search the most important information is at the beginning anyway.
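A sketch of that truncation, assuming a reasonably recent transformers version (older releases spelled the keyword `truncation_strategy` instead of `truncation=True`):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Truncate the (query, passage) pair to BERT's 512-token limit; keyword
# names may differ slightly between transformers versions.
input_ids = tokenizer.encode(
    "some query",
    "a very long Wikipedia article ...",
    add_special_tokens=True,
    max_length=512,
    truncation=True,
)
```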
Yes, nboost works that way. The usual approach is to ask Elasticsearch for e.g. 100 documents; nboost then re-ranks them (based solely on the model, not weighted with the ES score) and returns e.g. the top 10 to you.
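A sketch of that flow, assuming nboost's default proxy port 8000 and a hypothetical index name:

```python
import requests

# Send the search to the nboost proxy (default port 8000) instead of
# Elasticsearch directly; nboost fetches a larger candidate set upstream,
# re-ranks it with the model, and returns the top `size` hits.
resp = requests.get(
    "http://localhost:8000/my_index/_search",  # hypothetical index name
    params={"q": "passage:telephone", "size": 10},
)
print(resp.json()["hits"]["hits"])
```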