Test QAPipeline with GPU

@fmikaelian I have implemented the changes as discussed. I tested the fit() method for the retriever and the predict() method in a notebook included in examples. Everything is working fine.

Could you please test the fit() method for the reader on a GPU to check that everything is OK?

Please note that the current implementation of QAPipeline still uses the TfidfRetriever as you implemented it (passing the dataframe's 'content' column as input). It should be changed once you have implemented the improvement I proposed in #95.

fmikaelian commented, May 3, 2019

Using a pre-trained reader model on GPU:

qa_pipe = QAPipeline(metadata=df, model='../models/bert_qa_squad_v1.1_sklearn/bert_qa_squad_v1.1_sklearn.joblib')
qa_pipe.model.output_dir = '../logs/bert_qa_squad_v1.1_sklearn'

X = 'Since when does the Excellence Program of BNP Paribas exist?'
prediction = qa_pipe.predict(X)

print('question: {}'.format(X))
print('answer: {}'.format(prediction))

Output:

question: Since when does the Excellence Program of BNP Paribas exist?
answer: January 2016

andrelmfarias commented, May 14, 2019

I tested with the following code on Colab:

processor = BertProcessor(bert_model='bert-base-uncased', do_lower_case=True, is_training=True)
train_examples, train_features = processor.fit_transform('./data/dev-v1.1.json')

reader = BertQA(train_batch_size=8, num_train_epochs=1, output_dir='test', fp16=True)
reader.fit(X=(train_examples, train_features))

And got the following error:

NameError                                 Traceback (most recent call last)
<ipython-input-23-ab02e6a6f1de> in <module>()
----> 1 reader.fit(X=(train_examples, train_features))

<ipython-input-21-c31ed17ed5ea> in fit(self, X, y)
   1144, output_model_file)
   1145         model_to_save.config.to_json_file(output_config_file)
-> 1146         tokenizer.save_vocabulary(self.output_dir)

NameError: name 'tokenizer' is not defined

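The problem occurs in the last line shown in the traceback, tokenizer.save_vocabulary(self.output_dir): the name tokenizer is not defined anywhere in the scope of the fit() method.

It probably comes from the direct adaptation of the Hugging Face code to our BertQA class.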
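To illustrate how this kind of error can arise when script-level code is moved into a class method, here is a minimal sketch. `ToyReader` is a hypothetical stand-in and not the real BertQA; the only point is that Python resolves the name `tokenizer` when the line runs, so the class definition and the training step succeed and the NameError appears only at the very end of fit().

```python
class ToyReader:
    """Hypothetical stand-in for a reader class; not the real BertQA."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.trained = False

    def fit(self, X):
        # ... the actual training loop would run here ...
        self.trained = True
        # In a standalone script, `tokenizer` could be a module-level variable.
        # Inside this method no such name exists, so the lookup fails at run time.
        tokenizer.save_vocabulary(self.output_dir)
        return self


# Defining and instantiating the class raises nothing.
reader = ToyReader(output_dir="test")

try:
    reader.fit(X=[])
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)   # name 'tokenizer' is not defined
print(reader.trained)  # True: the failure happens after the training step
```

In this sketch the training work is already done by the time the error is raised, which is why such a leftover line is easy to miss until a full fit() run is attempted.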

For now, I will delete this line from the fit() method of BertQA. If needed, we could move the vocabulary saving to the BertProcessor class (when the attribute is_training is True), where we process the text and create the vocabulary.
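As a rough sketch of that second option, the class that owns the tokenizer can also be the one that writes the vocabulary. Everything below is hypothetical: `ToyTokenizer`, `ToyProcessor`, and the `output_dir` argument are illustration-only names and do not reflect the actual BertProcessor signature.

```python
import os
import tempfile


class ToyTokenizer:
    """Hypothetical tokenizer that only knows how to write its vocabulary."""

    def __init__(self, vocab):
        self.vocab = list(vocab)

    def save_vocabulary(self, output_dir):
        path = os.path.join(output_dir, "vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(self.vocab))
        return path


class ToyProcessor:
    """Hypothetical processor: it owns the tokenizer, so it can save the vocabulary."""

    def __init__(self, is_training=True, output_dir=None):
        self.is_training = is_training
        self.output_dir = output_dir  # assumed extra argument, for illustration only
        self.tokenizer = None

    def fit_transform(self, texts):
        # Lowercase and split on whitespace as a placeholder for real tokenization.
        tokens = [text.lower().split() for text in texts]
        vocab = sorted({tok for sent in tokens for tok in sent})
        self.tokenizer = ToyTokenizer(vocab)
        # Save the vocabulary only in training mode, where it is actually built.
        if self.is_training and self.output_dir is not None:
            os.makedirs(self.output_dir, exist_ok=True)
            self.tokenizer.save_vocabulary(self.output_dir)
        return tokens


with tempfile.TemporaryDirectory() as tmp:
    processor = ToyProcessor(is_training=True, output_dir=tmp)
    features = processor.fit_transform(["Since when", "when does it exist"])
    vocab_path = os.path.join(tmp, "vocab.txt")
    saved = os.path.exists(vocab_path)
    with open(vocab_path, encoding="utf-8") as f:
        saved_vocab = f.read().split("\n")

print(saved)        # True
print(saved_vocab)  # ['does', 'exist', 'it', 'since', 'when']
```

With this layout the reader's fit() never needs to reference a tokenizer, so the NameError cannot occur there.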

