Test QAPipeline with GPU

@fmikaelian I have implemented the changes as discussed. I tested the fit() method for the retriever and the predict() method in a notebook included in examples. Everything is working fine.

Could you please test the fit() method for the reader on GPU to check that everything is OK?

Please note that the current implementation of QAPipeline still uses the TfidfRetriever the way you implemented it, passing the dataframe column as input: TfidfRetriever.fit(df['content']). It should be changed once you have implemented the improvement I proposed in #95.

_Originally posted by @andrelmfarias in https://github.com/fmikaelian/cdQA/pull/101#issuecomment-488628180_

Top GitHub Comments

fmikaelian commented, May 3, 2019

Using a pre-trained reader model on GPU:

qa_pipe = QAPipeline(metadata=df, model='../models/bert_qa_squad_v1.1_sklearn/bert_qa_squad_v1.1_sklearn.joblib')
qa_pipe.fit()
qa_pipe.model.output_dir = '../logs/bert_qa_squad_v1.1_sklearn'

X = 'Since when does the Excellence Program of BNP Paribas exist?'
prediction = qa_pipe.predict(X)

print('question: {}'.format(X))
print('answer: {}'.format(prediction))

question: Since when does the Excellence Program of BNP Paribas exist?
answer: January 2016

andrelmfarias commented, May 14, 2019

I tested BertQA.fit() with the following code on Colab:

processor = BertProcessor(bert_model='bert-base-uncased', do_lower_case=True, is_training=True)
train_examples, train_features = processor.fit_transform('./data/dev-v1.1.json')

reader = BertQA(train_batch_size=8, num_train_epochs=1, output_dir='test', fp16=True)
reader.fit(X=(train_examples, train_features))

And got the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-23-ab02e6a6f1de> in <module>()
----> 1 reader.fit(X=(train_examples, train_features))

<ipython-input-21-c31ed17ed5ea> in fit(self, X, y)
   1144         torch.save(model_to_save.state_dict(), output_model_file)
   1145         model_to_save.config.to_json_file(output_config_file)
-> 1146         tokenizer.save_vocabulary(self.output_dir)
   1147 
   1148         self.model.to(self.device)

NameError: name 'tokenizer' is not defined

The problem occurs in https://github.com/fmikaelian/cdQA/blob/0dce89f48ab53a69e8fdb8b76f39029f465f5bbc/cdqa/reader/bertqa_sklearn.py#L1165

It probably comes from adapting Hugging Face's run_squad.py directly into our BertQA class, whose fit() method has no tokenizer variable in scope:

https://github.com/huggingface/pytorch-pretrained-BERT/blob/3fc63f126ddf883ba9659f13ec046c3639db7b7e/examples/run_squad.py#L1036

For now, I will delete that line from the fit() method of BertQA. If we need it, we could save the vocabulary in the BertProcessor class instead (when the attribute is_training is True), since that is where we process the text and create the vocabulary.
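
A minimal sketch of the failure and of the two options mentioned above, using toy stand-in classes. ToyTokenizer, BuggyReader, FixedReader and ToyProcessor are hypothetical names invented for illustration; they are not part of cdQA or pytorch-pretrained-BERT, and the real classes do far more. The sketch only shows why a name that lived at script level raises NameError once the code moves into a method, and how the save could be owned by the processor instead:

```python
import os
import tempfile


class ToyTokenizer:
    """Hypothetical stand-in for a BERT tokenizer; it only holds a small vocabulary."""

    def __init__(self, vocab):
        self.vocab = list(vocab)

    def save_vocabulary(self, output_dir):
        # Write one token per line into <output_dir>/vocab.txt.
        path = os.path.join(output_dir, "vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(self.vocab))
        return path


class BuggyReader:
    """Toy reader whose fit() uses a name that is not defined in any enclosing scope."""

    def __init__(self, output_dir):
        self.output_dir = output_dir

    def fit(self, X):
        # `tokenizer` is neither a local, an attribute, nor a module-level name here.
        tokenizer.save_vocabulary(self.output_dir)  # raises NameError
        return self


class FixedReader:
    """Toy reader with the offending line removed (the first option)."""

    def __init__(self, output_dir):
        self.output_dir = output_dir

    def fit(self, X):
        self.fitted_ = True  # placeholder for the actual training loop
        return self


class ToyProcessor:
    """Toy processor that owns the tokenizer and saves its vocabulary (the second option)."""

    def __init__(self, tokenizer, is_training=True, output_dir=None):
        self.tokenizer = tokenizer
        self.is_training = is_training
        self.output_dir = output_dir

    def fit_transform(self, X):
        # Save the vocabulary only in training mode and only if a directory was given.
        if self.is_training and self.output_dir is not None:
            self.tokenizer.save_vocabulary(self.output_dir)
        # Placeholder "features": lower-cased whitespace tokens.
        return [text.lower().split() for text in X]


def demo():
    with tempfile.TemporaryDirectory() as out:
        try:
            BuggyReader(out).fit(X=None)
            buggy_error = None
        except NameError as exc:
            buggy_error = str(exc)

        tok = ToyTokenizer(["[PAD]", "[UNK]", "bert"])
        processor = ToyProcessor(tok, is_training=True, output_dir=out)
        features = processor.fit_transform(["Hello BERT"])
        FixedReader(out).fit(X=features)
        vocab_saved = os.path.exists(os.path.join(out, "vocab.txt"))
    return buggy_error, vocab_saved


if __name__ == "__main__":
    print(demo())
```

Keeping the tokenizer as an attribute of the object that created it avoids relying on a variable that only exists in the original script's scope.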
