
Test QAPipeline with GPU

See original GitHub issue

@fmikaelian I have implemented the changes as discussed. I tested the fit() method for the retriever and the predict() method in a notebook included in examples. Everything is working fine.

Could you please test the fit() method for the reader on GPU to check that everything is OK?

Please note that the current implementation of QAPipeline still uses the TfidfRetriever as you implemented it (passing the dataframe column as input via TfidfRetriever.fit(df['content'])). This should be changed once you have implemented the improvement I proposed in #95.
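For context, this is roughly what fitting a TF-IDF retriever on a column of document texts amounts to. The sketch below is a pure-Python illustration of the idea, not the cdQA TfidfRetriever implementation (which wraps scikit-learn); all names here are hypothetical:

```python
import math
from collections import Counter

def tfidf_retrieve(docs, query, top_k=1):
    """Rank documents by cosine similarity of TF-IDF vectors to the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    qv = vec(query.lower().split())
    scores = [(cosine(qv, vec(doc)), i) for i, doc in enumerate(tokenized)]
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]
```

The retriever's job in the pipeline is only this shortlisting step; the reader then extracts the answer span from the top-ranked documents.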

_Originally posted by @andrelmfarias in https://github.com/fmikaelian/cdQA/pull/101#issuecomment-488628180_

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 28

Top GitHub Comments

2 reactions
fmikaelian commented, May 3, 2019

Using a pre-trained reader model on GPU:

qa_pipe = QAPipeline(metadata=df, model='../models/bert_qa_squad_v1.1_sklearn/bert_qa_squad_v1.1_sklearn.joblib')
qa_pipe.fit()
qa_pipe.model.output_dir = '../logs/bert_qa_squad_v1.1_sklearn'

X = 'Since when does the Excellence Program of BNP Paribas exist?'
prediction = qa_pipe.predict(X)

print('question: {}'.format(X))
print('answer: {}'.format(prediction))

Output:

question: Since when does the Excellence Program of BNP Paribas exist?
answer: January 2016
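Conceptually, predict() above chains the two pipeline stages: retrieve a shortlist of documents, then run the reader over them. A toy sketch of that flow (class and parameter names here are hypothetical, not the actual cdQA API):

```python
# Toy illustration of the retriever -> reader flow inside a QA pipeline.
class MiniQAPipeline:
    def __init__(self, retriever, reader):
        self.retriever = retriever  # question -> shortlist of doc ids
        self.reader = reader        # (question, doc ids) -> answer

    def predict(self, question):
        doc_ids = self.retriever(question)
        return self.reader(question, doc_ids)
```

Keeping the two stages behind separate callables is what lets the retriever (TF-IDF here) and the reader (BERT here) be swapped independently.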
1 reaction
andrelmfarias commented, May 14, 2019

I tested BertQA.fit() with the following code on Colab:

processor = BertProcessor(bert_model='bert-base-uncased', do_lower_case=True, is_training=True)
train_examples, train_features = processor.fit_transform('./data/dev-v1.1.json')

reader = BertQA(train_batch_size=8, num_train_epochs=1, output_dir='test', fp16=True)
reader.fit(X=(train_examples, train_features))

And got the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-23-ab02e6a6f1de> in <module>()
----> 1 reader.fit(X=(train_examples, train_features))

<ipython-input-21-c31ed17ed5ea> in fit(self, X, y)
   1144         torch.save(model_to_save.state_dict(), output_model_file)
   1145         model_to_save.config.to_json_file(output_config_file)
-> 1146         tokenizer.save_vocabulary(self.output_dir)
   1147 
   1148         self.model.to(self.device)

NameError: name 'tokenizer' is not defined

The problem occurs in https://github.com/fmikaelian/cdQA/blob/0dce89f48ab53a69e8fdb8b76f39029f465f5bbc/cdqa/reader/bertqa_sklearn.py#L1165

It probably comes from the direct adaptation of run_squad.py from Hugging Face to our BertQA class:

https://github.com/huggingface/pytorch-pretrained-BERT/blob/3fc63f126ddf883ba9659f13ec046c3639db7b7e/examples/run_squad.py#L1036

For now, I will delete the line in the fit() method of BertQA. If needed, we might include this saving in the BertProcessor class (when the attribute is_training is True), where we process the text and create a vocabulary.
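An alternative to deleting the line outright would be to make the save step defensive, so the vocabulary is only written when a tokenizer actually exists. A minimal sketch, assuming the tokenizer may or may not be available inside fit() (the helper name and signature are hypothetical, not cdQA code):

```python
import os

def save_reader_artifacts(output_dir, save_model, tokenizer=None):
    """Save model weights, and the tokenizer vocabulary only when one exists.

    `save_model` is a callable that writes the weights to a given path
    (e.g. a torch.save wrapper); `tokenizer` is optional, since BertQA.fit()
    may not hold a tokenizer at save time.
    """
    os.makedirs(output_dir, exist_ok=True)
    save_model(os.path.join(output_dir, "pytorch_model.bin"))
    if tokenizer is not None:
        # pytorch-pretrained-BERT tokenizers expose save_vocabulary(dir).
        tokenizer.save_vocabulary(output_dir)
```

Moving the vocabulary save into BertProcessor, as proposed, amounts to the same guard: only the object that owns the tokenizer writes its vocabulary.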
