Some details regarding generating NQ trainset for the reader model

Hi @AkariAsai. Thank you for this great work.

I’d like to understand more clearly how the NQ trainset for the reader model is generated. In your comment, you said that you removed all table and list elements from NQ’s original preprocessed HTML data: https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/issues/9#issuecomment-610714692

I’m curious how you handled the case where a list element contains the answer and the list itself is nested inside a paragraph, as in the following example? https://github.com/google-research-datasets/natural-questions/blob/master/toy_example.md

e.g. <p>Google was founded in 1998 By:<ul><li>Larry</li><li>Sergey</li></ul></p>

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

mjeensung commented, Apr 12, 2020

Thank you for the information!

AkariAsai commented, Apr 8, 2020

Addressing your main question: we do not filter out lists inside paragraphs, but we remove any HTML tags remaining in the context during post-processing. Thus, the example you mentioned above would become "Google was founded in 1998 By: Larry Sergey", although there might be some corner cases we missed.
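
As a minimal sketch of that tag-stripping step (not the authors' actual post-processing code, which is not shown in this thread), the following assumes a simple regular expression is enough to drop the remaining tags. The names `TAG_PATTERN` and `strip_html_tags` are hypothetical, and replacing each tag with a space is an assumption made so that adjacent list items do not run together.

```python
import re

# Hypothetical helper: matches anything that looks like an HTML tag.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html_tags(context: str) -> str:
    """Replace every tag with a space, then collapse runs of whitespace."""
    without_tags = TAG_PATTERN.sub(" ", context)
    return " ".join(without_tags.split())


example = "<p>Google was founded in 1998 By:<ul><li>Larry</li><li>Sergey</li></ul></p>"
print(strip_html_tags(example))  # Google was founded in 1998 By: Larry Sergey
```

A regular expression like this is only reasonable for simplified, well-formed markup; a real HTML parser would be the safer choice for arbitrary pages.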

In particular, we remove long answer candidates that do not start or end with paragraph tags (i.e., <P> and </P>), so purely table- or list-based items are filtered out. We do not, however, further filter out table or list elements nested inside paragraphs.
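
The candidate filter described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: the function names are hypothetical, each candidate is assumed to be available as an HTML string, a candidate is kept only if it both starts with <P> and ends with </P>, and the comparison is made case-insensitive.

```python
from typing import List


def is_paragraph_candidate(candidate_html: str) -> bool:
    """True if the candidate is wrapped in paragraph tags (assumed rule)."""
    text = candidate_html.strip().lower()
    return text.startswith("<p>") and text.endswith("</p>")


def keep_paragraph_candidates(candidates: List[str]) -> List[str]:
    """Drop candidates that are purely table- or list-based."""
    return [c for c in candidates if is_paragraph_candidate(c)]


candidates = [
    "<P> Google was founded in 1998 By: <Ul><Li>Larry</Li><Li>Sergey</Li></Ul></P>",
    "<Table><Tr><Td>1998</Td></Tr></Table>",
    "<Ul><Li>Larry</Li><Li>Sergey</Li></Ul>",
]
kept = keep_paragraph_candidates(candidates)
print(len(kept))  # 1
```

Note that the first candidate survives even though it contains a list, which matches the behaviour described in the comment: only the outermost tags are checked.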