Some details regarding generating NQ trainset for the reader model

Hi @AkariAsai. Thank you for this great work.

I’d like to understand more clearly how the NQ trainset for the reader model is generated. In your comment, you said that you removed all table and list elements from NQ’s original preprocessed HTML data. https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths/issues/9#issuecomment-610714692

I’m curious how you handled the case where the answer is in a list element and that list is nested inside a paragraph, as in the example below (in the style of https://github.com/google-research-datasets/natural-questions/blob/master/toy_example.md).

e.g., <p>Google was founded in 1998 By:<ul><li>Larry</li><li>Sergey</li></ul></p>

Top GitHub Comments

mjeensung commented, Apr 12, 2020

Thank you for the information!

AkariAsai commented, Apr 8, 2020

To address your main question: we do not filter out lists inside paragraphs, but we remove any HTML tags remaining in the context during post-processing. Thus, the example you mentioned above would become "Google was founded in 1998 By: Larry Sergey", though there might be some corner cases we missed.

In particular, we remove long answer candidates that do not start or end with paragraph tags (i.e., <P> and </P>), so purely table- or list-based items are filtered out. We do not further filter out table or list elements nested inside paragraphs.
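
The two steps described above (keep only paragraph-wrapped candidates, then strip leftover tags) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' actual preprocessing code: the function names are hypothetical, a candidate is kept only if it both starts with <P> and ends with </P>, matching is case-insensitive, and tags are removed with a simple regular expression rather than an HTML parser.

```python
import re

# Matches any HTML-like tag, e.g. <p>, </li>, <ul>. A regex is an assumption
# made for brevity; it is not known how the original code removed tags.
TAG_RE = re.compile(r"<[^>]+>")


def is_paragraph_candidate(html: str) -> bool:
    """Return True if the candidate starts with <P> and ends with </P>.

    Hypothetical helper. Case-insensitive matching is an assumption, and
    paragraph tags carrying attributes are not handled.
    """
    lowered = html.strip().lower()
    return lowered.startswith("<p>") and lowered.endswith("</p>")


def strip_tags(html: str) -> str:
    """Replace every tag with a space, then collapse runs of whitespace."""
    without_tags = TAG_RE.sub(" ", html)
    return " ".join(without_tags.split())


def build_contexts(candidates):
    """Drop non-paragraph candidates and strip tags from the remaining ones."""
    return [strip_tags(c) for c in candidates if is_paragraph_candidate(c)]


if __name__ == "__main__":
    candidates = [
        # Paragraph containing a nested list: kept, tags stripped.
        "<p>Google was founded in 1998 By:<ul><li>Larry</li><li>Sergey</li></ul></p>",
        # Purely list-based candidate: filtered out.
        "<ul><li>Larry</li><li>Sergey</li></ul>",
    ]
    print(build_contexts(candidates))
    # prints ['Google was founded in 1998 By: Larry Sergey']
```

With this sketch, the list nested inside the paragraph survives as plain text, while the standalone list is dropped, which is consistent with the behavior described in the comment above.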
