Serious inconsistencies in Wizard of Wikipedia data

In the original validation data there are 8840 wizard turns, all unique (counting both validation sets). In the KILT version, the 3058 validation instances contain only 1069 unique “answers” (wizard utterances). That is, the same wizard utterance is shared by multiple conversational histories.

I also checked the latest version, and the issue still seems to be there. If you search the file http://dl.fbaipublicfiles.com/KILT/wow-dev-kilt.jsonl for the string "answer": "Brown hair is the second, you’ll find that answer used in 38 different conversations. That’s not the case in the original data.

The same can be said about the train set. Out of 94577 training instances, there are only 63733 unique inputs and 20427 unique answers. I didn’t check the current online version of the file though, because I’m assuming the same bug affects both validation and training files.

Here are the 10 most frequent validation answers (with frequencies):

[("I think I'm going to have mine done by a professional hairdresser,", 45),
 ('Jazz originated in the late 19th century', 44),
 ('Red is the colour at the end of the visible spectrum of light', 38),
 ('Brown hair is the second most common human hair color, after black hair, my hair is also brown but on the lighter side. some parts of my hair change to blonde in the summer.', 38),
 ('I am not 100% sure on that however, I do know that it was founded by Enzo Ferrari and the company built its first car in 1940.', 37),
 ("I probably wouldn't. I'm happy with black hair. Although, hair coloring is definitely on the rise, as 75% of women and 18% of men in Copenhagen, for example, have reported dying their hair, if that gives you any indication.", 37),
 ('Not 100% sure on that, either but Brand Finance rated the car the worlds most powerful car in 2014. That is awesome and I think I need a Ferrari. lol', 29),
 ('It has a wavelenght that starts at 625 nanometres.', 29),
 ('Hello, have you colored your hair before? It is practice of changing the hair color', 28),
 ("I've herd something crazy like 75% of women and 18% of men use hair dye.", 28)]

And the 10 most frequent training answers (with frequencies):

[('It originated from Italy.', 343),
 ('I have one dog! I love selectively bred dogs.', 259),
 ('The first mention of it was in the 10th century, but nobody knows for sure who invented it.', 237),
 ("It's different. Our pizza was invented in Naples, and that's been popular around the world.", 210),
 ('Not right now, but I wish I did they are great for companionship and they can hunt vermin...lol', 190),
 ('So do I! it is one of the three primary colours??', 182),
 ('Yep, blue mixed with green and violet to make turquoise is great as well', 180),
 ("Yes, I see where you're coming from, but theres also potential for dogs proficient in hunting and herding, pulling loads,", 173),
 ('I think veganism is a bit narcissistic. The philosphy behind it I think elevates animals status illogically.', 171),
 ('yea it was founded by richard and maurice mcdonald in san bernardino, california', 166)]

I haven’t checked any other datasets, so I can’t speak for them.
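
Counts like the ones above could be reproduced along these lines. This is only a minimal sketch, not the script used for the numbers in this issue. It assumes each line of the .jsonl file is a JSON object with an "input" string and an "output" list whose entries may carry an "answer" string; the function name answer_stats and the local file path in the usage comment are hypothetical.

```python
import json
from collections import Counter


def answer_stats(lines, top_k=10):
    """Summarize duplication in a JSON Lines dataset.

    `lines` is any iterable of JSON strings (for example an open file).
    Assumed record layout (an assumption, not confirmed by the thread):
        {"input": str, "output": [{"answer": str, ...}, ...]}
    """
    inputs = set()
    answers = Counter()
    n_instances = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        n_instances += 1
        inputs.add(record["input"])
        # Count every answer string attached to this instance;
        # output entries without an "answer" key are ignored.
        for out in record.get("output", []):
            if "answer" in out:
                answers[out["answer"]] += 1
    return {
        "instances": n_instances,
        "unique_inputs": len(inputs),
        "unique_answers": len(answers),
        "top_answers": answers.most_common(top_k),
    }


# Usage (hypothetical local copy of the file):
# with open("wow-dev-kilt.jsonl", encoding="utf-8") as f:
#     print(answer_stats(f))
```

If the number of unique answers is far below the number of instances, as reported here, the same utterance is attached to many different histories.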

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions
fabiopetroni commented, May 17, 2021

Thanks a lot for your help in this @AshwinParanjape! Very happy to hear that the dev data is now ok. 😃 The train data indeed contained some duplicated <input,answer> pairs. Could you please try now?

Note that we filtered out from the validation set all instances for which we were unable to map the knowledge evidence to the KILT Wikipedia dump.

2 reactions
fabiopetroni commented, May 14, 2021

There was a little bug in the mapping script for Wizard of Wikipedia that affected a portion of the answers. The good news is that the provenance section is not affected. Also, all the other datasets in KILT are not affected by this bug since we use different and dedicated mapping scripts. @AshwinParanjape thanks again a lot for reporting this inconsistency - we really appreciate it! I fixed the bug and prepared a new version of the data at:

Could you please double check that inconsistencies are actually gone? If so, I’ll update the official files as well as the results for this dataset.
