Add an option to retrieve a clean/preprocessed Wikidata definition from an entity?


Hello @kermitt2 ,

I'm leaving this feature proposal here. Sorry in advance if I used the wrong terminology (preprocess/clean, etc.).

  • Context

Currently, the query endpoint kb/concept/ returns the concept definition with MediaWiki-style markup.

Output example for the concept “Victor Hugo”:

'''''' (; 26 February 1802 – 22 May 1885) was a French poet, novelist, and dramatist of the [[Romanticism|Romantic movement]]. Hugo is considered to be one of the greatest and best-known French writers. Outside of France, his most famous works are the novels '''', 1862, and ''[[The Hunchback of Notre-Dame]]'', 1831. In France, Hugo is known primarily for his poetry collections, such as '''' (''The Contemplations'') and '''' (''The Legend of the Ages'').
  • Expected behavior

A definition without markup, for example (cf. https://en.wikipedia.org/wiki/Victor_Hugo):

Victor-Marie Hugo (26 February 1802 – 22 May 1885) was a French poet, novelist, and dramatist of the Romantic movement. Hugo is considered to be one of the greatest and best-known French writers. Outside of France, his most famous works are the novels Les Misérables, 1862, and The Hunchback of Notre-Dame, 1831. In France, Hugo is known primarily for his poetry collections, such as The Contemplations and The Legend of the Ages.
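For comparison, here is a minimal sketch of the kind of client-side preprocessing a user currently has to write to get from the raw output to something close to the expected one. The function name `strip_wiki_markup` and the regular expressions are assumptions made for illustration; they are not part of entity-fishing and only cover internal links and bold/italic quote markers.

```python
import re

# [[Target|Label]] -> Label, [[Target]] -> Target (no nested links handled)
_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
# Runs of two or more apostrophes are bold/italic markers in MediaWiki markup
_QUOTES = re.compile(r"'{2,}")


def strip_wiki_markup(text: str) -> str:
    """Hypothetical helper: drop link and emphasis markup, keep the visible text."""
    text = _LINK.sub(r"\1", text)
    text = _QUOTES.sub("", text)
    # Collapse runs of spaces left behind by removed markers
    return re.sub(r"[ \t]{2,}", " ", text).strip()


raw = ("dramatist of the [[Romanticism|Romantic movement]]. His works include "
       "''[[The Hunchback of Notre-Dame]]'', 1831.")
print(strip_wiki_markup(raw))
```

Note that a function like this cannot restore text that is already missing from the raw definition: the empty quote runs in the output example above (such as `''''''`) would simply be removed, not replaced by the original title.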
  • Suggestion

I don’t know if this is complicated to implement, but it could be considered in two different ways:

  1. The user can choose to retrieve a “clean” definition by adding an optional parameter, for example "raw":"true" or "clean":"true", to the kb/concept endpoint.

  2. In the response, add a “definition_raw” key (with MediaWiki markup) and a “definition_clean” key (without markup).
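To make the two options concrete, here is a rough sketch of how the response could be shaped in each case. Everything in it is hypothetical: the function `concept_payload`, the `clean` and `both` flags, and the stand-in cleaner `drop_quote_runs` are illustration only and do not reflect how entity-fishing builds its responses.

```python
import re
from typing import Callable, Dict


def drop_quote_runs(text: str) -> str:
    """Stand-in cleaner (assumption): only removes bold/italic quote markers."""
    return re.sub(r"'{2,}", "", text)


def concept_payload(raw_definition: str,
                    cleaner: Callable[[str], str],
                    clean: bool = False,
                    both: bool = False) -> Dict[str, str]:
    """Hypothetical response shaping for the two suggestions."""
    if both:
        # Option 2: always expose both variants under separate keys
        return {
            "definition_raw": raw_definition,
            "definition_clean": cleaner(raw_definition),
        }
    # Option 1: a single "definition" key, switched by an optional flag
    return {"definition": cleaner(raw_definition) if clean else raw_definition}


raw = "''The Contemplations''"
print(concept_payload(raw, drop_quote_runs))              # raw markup kept
print(concept_payload(raw, drop_quote_runs, clean=True))  # markup removed
print(concept_payload(raw, drop_quote_runs, both=True))   # both keys present
```

Option 1 keeps the response the same size and stays backward compatible when the flag defaults to the current behavior; option 2 avoids a second request when both forms are needed, at the cost of a larger payload.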

I think this would be useful for people who build additional features on top of entity data, such as the definition, since they would no longer need to add their own text preprocessing function.

What do you think?

Regards, Lucas Terriel

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
kermitt2 commented, Jul 7, 2022

Hello @Lucaterre

Thanks for the issue.

Yes, we can do this: the definition field could be returned either as plain text or in MediaWiki format, selected by a query parameter. The plain-text method already exists:

https://github.com/kermitt2/entity-fishing/blob/master/src/main/java/com/scienceminer/nerd/utilities/mediaWiki/MediaWikiParser.java#L117

0 reactions
kermitt2 commented, Jul 8, 2022

Just curious, what other “cross-mediawiki” formats do you think of in the future? HTML, Markdown for example?

Yes, I was thinking of these two possible formats.
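As a rough illustration of what a format selector covering plain text, Markdown and HTML might look like, here is a small sketch. The function `render_definition`, the format names and the regular expressions are assumptions for illustration; this is not the existing Java parser, and it only handles simple, well-formed bold, italic and internal-link markup.

```python
import html
import re

_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
_BOLD = re.compile(r"'''(.+?)'''")
_ITALIC = re.compile(r"''(.+?)''")

# (bold template, italic template) per hypothetical output format
_STYLES = {
    "text": ("{}", "{}"),
    "markdown": ("**{}**", "*{}*"),
    "html": ("<b>{}</b>", "<i>{}</i>"),
}


def render_definition(raw: str, fmt: str = "text") -> str:
    """Hypothetical converter from MediaWiki markup to the requested format."""
    if fmt == "mediawiki":
        return raw  # unchanged, i.e. the current behavior
    if fmt not in _STYLES:
        raise ValueError(f"unsupported format: {fmt}")
    bold, italic = _STYLES[fmt]
    # Escape &, <, > for HTML only; quote=False leaves the apostrophe markers intact
    out = html.escape(raw, quote=False) if fmt == "html" else raw
    out = _LINK.sub(r"\1", out)  # keep only the visible link label
    out = _BOLD.sub(lambda m: bold.format(m.group(1)), out)  # bold before italic
    return _ITALIC.sub(lambda m: italic.format(m.group(1)), out)


sample = "'''Hugo''' wrote ''[[The Hunchback of Notre-Dame]]''."
for fmt in ("text", "markdown", "html"):
    print(fmt, "->", render_definition(sample, fmt))
```

Bold is processed before italic so that a `'''` marker is not consumed as an italic marker plus a stray apostrophe.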

