Add an option to retrieve a clean/preprocessed Wikidata definition from an entity?
Hello @kermitt2,
I'm leaving this feature proposal here. Sorry in advance if I use the wrong terminology (preprocess/clean, etc.).
- Context
Currently, the query endpoint kb/concept/ returns the concept definition with MediaWiki-style (“Wikimedia”) markup.
Example output for the concept “Victor Hugo”:
'''''' (; 26 February 1802 – 22 May 1885) was a French poet, novelist, and dramatist of the [[Romanticism|Romantic movement]]. Hugo is considered to be one of the greatest and best-known French writers. Outside of France, his most famous works are the novels '''', 1862, and ''[[The Hunchback of Notre-Dame]]'', 1831. In France, Hugo is known primarily for his poetry collections, such as '''' (''The Contemplations'') and '''' (''The Legend of the Ages'').
- Expected behavior
A definition without any markup, for example (cf. https://en.wikipedia.org/wiki/Victor_Hugo):
Victor-Marie Hugo (26 February 1802 – 22 May 1885) was a French poet, novelist, and dramatist of the Romantic movement. Hugo is considered to be one of the greatest and best-known French writers. Outside of France, his most famous works are the novels Les Misérables, 1862, and The Hunchback of Notre-Dame, 1831. In France, Hugo is known primarily for his poetry collections, such as The Contemplations and The Legend of the Ages.
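Until the service offers this, a client can approximate the expected output itself. Below is a minimal regex-based cleanup in Python. It is not entity-fishing's own MediaWikiParser. strip_wikitext is a hypothetical helper that handles only internal links ([[target]] and [[target|label]]) and the '' / ''' bold and italic quotes. It cannot restore text that is already missing from the raw output, such as the content of the empty '''' pairs.

```python
import re

# [[target|label]] -> label
_LABELLED_LINK = re.compile(r"\[\[[^\[\]|]*\|([^\[\]]*)\]\]")
# [[target]] -> target
_PLAIN_LINK = re.compile(r"\[\[([^\[\]|]*)\]\]")
# Runs of two or more apostrophes: '' (italic), ''' (bold), ''''' (both).
# A single apostrophe, as in "Hugo's", is left untouched.
_EMPHASIS = re.compile(r"'{2,}")
# Double spaces left behind where empty markup was removed.
_EXTRA_SPACES = re.compile(r" {2,}")


def strip_wikitext(text: str) -> str:
    """Approximate plain text from a MediaWiki-formatted definition.

    Hypothetical helper: covers internal links and bold/italic quotes only;
    templates, tables, references and other wikitext are not handled.
    """
    text = _LABELLED_LINK.sub(r"\1", text)
    text = _PLAIN_LINK.sub(r"\1", text)
    text = _EMPHASIS.sub("", text)
    text = _EXTRA_SPACES.sub(" ", text)
    return text.strip()


fragment = "the novels '''', 1862, and ''[[The Hunchback of Notre-Dame]]'', 1831."
print(strip_wikitext(fragment))
# the novels , 1862, and The Hunchback of Notre-Dame, 1831.
```

Running the sketch on a fragment of the raw output shows its limit. The markup is removed, but the title of the 1862 novel is not in the raw text, so no client-side cleanup can bring it back.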
- Suggestion
I don’t know how complicated this would be to implement, but it could be done in one of two ways:
  - let the user choose to retrieve a “clean” definition by adding an optional parameter to the kb/concept endpoint, for example "raw":"true" or "clean":"true";
  - add to the response a “definition_raw” key (with MediaWiki markup) and a “definition_clean” key (without markup).
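The second option amounts to returning both variants side by side. Here is a minimal sketch of that response shape in Python. It assumes the key names from the proposal. The conversion function is passed in as a parameter (to_plain) because the service's actual markup-to-text method, and its signature, are not reproduced here.

```python
from typing import Callable, TypedDict


class ConceptDefinition(TypedDict):
    """Both definition variants returned together (key names from the proposal)."""
    definition_raw: str    # as today, with MediaWiki markup
    definition_clean: str  # the same text with the markup removed


def both_formats(raw: str, to_plain: Callable[[str], str]) -> ConceptDefinition:
    """Build the proposed pair of fields from the raw wikitext.

    `to_plain` is a placeholder for whatever markup-to-text conversion
    the service would use; it is injected rather than assumed.
    """
    return ConceptDefinition(definition_raw=raw, definition_clean=to_plain(raw))
```

Compared with a query parameter, this keeps the request unchanged. The cost is that every response carries the definition text twice.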
I think this would be useful for people who want to work with additional entity features (here, the definition) without having to write their own text-preprocessing function.
What do you think?
Regards, Lucas Terriel
Top GitHub Comments
Hello @Lucaterre
Thanks for the issue.
Yes, we can do this: the definition field would be returned either as plain text or in MediaWiki format, selected by a query parameter. The plain-text conversion method already exists:
https://github.com/kermitt2/entity-fishing/blob/master/src/main/java/com/scienceminer/nerd/utilities/mediaWiki/MediaWikiParser.java#L117
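The approach in this reply (a single definition field whose format is chosen by a query parameter) could look roughly like the sketch below. Several details are hypothetical: the parameter name format, its values mediawiki and text, and the default to mediawiki. That default is chosen here so that clients omitting the parameter keep today's behavior. to_plain again stands in for the existing plain-text method linked above, whose real signature is not reproduced.

```python
from enum import Enum
from typing import Callable
from urllib.parse import parse_qs, urlsplit


class DefinitionFormat(Enum):
    MEDIAWIKI = "mediawiki"  # current behavior: definition returned as wikitext
    TEXT = "text"            # proposed: markup converted to plain text


def requested_format(url: str, param: str = "format") -> DefinitionFormat:
    """Read the (hypothetical) format query parameter from a request URL.

    Defaults to MEDIAWIKI when absent; raises ValueError on unknown values.
    """
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return DefinitionFormat.MEDIAWIKI
    return DefinitionFormat(values[0].lower())


def render_definition(raw: str, fmt: DefinitionFormat,
                      to_plain: Callable[[str], str]) -> str:
    """Return the definition in the requested format.

    `to_plain` is a stand-in for the server's existing plain-text conversion.
    """
    return to_plain(raw) if fmt is DefinitionFormat.TEXT else raw
```

Rejecting unknown values, rather than silently falling back, makes a misspelled parameter visible to the caller.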
Yes, I was thinking of these two possible formats.