Removing valid listings during parse.infobox stage

Example setup: information from https://en.wikivoyage.org/w/index.php?title=Barcelona&action=edit&section=5

===Visitor information===
* {{listing
| name=Tourist office at Plaça de Catalunya | alt= | url=http://www.barcelonaturisme.com/wv3/en/page/38/tourist-information-points.html | email=
| address=Plaça de Catalunya, 17-S | lat=41.3868027 | long=2.1707225 | directions=Metro: L1, L3. Bus: 9, 22, 28, 42, 47, 58, 66, 67, 68. Train: R4
| phone= | tollfree= | fax=
| hours=8:30am-8:30pm | price=
| lastedit=2015-10-22
| content=This is the main tourist office in the city.
}}

The other tourist offices can be found at Plaça de Sant Jaume, Ciutat, 2 Ajuntament de Barcelona. (City Hall.) Opening time: Monday to Friday: 8.30am-8.30pm. Saturday: 9am-7pm. Sunday and public holidays: 9am-2pm.; Estació de Sants, Plaça dels Països Catalans. How to get there: Metro: L5,L3. Bus: 63,68. Opening time:  daily, 8am-8pm. and Aeroport del Prat. Terminal 1 and 2. Opening time: Daily, 9am-9pm.  All are closed on 1st January and 25th December. For a full list of tourist information points check the link above.

The department store El Corte Ingles publishes a free street map for tourists. You can pick a copy at the store, or at one of the many hotels in the city.
Turisme de Barcelona

Converts to http://i.imgur.com/6qB3Vz2.png

None of the information from this {{listing}} can be found after the “infobox stage”, which I guess is the part where we remove templates.

Normal * lists are OK: we save them to the section's list array.

In short: we lose a lot of information about useful objects held in {{}} templates such as “do”, “see”, etc.
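
To make the loss concrete, here is a minimal sketch of pulling the fields out of such templates from the raw wikitext before they are stripped. The function name `extractListings` and the default template names are assumptions made for illustration; none of this is part of the library. It also assumes the template body contains no nested templates and no `|` inside values, so wiki links with a pipe would need extra handling.

```javascript
// Hypothetical helper (not a library API): collect {{listing}}-style
// templates from raw wikitext as plain key/value objects.
function extractListings(wikitext, names = ['listing', 'see', 'do']) {
  // Matches {{name | ... }} as long as the body has no nested braces.
  const re = new RegExp(
    '\\{\\{\\s*(' + names.join('|') + ')\\s*\\|([^{}]*)\\}\\}',
    'gi'
  );
  const listings = [];
  for (const match of wikitext.matchAll(re)) {
    const listing = { type: match[1].toLowerCase() };
    // Fields are separated by "|" and written as key=value.
    for (const part of match[2].split('|')) {
      const eq = part.indexOf('=');
      if (eq === -1) continue;
      const key = part.slice(0, eq).trim();
      const value = part.slice(eq + 1).trim();
      // Skip empty fields such as "alt=" or "email=".
      if (key && value) listing[key] = value;
    }
    listings.push(listing);
  }
  return listings;
}

// Shortened copy of the listing quoted above.
const sample = `* {{listing
| name=Tourist office at Plaça de Catalunya | alt= | email=
| address=Plaça de Catalunya, 17-S | lat=41.3868027 | long=2.1707225
| hours=8:30am-8:30pm | price=
| content=This is the main tourist office in the city.
}}`;

const listings = extractListings(sample);
console.log(listings[0]);
```

Empty fields are dropped on purpose so that the resulting object only carries data that was actually filled in.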

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

Amantel commented, Nov 21, 2017

@inquire SPOILER: this is super crude, and @spencermountain is working on a better solution. Still, the answer is: in /src/data/i18n.js, add an array to the i18n object listing the templates whose data you want to parse. Like:

  ,reginfo: [
    'do',
    'see',
    'listing'
  ]

It works, of course, but requires much more tinkering.

BTW, this is how I hunt for pagebanner and regions (regions are specific to Wikivoyage, so you can ignore them).

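  // Banner hunting: capture the first argument of {{pagebanner|...|
  // (script is presumably the raw wikitext and r the result object)
  var re = /{{pagebanner\|(.*?)\|/;
  var foundBanners = re.exec(script);
  if (foundBanners && foundBanners.length > 1) {
    // Replace spaces with underscores in the banner file name
    r.pagebanner = foundBanners[1].replace(/ /g, '_');
  }

  // Region gathering: collect the link target of every regionNname=[[...|
  re = /region\d*name=\[\[(.*?)\|/g;
  // https://stackoverflow.com/a/31546071/2863227
  function matchAll(str, regex) {
    var res = [];
    var m;
    if (regex.global) {
      // A global regex advances lastIndex, so exec() walks every match
      while ((m = regex.exec(str)) !== null) {
        res.push(m[1]);
      }
    } else if ((m = regex.exec(str)) !== null) {
      res.push(m[1]);
    }
    return res;
  }

  // matchAll always returns an array, so no fallback is needed
  r.regions = matchAll(script, re);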

This goes before wiki = preProcess(wiki); in /src/parse/index.js
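
For anyone who wants to try the two regexes outside the parser, here is a self-contained sketch. The sample wikitext and the helper names `findBanner` and `findRegions` are made up for this example, and it uses the built-in `String.prototype.matchAll` (Node 12+) in place of the hand-written loop.

```javascript
// Hypothetical sample wikitext, written for this example only.
const wikitext = [
  '{{pagebanner|Catalonia banner.jpg|caption=Example}}',
  '{{Regionlist',
  '| region1name=[[Barcelona (province)|Barcelona]]',
  '| region2name=[[Girona (province)|Girona]]',
  '}}'
].join('\n');

// Returns the banner file name with spaces replaced by underscores,
// or null when the page has no {{pagebanner|...|...}} template.
function findBanner(text) {
  const found = /\{\{pagebanner\|(.*?)\|/.exec(text);
  return found ? found[1].replace(/ /g, '_') : null;
}

// Returns the link target of every regionNname=[[target|label]] entry.
function findRegions(text) {
  const re = /region\d*name=\[\[(.*?)\|/g;
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(findBanner(wikitext));  // Catalonia_banner.jpg
console.log(findRegions(wikitext)); // [ 'Barcelona (province)', 'Girona (province)' ]
```

Note that both regexes expect a `|` after the captured part, so a banner or region link without a second argument would not match.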

Amantel commented, Nov 4, 2017

From the phrase

if it’s not a known template, but it’s recursive, remove it

I guess I should somehow add see, do, listings and a few others as known templates.

Confirmed. Just adding “listings” to the infoboxes list in i18n.js gives a mostly perfect result. Not ideal, but very nice. http://i.imgur.com/xRwgWZV.png

