
Removing valid listings during parse.infobox stage


Example setup: information from https://en.wikivoyage.org/w/index.php?title=Barcelona&action=edit&section=5

===Visitor information===
* {{listing
| name=Tourist office at Plaça de Catalunya | alt= | url=http://www.barcelonaturisme.com/wv3/en/page/38/tourist-information-points.html | email=
| address=Plaça de Catalunya, 17-S | lat=41.3868027 | long=2.1707225 | directions=Metro: L1, L3. Bus: 9, 22, 28, 42, 47, 58, 66, 67, 68. Train: R4
| phone= | tollfree= | fax=
| hours=8:30am-8:30pm | price=
| lastedit=2015-10-22
| content=This is the main tourist office in the city.
}}

The other tourist offices can be found at Plaça de Sant Jaume, Ciutat, 2 Ajuntament de Barcelona. (City Hall.) Opening time: Monday to Friday: 8.30am-8.30pm. Saturday: 9am-7pm. Sunday and public holidays: 9am-2pm.; Estació de Sants, Plaça dels Països Catalans. How to get there: Metro: L5,L3. Bus: 63,68. Opening time:  daily, 8am-8pm. and Aeroport del Prat. Terminal 1 and 2. Opening time: Daily, 9am-9pm.  All are closed on 1st January and 25th December. For a full list of tourist information points check the link above.

The department store El Corte Ingles publishes a free street map for tourists. You can pick a copy at the store, or at one of the many hotels in the city.
Turisme de Barcelona

This converts to: http://i.imgur.com/6qB3Vz2.png

All information from this {{listing}} is nowhere to be found after the “infobox” stage, presumably the step where templates are removed.

Ordinary * lists are fine: we save them to the section's list array.

In short, we lose a lot of information about points of interest stored in {{}} templates such as “do”, “see”, etc.
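To make the loss concrete, here is a minimal sketch of pulling the named fields out of a {{listing}} template like the one above. The function name extractTemplates is hypothetical (it is not part of the library), and the sketch assumes parameter values contain no nested templates, piped links or literal | characters.

```javascript
// Hypothetical helper, not part of wtf_wikipedia: collect the named
// parameters of every {{name|...}} template in a piece of wikitext.
// Assumes values contain no nested templates, piped links or literal "|".
function extractTemplates(wikitext, name) {
  // escape regex metacharacters in the template name
  const safeName = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "{{listing|" (case-insensitive) up to the first "}}"
  const re = new RegExp("\\{\\{\\s*" + safeName + "\\s*\\|([\\s\\S]*?)\\}\\}", "gi");
  const results = [];
  let m;
  while ((m = re.exec(wikitext)) !== null) {
    const params = {};
    for (const part of m[1].split("|")) {
      const eq = part.indexOf("=");
      if (eq === -1) continue; // skip positional (unnamed) parameters
      const key = part.slice(0, eq).trim();
      const value = part.slice(eq + 1).trim();
      if (value !== "") params[key] = value; // drop empty fields such as "alt="
    }
    results.push(params);
  }
  return results;
}

// trimmed-down copy of the Barcelona listing above
const sample = `* {{listing
| name=Tourist office at Plaça de Catalunya | alt= | lat=41.3868027 | long=2.1707225
| hours=8:30am-8:30pm
| content=This is the main tourist office in the city.
}}`;

// logs one object with name, lat, long, hours and content
console.log(extractTemplates(sample, "listing"));
```

Empty fields (alt, email, phone and so on) are dropped in this sketch; keeping them as empty strings would be an equally valid choice.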

Issue Analytics

Created 6 years ago
Comments: 13 (11 by maintainers)

Top GitHub Comments

1 reaction

Amantel commented, Nov 21, 2017

@inquire Spoiler: this is super crude, and @spencermountain is working on a better solution. Still, the answer is: in /src/data/i18n.js, add an array to the i18n object listing the templates whose data you want to parse. Like:

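  // appended to the i18n object in /src/data/i18n.js
  // (the leading comma continues the previous property)
  ,reginfo: [
    'do',
    'see',
    'listing'
  ]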

It works, of course, but requires much more tinkering.

BTW, this is how I extract the pagebanner and regions (regions are specific to Wikivoyage, so don't mind them).

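  // bannerhunting code: grab the first parameter of {{pagebanner|...}}
  // (`script` is assumed to hold the raw wikitext, `r` the result object)
  // [^|}]+ also matches banners without further parameters
  var re = /\{\{\s*pagebanner\s*\|([^|}]+)/i;
  var foundBanners = re.exec(script);
  if (foundBanners && foundBanners.length > 1) {
    // use underscores instead of spaces, as in MediaWiki file names
    r.pagebanner = foundBanners[1].trim().replace(/ /g, "_");
  }

  // region gathering code: link target of each regionNname=[[...]],
  // for both [[Target]] and [[Target|label]]
  re = /region\d*name\s*=\s*\[\[([^\]|]+)/g;
  // https://stackoverflow.com/a/31546071/2863227
  // returns the first capture group of every match
  // (or only of the first match if the regex is not global)
  function matchAll(str, regex) {
    var res = [];
    var m;
    regex.lastIndex = 0; // reset state left over from earlier exec() calls
    if (regex.global) {
      while ((m = regex.exec(str)) !== null) {
        res.push(m[1]);
      }
    } else if ((m = regex.exec(str)) !== null) {
      res.push(m[1]);
    }
    return res;
  }

  r.regions = matchAll(script, re); // always an array, possibly empty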

Place this before wiki = preProcess(wiki); in /src/parse/index.js.

1 reaction

Amantel commented, Nov 4, 2017

From the phrase

if it’s not a known template, but it’s recursive, remove it

I guess I should somehow add see, do, listing and a few others as known templates.

Confirmed. Just adding “listings” to the infoboxes list in i18n.js gives a mostly perfect result. Not ideal, but very nice. http://i.imgur.com/xRwgWZV.png
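The “known template” rule can be sketched as a brace-counting pass: top-level templates whose name is on a whitelist are kept for later parsing, and everything else, including any templates nested inside it, is dropped from the text. This is a hypothetical illustration (KNOWN_TEMPLATES and splitTemplates are made-up names), not the library's actual implementation.

```javascript
// Hypothetical sketch, not wtf_wikipedia's code: whitelisted top-level
// templates are collected in `kept`, all other {{...}} are removed.
const KNOWN_TEMPLATES = new Set(["do", "see", "listing"]);

function splitTemplates(wikitext) {
  const kept = [];
  let text = "";
  let depth = 0; // current {{ }} nesting level
  let start = -1; // index where the current top-level template began
  for (let i = 0; i < wikitext.length; i++) {
    if (wikitext.startsWith("{{", i)) {
      if (depth === 0) start = i;
      depth++;
      i++; // skip the second "{"
    } else if (depth > 0 && wikitext.startsWith("}}", i)) {
      depth--;
      i++; // skip the second "}"
      if (depth === 0) {
        const tpl = wikitext.slice(start, i + 1);
        // template name: text after "{{" up to the first "|" or "}"
        const name = tpl.slice(2).split(/[|}]/)[0].trim().toLowerCase();
        if (KNOWN_TEMPLATES.has(name)) kept.push(tpl);
      }
    } else if (depth === 0) {
      text += wikitext[i]; // plain text outside any template
    }
    // characters inside templates (and unclosed templates) are skipped
  }
  return { text, kept };
}

const input = "Intro {{pagebanner|x.jpg}} text {{see|name=Park {{lang|ca|Parc}}}} end";
const { text, kept } = splitTemplates(input);
console.log(JSON.stringify(text)); // "Intro  text  end"
console.log(kept); // the whole {{see|...}} template, nested {{lang}} included
```

Whitelisting by name keeps the decision in one place, which matches the idea of adding see, do and listing to a list in i18n.js rather than special-casing each template.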
