
Possible to iterate through every single page of Wikipedia?

See original GitHub issue

I’m trying to use allpages to gather some statistics about all the articles in Wikipedia.

I’m not sure what the error is exactly, but it appears that maybe the API call is limited?

Traceback (most recent call last):
  File "tablecount.py", line 7, in <module>
    for page in site.allpages(filterredir='nonredirects'):
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 71, in next
    return self.__next__(*args, **kwargs)
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 170, in __next__
    info = super(GeneratorList, self).__next__()
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 54, in __next__
    self.load_chunk()
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 181, in load_chunk
    return super(GeneratorList, self).load_chunk()
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 96, in load_chunk
    self.set_iter(data)
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 111, in set_iter
    if self.result_member not in data['query']:
KeyError: 'query'

What do you think could be the problem?
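
For context on that KeyError: a successful list=allpages response from the MediaWiki API puts its payload under a top-level "query" key, while an error response carries an "error" object and no "query" key at all, so any reply missing that key (a server hiccup, a maintenance page, a truncated body) trips the data['query'] lookup shown in the traceback. A minimal sketch of the two shapes, using requests against the standard en.wikipedia.org endpoint (illustrative, not taken from the issue):

import requests

# A healthy allpages response is shaped like:
#   {"batchcomplete": "", "continue": {...}, "query": {"allpages": [...]}}
# An error response has no "query" key at all, e.g.:
#   {"error": {"code": "...", "info": "..."}}
resp = requests.get(
    'https://en.wikipedia.org/w/api.php',
    params={
        'action': 'query',
        'list': 'allpages',
        'apfilterredir': 'nonredirects',
        'aplimit': 5,
        'format': 'json',
    },
).json()

# Guarding the lookup avoids the KeyError that mwclient hit internally.
for page in resp.get('query', {}).get('allpages', []):
    print(page['title'])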

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
danmichaelo commented, Apr 6, 2018

If the response timed out or returned something non-JSON, I think it would have failed in a different place. It could be server issues causing a wrong response. Hard to tell.

In any case, if you want to keep the script running for a very long time, you should probably add retry-on-failure handling. Mwclient retries automatically in some common cases, but not all, as you experienced. Something like this (off the top of my head, not tested!):

import time

pages = site.allpages(filterredir='nonredirects')
while True:
    try:
        page = next(pages)
    except StopIteration:
        # Listing exhausted; all pages have been consumed.
        break
    except Exception:
        # Transient server/network error: wait, then retry the same call.
        print('Encountered error. Will retry in 30 secs')
        time.sleep(30)
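
If it helps, here is how that retry loop might slot into the original statistics script end to end; the site setup and the page counter are illustrative assumptions, not details from the issue:

import time
import mwclient

site = mwclient.Site('en.wikipedia.org')  # assumed target wiki
pages = site.allpages(filterredir='nonredirects')

count = 0
while True:
    try:
        page = next(pages)
    except StopIteration:
        break
    except Exception:
        # Wait out transient server/network trouble, then retry next(pages).
        print('Encountered error. Will retry in 30 secs')
        time.sleep(30)
        continue
    count += 1  # stand-in for whatever per-page statistics you gather

print('Non-redirect pages seen:', count)

One caveat: if the failure turns out to be permanent rather than transient, this loop retries forever, so capping the number of retries would be a sensible refinement.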
0 reactions
k----n commented, Apr 6, 2018

Yeah, I actually reran the code and the error hasn’t come back (so far).

It might’ve been a network connection error or something where the response wasn’t received in time.

I’m not sure how this error might be reproduced. Any ideas?
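
One way to provoke this failure path on demand, rather than waiting for a real server hiccup, is to stub out the API call so it returns a JSON body with no 'query' key, which is what a bad server reply would produce. A rough sketch; patching site.api assumes the listing machinery routes its requests through that method, which may vary between mwclient versions:

import mwclient
from unittest import mock  # on Python 2.7, use the 'mock' backport

site = mwclient.Site('en.wikipedia.org')
pages = site.allpages(filterredir='nonredirects')

# Pretend the server answered with an empty JSON object.
with mock.patch.object(site, 'api', return_value={}):
    next(pages)  # should raise KeyError: 'query', matching the traceback above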

Read more comments on GitHub >

Top Results From Across the Web

  • Crawling all wikipedia pages for phrases in python
    I have downloaded all the articles onto my hard drive, but I'm not sure how I can tell the program to iterate through...
  • Wikipedia:Getting to Philosophy
    There have been some theories on this phenomenon, with the most prevalent being the tendency for Wikipedia pages to move up a "classification...
  • Scraping from all over Wikipedia - Towards Data Science
    Last week I wrote about how to scrape data from a table on Wikipedia ... You could go through every single town page...
  • How to Scrape Wikipedia Articles with Python - freeCodeCamp
    The scraper will go to a Wikipedia page, scrape the title, ... I will use beautiful soup to find all the <a> tags...
  • Web scraping from Wikipedia using Python - A Complete Guide
    The goal is to scrape data from the Wikipedia Home page and parse it ... As all the tags are nested, we can...
