Possible to iterate through every single page of Wikipedia?
I’m trying to use allpages to gather some statistics about all the articles on Wikipedia.
I’m not sure exactly what the error is, but it looks like the API call might be getting limited somehow?
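For reference, the script (tablecount.py) boils down to something like this; the host and the counting are just stand-ins for my actual setup and statistics code:

```python
import mwclient

# Connecting to English Wikipedia; the real script does more per page.
site = mwclient.Site('en.wikipedia.org')

count = 0
for page in site.allpages(filterredir='nonredirects'):
    count += 1  # stand-in for the per-article statistics
print(count)
```

It fails partway through with: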
Traceback (most recent call last):
  File "tablecount.py", line 7, in <module>
    for page in site.allpages(filterredir='nonredirects'):
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 71, in next
    return self.__next__(*args, **kwargs)
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 170, in __next__
    info = super(GeneratorList, self).__next__()
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 54, in __next__
    self.load_chunk()
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 181, in load_chunk
    return super(GeneratorList, self).load_chunk()
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 96, in load_chunk
    self.set_iter(data)
  File "k----n/wiki/local/lib/python2.7/site-packages/mwclient/listing.py", line 111, in set_iter
    if self.result_member not in data['query']:
KeyError: 'query'
What do you think could be the problem?
If the response had timed out or returned something non-JSON, I think it would have failed in a different place. It could be a server-side issue producing a malformed response; hard to tell.
In any case, if you want to keep the script running for a very long time, you should probably add retry-on-failure logic. Mwclient retries automatically in some common cases, but not all, as you experienced. Something like this (off the top of my head, not tested!):
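The exception classes, the backoff, and the Site setup here are guesses you’d adapt to your script; the key idea is to remember the last title you saw and resume the listing via allpages’ start parameter:

```python
import time

import mwclient

site = mwclient.Site('en.wikipedia.org')

def all_pages_with_retry(site, max_retries=5, **kwargs):
    """Iterate over allpages, resuming from the last seen title on failure."""
    last_title = None
    attempts = 0
    while True:
        try:
            # allpages accepts a start title, so we can resume mid-listing
            for page in site.allpages(start=last_title, **kwargs):
                last_title = page.name
                attempts = 0  # reset the retry budget after any progress
                yield page
            return  # listing exhausted without an error
        except (mwclient.errors.MwClientError, KeyError):
            # KeyError covers the missing 'query' member you hit;
            # MwClientError covers mwclient's own API/HTTP failures.
            attempts += 1
            if attempts > max_retries:
                raise
            time.sleep(2 ** attempts)  # crude exponential backoff

for page in all_pages_with_retry(site, filterredir='nonredirects'):
    pass  # your statistics code goes here
```

One caveat: resuming with start=last_title will yield the last successfully seen page a second time, so skip the first result after a retry (or deduplicate) if exact counts matter.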
Yeah, I reran the code and there’s been no error so far.
It might’ve been a network connection error, or the response wasn’t received in time.
I’m not sure how this error could be reproduced. Any ideas?