Problem with Brazilian sites
I ran into problems using newspaper with Brazilian sites. Here is an example:
import newspaper
info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo')
len(info.articles)
It returned only 3 articles.
Sorry if I am using it incorrectly.
Issue Analytics
- Created: 10 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Ah, you are using the library incorrectly. But no worries, this is probably a design flaw, because so many people get this wrong. Newspaper, by default, caches all articles that have been downloaded from any Source. An article that has already been downloaded will never be downloaded again unless you set memoize_articles to False. Here is the code exactly from my computer, along with some examples. BTW, please refer to the docs for specifics.
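As a minimal sketch of disabling that cache (assuming the URL from the original report; memoize_articles is a newspaper configuration option that can be passed as a keyword to build):

import newspaper

# Disable article caching so a rebuild re-discovers every article
# instead of skipping ones seen in a previous run
info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo',
                       memoize_articles=False)
print(len(info.articles))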
Also, even though auto language detection is enabled, it is still better to pass in a language if you know the language of a news source, just in case newspaper messes up.
So:
info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo', language='pt')
I just tested it on my computer; the titles all work 100% after parse().
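For illustration, a short end-to-end sketch of what that looks like (assuming the same source URL; in newspaper, each Article must be downloaded and parsed before its title is populated):

import newspaper

info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo',
                       language='pt', memoize_articles=False)

# Titles are only available after download() and parse()
for article in info.articles[:5]:
    article.download()
    article.parse()
    print(article.title)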