Problem in Brazilian sites

I am having problems using newspaper on Brazilian sites. Here is an example:

import newspaper

info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo')
len(info.articles)

It returned only 3 articles.

Sorry if I am using it wrongly.

codelucas commented, Jan 10, 2014

Ah, you are using the library incorrectly. But no worries, this is probably a design flaw, since so many people get this wrong. By default, newspaper caches every article that has been downloaded from any Source. An article that has already been downloaded will never be downloaded again unless you set the memoize_articles setting to False.

Here is the code exactly as I ran it on my computer, with some examples. BTW, please refer to the docs for specifics.

>>> import newspaper

>>> info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo')
>>> len(info.articles) 
221 

So now let's run the above command again and see what happens:

>>> info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo')
>>> len(info.articles) 
0    

Looks like they got cached.

Let's wait 20 minutes before running it again...
...

>>> info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo')
>>> len(info.articles) 
12

This means that globo has published 12 new articles!
However, you can disable this caching completely:

>>> info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo', memoize_articles=False)
>>> len(info.articles)
221

>>> info.size()
221

^ Please note that .size() returns the number of articles; it's easier than calling len(info.articles)!
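
The caching behavior walked through above can be pictured with a small toy model. This is only a sketch and not newspaper's actual implementation: the MemoizingSource class, its build() method, and the example.com URLs are all hypothetical, and the cache here lives in memory, whereas the real library appears to keep its cache between runs. The numbers are chosen to mirror the 221 / 0 / 12 sequence shown earlier.

```python
# Illustrative model of URL memoization; NOT newspaper's real implementation.
class MemoizingSource:
    """Hypothetical stand-in for a source that remembers URLs it already returned."""

    def __init__(self, memoize_articles=True):
        self.memoize_articles = memoize_articles
        self._seen = set()  # URLs handed out by earlier builds

    def build(self, current_urls):
        """Return the URLs a build would expose, given what the site lists now."""
        if not self.memoize_articles:
            # Caching disabled: every listed URL is returned on every build.
            return list(current_urls)
        fresh = [url for url in current_urls if url not in self._seen]
        self._seen.update(fresh)
        return fresh


# Made-up URLs standing in for the articles listed on the site.
site = [f"http://example.com/article/{i}" for i in range(221)]

source = MemoizingSource()
first = source.build(site)    # nothing cached yet
second = source.build(site)   # everything was already seen
site += [f"http://example.com/article/{i}" for i in range(221, 233)]
third = source.build(site)    # only the newly published URLs are returned

print(len(first), len(second), len(third))  # 221 0 12
```

With memoize_articles=False the toy class returns every listed URL each time, which corresponds to the second way of calling build() shown above.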

Also, even though auto language detection is enabled, it is still better to pass in a language if you know the language of a news source, just in case newspaper messes up.

So,

info = newspaper.build('http://globoesporte.globo.com/futebol/times/sao-paulo', language='pt')

codelucas commented, Jan 10, 2014

I just tested it on my computer; all the titles come through correctly after parse().

>>> for a in info.articles:
...   a.download()
... 
>>> for a in info.articles:
...   a.parse()
... 
>>> for a in info.articles:
...   print 'title', a.title
... 
title Toquinho estreia no WSOF em março contra Jon Fitch, Burkman ou Harris - combate
title Um ano após câncer, Guilherme Leme ainda se recupera da doença
title Volkswagen divulga primeira foto oficial do Up! brasileiro; compare
title Google Glass ganha app de exercícios com realidade aumentada
title Vídeo mostra momento em que advogado foragido é preso na PB
title Briga de casal em avião obriga piloto a pousar em Salvador, diz PF
title Rivais? Marlon e William saem no tapa
title Funcionários pedem que travesti não use banheiro feminino em shopping
title Jogador faz recuo ilegal, e juiz tem que ensinar regra a todos em campo
title Veja o guia do IPVA 2014
title Estreia de Minotauro
title Companhia aérea perde mala e ator Rafael Cardoso registra no Instagram - Marie Claire
title BMW faz recall de 56 unidades da moto F 800 S no Brasil
title Advogado fugitivo é preso em pousada na orla de João Pessoa
title Revelação em 'Além do Horizonte', Marina Palha diz: 'Deixei as frescuras de lado'
title São Paulo 5 x 1 Ypiranga-SP, 09/01/1955 -
title Bernardinho garante que não será candidato ao governo do Rio
title São Paulo muda olhar sobre o Paulistão, agora meta do clube
title Vale tudo? Longe disso. Conheça as regras usadas pelo UFC - combate
title Sete suspeitos de matar ex-miss em assalto são presos na Venezuela
title Portela promete desfile com a maior escultura que já cruzou a Sapucaí
...
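
The download-then-parse loop above can also be wrapped in a small helper so that one failing article does not stop the whole run. This is a minimal sketch under stated assumptions: collect_titles and FakeArticle are hypothetical names introduced here, the helper relies only on the download(), parse() and title members used in the session above, and it assumes failures surface as ordinary exceptions.

```python
def collect_titles(articles):
    """Download and parse each article; return (titles, failures)."""
    titles, failures = [], []
    for article in articles:
        try:
            article.download()
            article.parse()
        except Exception as exc:  # assumption: problems are raised as exceptions
            failures.append((getattr(article, "url", None), exc))
            continue
        titles.append(article.title)
    return titles, failures


class FakeArticle:
    """Hypothetical stand-in so the sketch runs without network access."""

    def __init__(self, url, title, broken=False):
        self.url = url
        self.title = title
        self._broken = broken

    def download(self):
        if self._broken:
            raise IOError("download failed")

    def parse(self):
        pass  # a real article would extract its title and text here


titles, failures = collect_titles([
    FakeArticle("http://example.com/a", "First"),
    FakeArticle("http://example.com/b", "Second", broken=True),
])
print(titles)         # ['First']
print(len(failures))  # 1
```

Against a real source the call would presumably look like titles, failures = collect_titles(info.articles).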
