
Date of article is not fetched properly

See original GitHub issue

Instead of an exact date, I get ‘1 month ago’ in the results document. How can I fix that? Thank you for your help.

from GoogleNews import GoogleNews
from newspaper import Article
from newspaper import Config
import pandas as pd
import nltk

# article.nlp() (used for the summary) needs the NLTK 'punkt' tokenizer
nltk.download('punkt')

# A browser-like user agent lets newspaper fetch pages that would otherwise
# refuse the request (e.g. with a 403 client error while downloading the article)
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
config = Config()
config.browser_user_agent = user_agent
config.request_timeout = 5

googlenews = GoogleNews(start='10/19/2020', end='10/19/2020')
googlenews.search('test')
print(pd.DataFrame(googlenews.result()).head())

# Fetch pages 2-4, then build the DataFrame once from the results
for i in range(2, 5):
    googlenews.getpage(i)
df = pd.DataFrame(googlenews.result())

rows = []  # avoid shadowing the built-in names list/dict
for ind in df.index:
    article = Article(df['link'][ind], config=config)
    try:
        article.download()
        article.parse()
        article.nlp()  # inside the try so one bad page does not stop the loop
    except Exception:
        print('***FAILED TO DOWNLOAD***', article.url)
        continue

    rows.append({
        'Date': df['date'][ind],  # GoogleNews' date text, e.g. '1 month ago'
        'Media': df['media'][ind],
        'Title': article.title,
        'Article': article.text,
        'Summary': article.summary,
    })

news_df = pd.DataFrame(rows)
news_df.to_excel("articles.xlsx")
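The 'Date' value comes from the `date` field of the GoogleNews results, which holds the relative text shown on Google News. One possible workaround, sketched here under assumptions and not confirmed in this thread, is to prefer newspaper3k's `Article.publish_date` (extracted from the page during `parse()`, and `None` when the page has no date metadata) and otherwise estimate an absolute date from the relative text. The helpers `approx_date_from_relative` and `resolve_date` are hypothetical, and "month" and "year" are approximated as 30 and 365 days, so the fallback is only an estimate.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Approximate length of each unit; month/year are averages, not calendar-exact
_UNITS = {
    "minute": timedelta(minutes=1),
    "min": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}

# Matches strings such as '1 month ago', '3 days ago', '5 mins ago'
_RELATIVE = re.compile(
    r"^\s*(\d+)\s+(minute|min|hour|day|week|month|year)s?\s+ago\s*$",
    re.IGNORECASE,
)


def approx_date_from_relative(text: str | None, now: datetime) -> datetime | None:
    """Estimate an absolute datetime from 'N unit(s) ago'; None if no match."""
    match = _RELATIVE.match(text or "")
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower()
    return now - amount * _UNITS[unit]


def resolve_date(publish_date: datetime | None, relative_text: str | None,
                 now: datetime) -> datetime | None:
    """Prefer the page's own publish date; fall back to the relative estimate."""
    if publish_date is not None:
        return publish_date
    return approx_date_from_relative(relative_text, now)
```

Inside the article loop, the Date value could then become `resolve_date(article.publish_date, df['date'][ind], datetime.now())`. Passing `now` explicitly keeps the estimate tied to when the search ran and makes the helpers easy to test.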

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
HurinHu commented, Dec 5, 2020

You can do it this way.

0 reactions
apavlo89 commented, Dec 5, 2020

I see. Would something like this work? What do you use/suggest?

import random
import time

import nltk
import pandas as pd
from GoogleNews import GoogleNews
from newspaper import Article, Config

# article.nlp() needs the NLTK 'punkt' tokenizer
nltk.download('punkt')

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
config = Config()
config.browser_user_agent = user_agent
config.request_timeout = 3

googlenews = GoogleNews(start='01/01/2018', end='12/29/2018')
googlenews.search('test')
print(pd.DataFrame(googlenews.result()).head())

for i in range(2, 365):
    googlenews.getpage(i)
    time.sleep(random.randint(1, 30))  # something like this? Would this be correct?
df = pd.DataFrame(googlenews.result())

rows = []
for ind in df.index:
    article = Article(df['link'][ind], config=config)
    try:
        article.download()
        article.parse()
        article.nlp()
    except Exception:
        print('***FAILED TO DOWNLOAD***', article.url)
        continue

    rows.append({
        'Date': df['date'][ind],
        'Media': df['media'][ind],
        'Title': article.title,
        'Article': article.text,
        'Summary': article.summary,
    })

news_df = pd.DataFrame(rows)
news_df.to_csv("articles.csv")
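The thread does not show what the maintainers recommended for pacing requests. A randomized pause between page requests is one common way to lower the chance of being rate-limited (for example with HTTP 429 responses) when paging through many results. The helper `fetch_pages_politely` is hypothetical and not part of GoogleNews; it takes the page-fetching function as a parameter and accepts an injectable `sleep` so it can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, Iterable, List


def fetch_pages_politely(
    fetch_page: Callable[[int], object],
    pages: Iterable[int],
    min_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], object] = time.sleep,
) -> List[float]:
    """Call fetch_page(page) for each page, pausing a random time between calls.

    No pause is made before the first request or after the last one.
    Returns the delays used so the pacing can be inspected.
    """
    delays: List[float] = []
    for index, page in enumerate(pages):
        if index > 0:
            delay = random.uniform(min_delay, max_delay)
            sleep(delay)
            delays.append(delay)
        fetch_page(page)
    return delays
```

With the code above this might be called as `fetch_pages_politely(googlenews.getpage, range(2, 365))`, followed by `df = pd.DataFrame(googlenews.result())`. Sleeping only between requests avoids a pointless wait after the final page; whether 1 to 30 seconds is enough depends on how Google treats the traffic, which this sketch does not determine.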

Top Results From Across the Web

A smart search on a date is not fetching correct results in MDM
In Master Data Management (MDM), the Elastic search is not working properly for date fields. It is getting an overmatch results.
Casting dates properly from an API response in typescript
I use axios to fetch response from the API. const response = await axios.get<MyEvent>("/event/1"); ...
2 Easy Methods to find Published Date of an Article
1. Find Published date of articles using Schema Markup validator ... Schema Markup validator lets you test the structure of the articles, this...
Demystifying DateTime Manipulation in JavaScript - Toptal
In this article, I'm going to help you think clearly about date and time fields and suggest some best practices that can help...
How to Use the Fetch API (Correctly) - CODE Magazine
In this article, you'll learn to use the Fetch API, which is a promise-based wrapper around the XMLHttpRequest object. As you'll see, the...
