Article `download()` failed with 404 Client Error

Hi,

I keep getting this error message:

    Article download() failed with 404 Client Error: Not Found for url: http://www.foxnews.com/2017/09/22/sheriff-clarke-trump-wins-either-way-luther-strange-roy-moore-alabama-senate-race on URL http://www.foxnews.com/2017/09/22/sheriff-clarke-trump-wins-either-way-luther-strange-roy-moore-alabama-senate-race

It happens for many different article URLs.

Here is the code I am using:

    import time

    import newspaper

    # `url`, `backupfile` and `datasetfile` are set up earlier in the script (not shown).
    news_content = newspaper.build(url)

    # enumerate() numbers the articles from 1 without skipping articles[0]
    # or indexing past the end of the list.
    for i, article in enumerate(news_content.articles, start=1):
        article.download()  # now download and parse each article
        article.parse()
        article.nlp()

        backupfile.write("\n" + "--------------------------------------------------------------" + "\n")
        backupfile.write(str(article.keywords))

        datasetfile.write("\n" + "----SUMMARY ARTICLE-> No. " + str(i) + "\n")
        datasetfile.write(article.summary)  # only the summary of the article is written to the dataset file

        backupfile.write("\n" + "----SUMMARY ARTICLE---" + "\n")
        backupfile.write(article.summary)
        backupfile.write("\n" + "----TEXT INSIDE ARTICLE---" + "\n")
        backupfile.write(article.text)
        time.sleep(2)

[Screenshot of the error: screenshot from 2017-09-23 14-46-29]

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 14 (2 by maintainers)

Top GitHub Comments

2 reactions
monajalal commented, Jul 23, 2020

I posted the solution on Stack Overflow (link below). In short, set a browser user agent through `Config`:

    from newspaper import Article
    from newspaper import Config

    # Present the request as coming from Firefox instead of newspaper3k's default user agent.
    user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
    config = Config()
    config.browser_user_agent = user_agent

    url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830"

    page = Article(url, config=config)
    page.download()
    page.parse()
    print(page.text)

Here is the link: https://stackoverflow.com/a/63060794/2414957
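The `Config` change above alters the `User-Agent` header that accompanies each HTTP request; some sites answer differently when the client does not look like a browser. As a rough, library-independent illustration (not newspaper3k's own code), here is the same header set on a standard-library `urllib.request.Request`. The helper name `browser_request` is hypothetical, and nothing is fetched until the request is passed to `urllib.request.urlopen()`.

```python
import urllib.request

# Same browser user-agent string as in the comment above.
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) "
              "Gecko/20100101 Firefox/78.0")


def browser_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a GET request that identifies as Firefox."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = browser_request("https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830")
# urllib normalizes header names with str.capitalize(), so the stored key is "User-agent".
print(req.get_header("User-agent"))
```

Sending it would be `urllib.request.urlopen(req, timeout=10)`, which raises `urllib.error.HTTPError` for error responses such as 403 or 404.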

2 reactions
moosemachineDK commented, Aug 30, 2018

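I just used a simple try/except structure. It seems to work just fine, at least for the 404 error I was seeing (code below; don’t mind the splitting and stuff 😃):

    from newspaper.article import ArticleException

    # news_content = newspaper.build(url), as in the question
    for article in news_content.articles:
        try:
            article.download()
            article.parse()  # raises ArticleException if the download failed (e.g. a 404)
            article2 = article.text.split()
        except ArticleException:
            print('***FAILED TO DOWNLOAD***', article.url)
            continue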
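The same skip-and-continue pattern can be sketched without newspaper3k so that it runs offline. `fetch_and_parse`, `FetchError`, `process_all` and `fake_fetch` below are hypothetical stand-ins for `article.download()` plus `article.parse()` and the exception they raise. Failed URLs are collected in a list rather than only printed, so they can be retried or reported afterwards.

```python
from typing import Callable, List, Tuple


class FetchError(Exception):
    """Hypothetical stand-in for the exception raised when a download fails."""


def process_all(urls: List[str],
                fetch_and_parse: Callable[[str], str]) -> Tuple[List[str], List[str]]:
    """Fetch every URL, skipping failures instead of aborting the whole loop."""
    texts: List[str] = []
    failed: List[str] = []
    for url in urls:
        try:
            text = fetch_and_parse(url)
        except FetchError as exc:
            # Catch only the expected failure; a bare `except:` would also swallow
            # bugs such as NameError, and even KeyboardInterrupt.
            print("***FAILED TO DOWNLOAD***", url, exc)
            failed.append(url)
            continue
        texts.append(text)
    return texts, failed


def fake_fetch(url: str) -> str:
    """Offline fake: any URL containing 'missing' behaves like a 404."""
    if "missing" in url:
        raise FetchError("404 Client Error: Not Found for url: " + url)
    return "text of " + url


texts, failed = process_all(
    ["http://a.test/ok", "http://a.test/missing", "http://b.test/ok"], fake_fetch)
print(texts)
print(failed)
```

With newspaper3k itself, the exception to catch in place of `FetchError` is `newspaper.article.ArticleException`, the class whose message appears in the error above.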
