ElasticsearchStorage can't save scraped files

Describe the bug

The ElasticsearchStorage pipeline doesn't parse the downloaded HTML. The error is raised at: File "/usr/local/lib/python3.8/site-packages/newsplease/pipeline/pipelines.py", line 485

To Reproduce

Dockerfile

FROM python:3.6
RUN pip3 install news-please

sitelist.hjson

{
  # Every URL has to be in an array-object in "base_urls".
  # The same URL in combination with the same crawler may only appear once in this array.
  "base_urls" : [
    {      
           "crawler": "Download",
           "url": "https://www.bbc.co.uk/news/world-latin-america-50248739"
    }
  ]
}

config.cfg

ITEM_PIPELINES = {'newsplease.pipeline.pipelines.ArticleMasterExtractor':100,
                  'newsplease.pipeline.pipelines.ElasticsearchStorage': 350
                  }

Expected behavior

The file should be downloaded and piped to Elasticsearch.

Log

[elasticsearch:121|INFO] GET http://elasticsearch:9200/news-please/_search [status:200 request:0.003s]
[scrapy.core.scraper:234|ERROR] Error processing {'abs_local_path': '/root/news-please-repo/data/2019/10/31/bbc.co.uk/news_world-latin-america-50248739_1572539376.html',
 'article_author': [],
 'article_description': 'Supporters and opponents of incumbent President Evo '
                        'Morales dispute the result of the vote.',
 'article_image': 'https://ichef.bbci.co.uk/news/1024/branded_news/32A7/production/_109476921_d51e9117-4bb9-4f8d-a4cc-c06e92664c1e.jpg',
 'article_language': 'en',
 'article_publish_date': '2019-10-31 12:44:49',
 'article_text': 'Image copyright AFP Image caption Supporters of President '
                 'Morales (in the background) and supporters of Mr Mesa '
                 '(foreground) have been\n'
                 'At least two people have been killed in Bolivia in clashes '
                 'between supporters and opponents of President Evo Morales, '
                 'the government says.\n'
                 'The two men died in the town of Montero in eastern Santa '
                 'Cruz province.\n'
                 'Tension has been running high for the past 10 days following '
                 'the disputed presidential election results.\n'
                 'The Organization of American States (OAS) will start an '
                 'audit of the results on Thursday to decide if the polls '
                 'should go into a second round.\n'
                 'The official results gave the incumbent, Evo Morales, a big '
                 'enough lead over his nearest rival, Carlos Mesa, to win '
                 'outright in the first round.\n'
                 'But many Bolivians say they are suspicious of the initial '
                 'vote count, which was surprisingly interrupted for 24 hours '
                 'on election night.\n'
                 'At the time when the counting was inexplicably halted, the '
                 'two candidates looked set to go into a second round, but '
                 "when the counting restarted, Mr Morales' lead jumped.\n"
                 'Image copyright Reuters Image caption Critics of the count '
                 'filled a coffin with fake bills to suggest the result had '
                 'been rigged\n'
                 'The final result gave Mr Morales just over the '
                 '10-percentage-point lead he needed to stave off a second '
                 'round. Mr Mesa said the result was fraudulent and election '
                 'observers from the OAS also expressed their concerns.\n'
                 'The following 10 days were marred by mass protests, strikes, '
                 'blockades and clashes between those backing Mr Morales and '
                 'those behind Mr Mesa.\n'
                 'Image copyright AFP Image caption Miners marched in support '
                 'of President Morales on Tuesday...\n'
                 'Media playback is unsupported on your device Media caption '
                 '...while supporters of Mr Mesa also took to the streets\n'
                 'Interior Minister Carlos Romero said that two men had been '
                 'killed in Montero and six injured. He said there would be an '
                 'investigation into the deaths. Local media has reported that '
                 'one of the two victims was taken to hospital with gunshot '
                 'wounds.\n'
                 "'Coup d'état'\n"
                 'Mr Morales says that the protests amount to a "coup '
                 'd\'état". Mr Mesa and his supporters argue that Mr Morales, '
                 'who has governed Bolivia since January 2006, is trying to '
                 'stay in power by rigging the election result.\n'
                 "On Wednesday, Mr Morales' Mas party agreed to a binding "
                 'audit of the results by the OAS. But Mr Mesa, who had '
                 'previously backed the idea of an audit, said that he now '
                 'rejected it, arguing that because it had been agreed '
                 '"unilaterally" between the OAS and Mas, he did not trust '
                 'it.\n'
                 '"The audit between the OAS and the Mas candidate was agreed '
                 'without any consultation with the country," he said in a '
                 'video message posted on Twitter [in Spanish].\n'
                 "He demanded that representatives from Bolivia's civil "
                 'society be represented.\n'
                 'The OAS said the audit would take between 10 to 12 days to '
                 'complete and Foreign Minister Diego Pary added that the '
                 'Bolivian government had also invited observers from Mexico, '
                 'Paraguay and Spain to monitor it.',
 'article_title': "Bolivia's post-election clashes turn deadly as two are "
                  'killed',
 'download_date': '2019-10-31 16:29:36',
 'filename': 'news_world-latin-america-50248739_1572539376.html',
 'html_title': b"Bolivia's post-election clashes turn deadly as two are kille"
               b'd - BBC News',
 'local_path': '/root/news-please-repo//data/2019/10/31/bbc.co.uk/news_world-latin-america-50248739_1572539376.html',
 'modified_date': '2019-10-31 16:29:36',
 'rss_title': 'NULL',
 'source_domain': b'bbc.co.uk',
 'spider_response': <200 https://www.bbc.co.uk/news/world-latin-america-50248739>,
 'url': 'https://www.bbc.co.uk/news/world-latin-america-50248739'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python3.8/site-packages/newsplease/pipeline/pipelines.py", line 485, in process_item
    if request['hits']['total']['value'] > 0:
TypeError: 'int' object is not subscriptable
[scrapy.core.engine:294|INFO] Closing spider (finished)

Versions (please complete the following information):

  • Python Version 3.6
  • news-please ‘1.4.23’

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 3
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
fhamborg commented, Nov 8, 2019

(so you could just simply do: pip install -U news-please)

2 reactions
fhamborg commented, Nov 8, 2019

Thank you very much for the PR, @JeromeGill! I merged it and uploaded a new version to PyPI containing the fix: https://pypi.org/project/news-please/1.4.24/
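
For readers who hit the same traceback: the TypeError is consistent with a difference in Elasticsearch response formats. From Elasticsearch 7.0 on, hits.total in a search response is an object such as {"value": 3, "relation": "eq"}, while earlier versions return a plain integer, so request['hits']['total']['value'] fails against an older server. The snippet below is a minimal sketch of a lookup that tolerates both shapes. It is not necessarily how the merged PR fixed the problem; total_hits is a hypothetical helper name, and the sample responses are hand-written rather than real server output.

```python
def total_hits(response):
    """Return the hit count from an Elasticsearch search response dict.

    Handles both shapes of hits.total: a plain int (Elasticsearch < 7)
    and an object like {"value": 3, "relation": "eq"} (Elasticsearch >= 7).
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


# Hand-written sample responses (assumptions for illustration only).
legacy_response = {"hits": {"total": 2, "hits": []}}
modern_response = {"hits": {"total": {"value": 2, "relation": "eq"}, "hits": []}}

# Both shapes yield the same count, so a check like
# `if total_hits(request) > 0:` works regardless of server version.
assert total_hits(legacy_response) == 2
assert total_hits(modern_response) == 2
```

If upgrading news-please is not an option, a guard of this kind around the failing comparison in process_item would be one way to work around the error locally.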
