
Exception when using DummyStatsCollector

See original GitHub issue

Description

Using the DummyStatsCollector results in an exception:

2019-09-09 13:51:23 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method CoreStats.spider_closed of <scrapy.extensions.corestats.CoreStats object at 0x7f86269cac18>>
Traceback (most recent call last):
  File ".../lib/python3.6/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File ".../lib/python3.6/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File ".../lib/python3.6/site-packages/scrapy/extensions/corestats.py", line 28, in spider_closed
    elapsed_time = finish_time - self.stats.get_value('start_time')
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'

This problem was introduced in commit aa46e1995cd5cb1099aba17535372b538bd656b3.
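The root cause is straightforward: DummyStatsCollector discards everything written to it, so get_value('start_time') falls back to its default of None, and subtracting None from a datetime raises the TypeError above. A self-contained sketch that mimics the collector's no-op behavior (rather than importing Scrapy):

```python
import datetime

class DummyStatsCollector:
    """Stand-in mimicking Scrapy's DummyStatsCollector: stores nothing, returns defaults."""
    def set_value(self, key, value, spider=None):
        pass  # silently discard

    def get_value(self, key, default=None, spider=None):
        return default  # 'start_time' was never stored, so this is None

stats = DummyStatsCollector()
stats.set_value('start_time', datetime.datetime.utcnow())  # no-op

finish_time = datetime.datetime.utcnow()
try:
    elapsed = finish_time - stats.get_value('start_time')
except TypeError as exc:
    # unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'
    print(exc)
```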

Steps to Reproduce

Set STATS_CLASS = "scrapy.statscollectors.DummyStatsCollector" in the settings module as described in the documentation (https://docs.scrapy.org/en/latest/topics/stats.html#dummystatscollector).

  • Expected behavior: no exception
  • Actual behavior: exception thrown
  • Reproduces how often: always
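For reference, the reproduction boils down to a one-line settings change (the bot name below is hypothetical; only the STATS_CLASS line matters):

```python
# settings.py of any Scrapy project
BOT_NAME = "example_bot"  # hypothetical project name
STATS_CLASS = "scrapy.statscollectors.DummyStatsCollector"
```

Any spider run with this setting will hit the traceback when the spider closes.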

Versions

At least master as of commit 534de7395da3a53b5a2c89960db9ec5d8fdab60c.

Fix

A possible fix is to pass the finish time as the default argument so that get_value() never returns None. I can prepare a PR if needed.

--- a/scrapy/extensions/corestats.py
+++ b/scrapy/extensions/corestats.py
@@ -25,7 +25,7 @@ class CoreStats(object):
 
     def spider_closed(self, spider, reason):
         finish_time = datetime.datetime.utcnow()
-        elapsed_time = finish_time - self.stats.get_value('start_time')
+        elapsed_time = finish_time - self.stats.get_value('start_time', finish_time)
         elapsed_time_seconds = elapsed_time.total_seconds()
         self.stats.set_value('elapsed_time_seconds', elapsed_time_seconds, spider=spider)
         self.stats.set_value('finish_time', finish_time, spider=spider)
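With that default in place, a dummy collector simply reports an elapsed time of zero instead of crashing. A sketch using a stand-in for the collector:

```python
import datetime

class DummyStatsCollector:
    """Stand-in: always returns the supplied default."""
    def get_value(self, key, default=None, spider=None):
        return default

stats = DummyStatsCollector()
finish_time = datetime.datetime.utcnow()
# Falling back to finish_time makes the subtraction well-defined:
elapsed_time = finish_time - stats.get_value('start_time', finish_time)
print(elapsed_time.total_seconds())  # 0.0
```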

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
elacuesta commented, Sep 10, 2019

Makes sense, the extension should be robust enough not to break depending on the Stats class implementation. In that case, something along the following lines should do the trick:

diff --git a/scrapy/extensions/corestats.py b/scrapy/extensions/corestats.py
index 8cc5e18a..91d8558b 100644
--- a/scrapy/extensions/corestats.py
+++ b/scrapy/extensions/corestats.py
@@ -9,6 +9,7 @@ class CoreStats(object):

     def __init__(self, stats):
         self.stats = stats
+        self.start_time = None

     @classmethod
     def from_crawler(cls, crawler):
@@ -21,11 +22,12 @@ class CoreStats(object):
         return o

     def spider_opened(self, spider):
-        self.stats.set_value('start_time', datetime.datetime.utcnow(), spider=spider)
+        self.start_time = datetime.datetime.utcnow()
+        self.stats.set_value('start_time', self.start_time, spider=spider)

     def spider_closed(self, spider, reason):
         finish_time = datetime.datetime.utcnow()
-        elapsed_time = finish_time - self.stats.get_value('start_time')
+        elapsed_time = finish_time - self.start_time
         elapsed_time_seconds = elapsed_time.total_seconds()
         self.stats.set_value('elapsed_time_seconds', elapsed_time_seconds, spider=spider)
         self.stats.set_value('finish_time', finish_time, spider=spider)
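The key difference in this variant is that the elapsed-time calculation no longer round-trips through the stats backend at all: the extension keeps its own start_time attribute, so any StatsCollector implementation works, including one that stores nothing. A minimal stand-in for the patched extension:

```python
import datetime

class CoreStatsSketch:
    """Sketch of the patched extension: start_time lives on the instance."""
    def __init__(self, stats):
        self.stats = stats
        self.start_time = None

    def spider_opened(self):
        self.start_time = datetime.datetime.utcnow()
        # set_value may be a no-op (DummyStatsCollector); that is now harmless
        self.stats.set_value('start_time', self.start_time)

    def spider_closed(self):
        finish_time = datetime.datetime.utcnow()
        return (finish_time - self.start_time).total_seconds()

class NoOpStats:
    """Stand-in for DummyStatsCollector."""
    def set_value(self, key, value):
        pass

ext = CoreStatsSketch(NoOpStats())
ext.spider_opened()
elapsed = ext.spider_closed()
print(f"elapsed: {elapsed:.6f}s")
```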

I still think it might be a good idea to skip connecting the handlers though 😅 Let’s see what other committers think.

0 reactions
elacuesta commented, May 19, 2021

why use datetime.datetime.utcnow() instead of datetime.datetime.now(timezone.utc)?

No specific reason that I can see; the previous code was already using utcnow and it just didn’t occur to me to change it. In any case, I don’t think it changes anything: AFAICT all datetimes handled by Scrapy are in UTC and there are no timezone conversions. That said, I’m happy to be proven wrong. If you think there is something to be improved, please suggest it or open a PR.
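For context on the question: utcnow() returns a naive datetime (tzinfo is None) while now(timezone.utc) returns an aware one. Both represent the same instant in UTC, but mixing naive and aware values in arithmetic raises TypeError, which is why a codebase typically standardizes on one style:

```python
import datetime
from datetime import timezone

naive = datetime.datetime.utcnow()           # naive: tzinfo is None
aware = datetime.datetime.now(timezone.utc)  # aware: tzinfo is UTC

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC

# Subtracting a naive datetime from an aware one (or vice versa) fails:
try:
    aware - naive
except TypeError as exc:
    print("mixed arithmetic fails:", exc)
```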
