Use priority queues for Downloader slot queues

Currently, downloader slots use collections.deque for the request queue. This means that once a request has passed from the scheduler to the downloader, its priority is no longer respected.

Say the global concurrency limit is 10 and the scheduler returns 10 low-priority requests, all for a single downloader slot. The user then schedules a high-priority request for the same slot. When one of the 10 low-priority requests finishes, the downloader fetches the high-priority request from the scheduler. That new high-priority request will only be handled after the 9 remaining low-priority requests.

What about using a priority queue from queuelib instead of a deque?
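
The difference between the two queue types can be sketched with a toy model. This is not Scrapy's downloader code: the (priority, name) tuples, the helper names, and the use of the standard-library heapq in place of queuelib are all assumptions made for illustration.

```python
import heapq
from collections import deque
from itertools import count

# Toy model only: (priority, name) tuples stand in for Scrapy requests.
low = [(0, f"low-{i}") for i in range(1, 10)]  # 9 low-priority requests already in the slot
high = (10, "high")                            # arrives later from the scheduler


def drain_fifo(queued, late):
    """FIFO slot queue: the late request waits behind everything queued earlier."""
    queue = deque(queued)
    queue.append(late)
    return [name for _, name in queue]


def drain_by_priority(queued, late):
    """Priority slot queue: higher priority first, arrival order among equals."""
    tie = count()  # insertion counter keeps equal priorities in arrival order
    heap = []
    for priority, name in [*queued, late]:
        # heapq is a min-heap, so the priority is negated to pop the highest first
        heapq.heappush(heap, (-priority, next(tie), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


fifo_order = drain_fifo(low, high)
priority_order = drain_by_priority(low, high)
print(fifo_order)      # "high" is handled last
print(priority_order)  # "high" is handled first
```

In the FIFO case the high-priority request sits behind the 9 queued low-priority ones, which is the situation described above. With a priority-ordered slot queue it would be picked next.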

//cc @dangra @shirk3y

Issue Analytics

Created 8 years ago
Comments:10 (10 by maintainers)

Top GitHub Comments

GeorgeA92 commented, May 30, 2022

Let's test this script with various settings.

Script:

import scrapy
from scrapy.crawler import CrawlerProcess


class BooksToScrapeSpider(scrapy.Spider):
    name = "books"
    # 31 listing pages, scheduled with the default priority
    start_urls = [f"https://books.toscrape.com/catalogue/page-{i}.html" for i in range(1, 32)]
    custom_settings = {"DOWNLOAD_DELAY": 1}

    def parse(self, response):
        # Follow the first book on each listing page with a higher priority
        yield scrapy.Request(
            response.urljoin(response.css('ol.row .product_pod a::attr(href)').get()),
            callback=self.parse_book,
            priority=10
        )

    def parse_book(self, response):
        pass


process = CrawlerProcess()
process.crawl(BooksToScrapeSpider)
process.start()

1. Default concurrency settings (CONCURRENT_REQUESTS=16, CONCURRENT_REQUESTS_PER_DOMAIN=8)

Log output (default settings except "DOWNLOAD_DELAY": 1):

2022-05-30 16:42:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-1.html> (referer: None)
2022-05-30 16:42:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-2.html> (referer: None)
2022-05-30 16:42:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-3.html> (referer: None)
2022-05-30 16:42:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-4.html> (referer: None)
2022-05-30 16:42:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-5.html> (referer: None)
2022-05-30 16:42:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-6.html> (referer: None)
2022-05-30 16:42:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-7.html> (referer: None)
2022-05-30 16:42:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-8.html> (referer: None)
2022-05-30 16:42:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-9.html> (referer: None)
2022-05-30 16:42:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-10.html> (referer: None)
2022-05-30 16:42:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-11.html> (referer: None)
2022-05-30 16:42:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-12.html> (referer: None)
2022-05-30 16:42:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-13.html> (referer: None)
2022-05-30 16:42:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-14.html> (referer: None)
2022-05-30 16:42:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-15.html> (referer: None)
2022-05-30 16:42:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-16.html> (referer: None)
2022-05-30 16:42:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-17.html> (referer: None)
2022-05-30 16:42:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html> (referer: https://books.toscrape.com/catalogue/page-1.html)
2022-05-30 16:42:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/in-her-wake_980/index.html> (referer: https://books.toscrape.com/catalogue/page-2.html)
2022-05-30 16:42:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/slow-states-of-collapse-poems_960/index.html> (referer: https://books.toscrape.com/catalogue/page-3.html)
2022-05-30 16:42:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-nameless-city-the-nameless-city-1_940/index.html> (referer: https://books.toscrape.com/catalogue/page-4.html)
2022-05-30 16:42:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/princess-jellyfish-2-in-1-omnibus-vol-01-princess-jellyfish-2-in-1-omnibus-1_920/index.html> (referer: https://books.toscrape.com/catalogue/page-5.html)
2022-05-30 16:42:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/immunity-how-elie-metchnikoff-changed-the-course-of-modern-medicine_900/index.html> (referer: https://books.toscrape.com/catalogue/page-6.html)
2022-05-30 16:42:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/algorithms-to-live-by-the-computer-science-of-human-decisions_880/index.html> (referer: https://books.toscrape.com/catalogue/page-7.html)
2022-05-30 16:42:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-shadow-hero-the-shadow-hero_860/index.html> (referer: https://books.toscrape.com/catalogue/page-8.html)
2022-05-30 16:42:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-bridge-to-consciousness-im-writing-the-bridge-between-science-and-our-old-and-new-beliefs_840/index.html> (referer: https://books.toscrape.com/catalogue/page-9.html)
2022-05-30 16:42:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/modern-romance_820/index.html> (referer: https://books.toscrape.com/catalogue/page-10.html)
2022-05-30 16:42:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/dark-notes_800/index.html> (referer: https://books.toscrape.com/catalogue/page-11.html)
2022-05-30 16:42:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/whole-lotta-creativity-going-on-60-fun-and-unusual-exercises-to-awaken-and-strengthen-your-creativity_780/index.html> (referer: https://books.toscrape.com/catalogue/page-12.html)
2022-05-30 16:42:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-power-of-habit-why-we-do-what-we-do-in-life-and-business_760/index.html> (referer: https://books.toscrape.com/catalogue/page-13.html)
2022-05-30 16:42:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/swell-a-year-of-waves_740/index.html> (referer: https://books.toscrape.com/catalogue/page-14.html)
2022-05-30 16:42:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/my-name-is-lucy-barton_720/index.html> (referer: https://books.toscrape.com/catalogue/page-15.html)
2022-05-30 16:42:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/hold-your-breath-search-and-rescue-1_700/index.html> (referer: https://books.toscrape.com/catalogue/page-16.html)
2022-05-30 16:42:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/code-name-verity-code-name-verity-1_680/index.html> (referer: https://books.toscrape.com/catalogue/page-17.html)
2022-05-30 16:42:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-18.html> (referer: None)
2022-05-30 16:42:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-19.html> (referer: None)
2022-05-30 16:42:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-20.html> (referer: None)
2022-05-30 16:42:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-21.html> (referer: None)
2022-05-30 16:43:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-22.html> (referer: None)
2022-05-30 16:43:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-23.html> (referer: None)
2022-05-30 16:43:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-24.html> (referer: None)
2022-05-30 16:43:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-25.html> (referer: None)
2022-05-30 16:43:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-26.html> (referer: None)
2022-05-30 16:43:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-27.html> (referer: None)
2022-05-30 16:43:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-28.html> (referer: None)
2022-05-30 16:43:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-29.html> (referer: None)
2022-05-30 16:43:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-30.html> (referer: None)
2022-05-30 16:43:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-31.html> (referer: None)
2022-05-30 16:43:11 [scrapy.extensions.logstats] INFO: Crawled 48 pages (at 48 pages/min), scraped 0 items (at 0 items/min)
2022-05-30 16:43:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/all-the-light-we-cannot-see_660/index.html> (referer: https://books.toscrape.com/catalogue/page-18.html)
2022-05-30 16:43:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-midnight-watch-a-novel-of-the-titanic-and-the-californian_640/index.html> (referer: https://books.toscrape.com/catalogue/page-19.html)
2022-05-30 16:43:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/hide-away-eve-duncan-20_620/index.html> (referer: https://books.toscrape.com/catalogue/page-20.html)
2022-05-30 16:43:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/mothering-sunday_600/index.html> (referer: https://books.toscrape.com/catalogue/page-21.html)
2022-05-30 16:43:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/without-shame_580/index.html> (referer: https://books.toscrape.com/catalogue/page-22.html)
2022-05-30 16:43:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/chernobyl-012340-the-incredible-true-story-of-the-worlds-worst-nuclear-disaster_560/index.html> (referer: https://books.toscrape.com/catalogue/page-23.html)
2022-05-30 16:43:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/roller-girl_540/index.html> (referer: https://books.toscrape.com/catalogue/page-24.html)
2022-05-30 16:43:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/heaven-is-for-real-a-little-boys-astounding-story-of-his-trip-to-heaven-and-back_520/index.html> (referer: https://books.toscrape.com/catalogue/page-25.html)
2022-05-30 16:43:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-story-of-art_500/index.html> (referer: https://books.toscrape.com/catalogue/page-26.html)
2022-05-30 16:43:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/nightstruck-a-novel_480/index.html> (referer: https://books.toscrape.com/catalogue/page-27.html)
2022-05-30 16:43:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/benjamin-franklin-an-american-life_460/index.html> (referer: https://books.toscrape.com/catalogue/page-28.html)
2022-05-30 16:43:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-barefoot-contessa-cookbook_440/index.html> (referer: https://books.toscrape.com/catalogue/page-29.html)
2022-05-30 16:43:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/world-without-end-the-pillars-of-the-earth-2_420/index.html> (referer: https://books.toscrape.com/catalogue/page-30.html)
2022-05-30 16:43:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-dream-thieves-the-raven-cycle-2_400/index.html> (referer: https://books.toscrape.com/catalogue/page-31.html)
2022-05-30 16:43:26 [scrapy.core.engine] INFO: Closing spider (finished)

Here we see exactly the request processing order described in the first message of this issue: the high-priority book requests are only sent after a batch of low-priority listing pages.

2. Custom settings {"DOWNLOAD_DELAY": 1, "CONCURRENT_REQUESTS": 1, "CONCURRENT_REQUESTS_PER_DOMAIN": 1}

With this configuration, request priority is respected on both the scheduler side and the downloader side (as requested here). The scheduler respects it because it already uses a priority queue. The downloader respects it because the custom settings reduce its queue to a size of 1, so the downloader queue always contains the highest-priority request.

log output

2022-05-30 16:53:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-1.html> (referer: None)
2022-05-30 16:53:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-2.html> (referer: None)
2022-05-30 16:53:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html> (referer: https://books.toscrape.com/catalogue/page-1.html)
2022-05-30 16:53:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/in-her-wake_980/index.html> (referer: https://books.toscrape.com/catalogue/page-2.html)
2022-05-30 16:53:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-3.html> (referer: None)
2022-05-30 16:53:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-4.html> (referer: None)
2022-05-30 16:53:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/slow-states-of-collapse-poems_960/index.html> (referer: https://books.toscrape.com/catalogue/page-3.html)
2022-05-30 16:53:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-nameless-city-the-nameless-city-1_940/index.html> (referer: https://books.toscrape.com/catalogue/page-4.html)
2022-05-30 16:53:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-5.html> (referer: None)
2022-05-30 16:53:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-6.html> (referer: None)
2022-05-30 16:53:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/princess-jellyfish-2-in-1-omnibus-vol-01-princess-jellyfish-2-in-1-omnibus-1_920/index.html> (referer: https://books.toscrape.com/catalogue/page-5.html)
2022-05-30 16:53:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/immunity-how-elie-metchnikoff-changed-the-course-of-modern-medicine_900/index.html> (referer: https://books.toscrape.com/catalogue/page-6.html)
2022-05-30 16:53:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-7.html> (referer: None)
2022-05-30 16:53:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-8.html> (referer: None)
2022-05-30 16:53:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/algorithms-to-live-by-the-computer-science-of-human-decisions_880/index.html> (referer: https://books.toscrape.com/catalogue/page-7.html)
2022-05-30 16:53:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-shadow-hero-the-shadow-hero_860/index.html> (referer: https://books.toscrape.com/catalogue/page-8.html)
2022-05-30 16:53:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-9.html> (referer: None)
2022-05-30 16:53:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-10.html> (referer: None)
2022-05-30 16:53:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-bridge-to-consciousness-im-writing-the-bridge-between-science-and-our-old-and-new-beliefs_840/index.html> (referer: https://books.toscrape.com/catalogue/page-9.html)
2022-05-30 16:53:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/modern-romance_820/index.html> (referer: https://books.toscrape.com/catalogue/page-10.html)
2022-05-30 16:53:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-11.html> (referer: None)
2022-05-30 16:53:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-12.html> (referer: None)
2022-05-30 16:53:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/dark-notes_800/index.html> (referer: https://books.toscrape.com/catalogue/page-11.html)
2022-05-30 16:53:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/whole-lotta-creativity-going-on-60-fun-and-unusual-exercises-to-awaken-and-strengthen-your-creativity_780/index.html> (referer: https://books.toscrape.com/catalogue/page-12.html)
2022-05-30 16:53:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-13.html> (referer: None)
2022-05-30 16:53:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-14.html> (referer: None)
2022-05-30 16:53:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-power-of-habit-why-we-do-what-we-do-in-life-and-business_760/index.html> (referer: https://books.toscrape.com/catalogue/page-13.html)
2022-05-30 16:54:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/swell-a-year-of-waves_740/index.html> (referer: https://books.toscrape.com/catalogue/page-14.html)
2022-05-30 16:54:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-15.html> (referer: None)
2022-05-30 16:54:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-16.html> (referer: None)
2022-05-30 16:54:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/my-name-is-lucy-barton_720/index.html> (referer: https://books.toscrape.com/catalogue/page-15.html)
2022-05-30 16:54:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/hold-your-breath-search-and-rescue-1_700/index.html> (referer: https://books.toscrape.com/catalogue/page-16.html)
2022-05-30 16:54:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-17.html> (referer: None)
2022-05-30 16:54:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-18.html> (referer: None)
2022-05-30 16:54:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/code-name-verity-code-name-verity-1_680/index.html> (referer: https://books.toscrape.com/catalogue/page-17.html)
2022-05-30 16:54:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/all-the-light-we-cannot-see_660/index.html> (referer: https://books.toscrape.com/catalogue/page-18.html)
2022-05-30 16:54:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-19.html> (referer: None)
2022-05-30 16:54:12 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-20.html> (referer: None)
2022-05-30 16:54:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-midnight-watch-a-novel-of-the-titanic-and-the-californian_640/index.html> (referer: https://books.toscrape.com/catalogue/page-19.html)
2022-05-30 16:54:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/hide-away-eve-duncan-20_620/index.html> (referer: https://books.toscrape.com/catalogue/page-20.html)
2022-05-30 16:54:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-21.html> (referer: None)
2022-05-30 16:54:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-22.html> (referer: None)
2022-05-30 16:54:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/mothering-sunday_600/index.html> (referer: https://books.toscrape.com/catalogue/page-21.html)
2022-05-30 16:54:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/without-shame_580/index.html> (referer: https://books.toscrape.com/catalogue/page-22.html)
2022-05-30 16:54:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-23.html> (referer: None)
2022-05-30 16:54:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-24.html> (referer: None)
2022-05-30 16:54:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/chernobyl-012340-the-incredible-true-story-of-the-worlds-worst-nuclear-disaster_560/index.html> (referer: https://books.toscrape.com/catalogue/page-23.html)
2022-05-30 16:54:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/roller-girl_540/index.html> (referer: https://books.toscrape.com/catalogue/page-24.html)
2022-05-30 16:54:25 [scrapy.extensions.logstats] INFO: Crawled 48 pages (at 48 pages/min), scraped 0 items (at 0 items/min)
2022-05-30 16:54:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-25.html> (referer: None)
2022-05-30 16:54:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-26.html> (referer: None)
2022-05-30 16:54:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/heaven-is-for-real-a-little-boys-astounding-story-of-his-trip-to-heaven-and-back_520/index.html> (referer: https://books.toscrape.com/catalogue/page-25.html)
2022-05-30 16:54:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-story-of-art_500/index.html> (referer: https://books.toscrape.com/catalogue/page-26.html)
2022-05-30 16:54:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-27.html> (referer: None)
2022-05-30 16:54:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-28.html> (referer: None)
2022-05-30 16:54:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/nightstruck-a-novel_480/index.html> (referer: https://books.toscrape.com/catalogue/page-27.html)
2022-05-30 16:54:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/benjamin-franklin-an-american-life_460/index.html> (referer: https://books.toscrape.com/catalogue/page-28.html)
2022-05-30 16:54:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-29.html> (referer: None)
2022-05-30 16:54:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-30.html> (referer: None)
2022-05-30 16:54:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-barefoot-contessa-cookbook_440/index.html> (referer: https://books.toscrape.com/catalogue/page-29.html)
2022-05-30 16:54:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/world-without-end-the-pillars-of-the-earth-2_420/index.html> (referer: https://books.toscrape.com/catalogue/page-30.html)
2022-05-30 16:54:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-31.html> (referer: None)
2022-05-30 16:54:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-dream-thieves-the-raven-cycle-2_400/index.html> (referer: https://books.toscrape.com/catalogue/page-31.html)

This looks better, but it is still not the expected strict order of requests (low priority 1, high priority 1, low priority 2, high priority 2, etc.).

When the downloader received the first response (...page1.html), the application asked the scheduler for the next request to send to the downloader. Because the first response (...page1.html) had not yet been parsed at that moment (and so had not yet produced a new high-priority request), it took the next request from the scheduler queue (low-priority ...page2.html) and sent it to the server. Technically, the application still respects request priorities.

The key point is that a low-priority request moved from the scheduler queue to the downloader queue without waiting for the already received low-priority response to be parsed (parsing it produces the high-priority request we expect to be sent next). In this case, even with a priority queue implemented for the downloader, we would not get a completely fixed, strict order of requests.

This happens because the default settings allow it: https://github.com/scrapy/scrapy/blob/afa5881ada816a2fc5555f6272dbfe87f7973222/scrapy/settings/default_settings.py#L263 This setting means that a request may be sent from the scheduler queue to the downloader queue as long as the total size of unparsed responses is less than SCRAPER_SLOT_MAX_ACTIVE_SIZE (~5 MB). This is the direct reason for the non-strict order of requests.
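
The effect of that size limit can be illustrated with a small toy model. This is a sketch under assumptions, not Scrapy's implementation: ToyScraperSlot, may_fetch_next, and the 50 kB response size are hypothetical, and the check is assumed to be a plain comparison of the unparsed response size against the limit.

```python
from dataclasses import dataclass


@dataclass
class ToyScraperSlot:
    """Hypothetical stand-in for the scraper slot's size accounting."""
    max_active_size: int  # plays the role of SCRAPER_SLOT_MAX_ACTIVE_SIZE
    active_size: int = 0  # bytes of responses received but not yet parsed

    def add_response(self, size: int) -> None:
        self.active_size += size

    def needs_backout(self) -> bool:
        # While this is True, no further request is taken from the scheduler
        return self.active_size > self.max_active_size


def may_fetch_next(max_active_size: int, unparsed_response_size: int) -> bool:
    """Can the next request be taken while one response is still unparsed?"""
    slot = ToyScraperSlot(max_active_size)
    slot.add_response(unparsed_response_size)
    return not slot.needs_backout()


# With a ~5 MB limit, one unparsed 50 kB page does not block the next request
default_allows = may_fetch_next(5_000_000, 50_000)
# With a limit of 0, any unparsed response blocks it
zero_allows = may_fetch_next(0, 50_000)
print(default_allows, zero_allows)
```

Under this model, the default limit lets the next low-priority request leave the scheduler before the previous response has been parsed, while a limit of 0 forces the engine to wait for parsing first.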

3. Custom settings (reduced scraper slot max active size) {"DOWNLOAD_DELAY": 1, "CONCURRENT_REQUESTS": 1, "CONCURRENT_REQUESTS_PER_DOMAIN": 1, "SCRAPER_SLOT_MAX_ACTIVE_SIZE": 0}
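
For reference, these settings could be applied to the test script by replacing its custom_settings value. A minimal sketch follows; the name STRICT_ORDER_SETTINGS is hypothetical, and the comments restate the reasoning from this comment rather than documented guarantees.

```python
# Hypothetical name; assign this dict to `custom_settings` in the spider class.
STRICT_ORDER_SETTINGS = {
    "DOWNLOAD_DELAY": 1,
    # One request in flight at a time, so the downloader queue holds a single request
    "CONCURRENT_REQUESTS": 1,
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
    # Do not take the next request while a received response is still unparsed
    "SCRAPER_SLOT_MAX_ACTIVE_SIZE": 0,
}
```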

log output

2022-05-30 18:07:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-1.html> (referer: None)
2022-05-30 18:07:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html> (referer: https://books.toscrape.com/catalogue/page-1.html)
2022-05-30 18:07:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-2.html> (referer: None)
2022-05-30 18:07:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/in-her-wake_980/index.html> (referer: https://books.toscrape.com/catalogue/page-2.html)
2022-05-30 18:07:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-3.html> (referer: None)
2022-05-30 18:07:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/slow-states-of-collapse-poems_960/index.html> (referer: https://books.toscrape.com/catalogue/page-3.html)
2022-05-30 18:07:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-4.html> (referer: None)
2022-05-30 18:07:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-nameless-city-the-nameless-city-1_940/index.html> (referer: https://books.toscrape.com/catalogue/page-4.html)
2022-05-30 18:07:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-5.html> (referer: None)
2022-05-30 18:08:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/princess-jellyfish-2-in-1-omnibus-vol-01-princess-jellyfish-2-in-1-omnibus-1_920/index.html> (referer: https://books.toscrape.com/catalogue/page-5.html)
2022-05-30 18:08:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-6.html> (referer: None)
2022-05-30 18:08:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/immunity-how-elie-metchnikoff-changed-the-course-of-modern-medicine_900/index.html> (referer: https://books.toscrape.com/catalogue/page-6.html)
2022-05-30 18:08:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-7.html> (referer: None)
2022-05-30 18:08:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/algorithms-to-live-by-the-computer-science-of-human-decisions_880/index.html> (referer: https://books.toscrape.com/catalogue/page-7.html)
2022-05-30 18:08:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-8.html> (referer: None)
2022-05-30 18:08:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-shadow-hero-the-shadow-hero_860/index.html> (referer: https://books.toscrape.com/catalogue/page-8.html)
2022-05-30 18:08:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-9.html> (referer: None)
2022-05-30 18:08:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-bridge-to-consciousness-im-writing-the-bridge-between-science-and-our-old-and-new-beliefs_840/index.html> (referer: https://books.toscrape.com/catalogue/page-9.html)
2022-05-30 18:08:10 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-10.html> (referer: None)
2022-05-30 18:08:11 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/modern-romance_820/index.html> (referer: https://books.toscrape.com/catalogue/page-10.html)
2022-05-30 18:08:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-11.html> (referer: None)
2022-05-30 18:08:14 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/dark-notes_800/index.html> (referer: https://books.toscrape.com/catalogue/page-11.html)
2022-05-30 18:08:16 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-12.html> (referer: None)
2022-05-30 18:08:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/whole-lotta-creativity-going-on-60-fun-and-unusual-exercises-to-awaken-and-strengthen-your-creativity_780/index.html> (referer: https://books.toscrape.com/catalogue/page-12.html)
2022-05-30 18:08:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-13.html> (referer: None)
2022-05-30 18:08:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-power-of-habit-why-we-do-what-we-do-in-life-and-business_760/index.html> (referer: https://books.toscrape.com/catalogue/page-13.html)
2022-05-30 18:08:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-14.html> (referer: None)
2022-05-30 18:08:22 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/swell-a-year-of-waves_740/index.html> (referer: https://books.toscrape.com/catalogue/page-14.html)
2022-05-30 18:08:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-15.html> (referer: None)
2022-05-30 18:08:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/my-name-is-lucy-barton_720/index.html> (referer: https://books.toscrape.com/catalogue/page-15.html)
2022-05-30 18:08:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-16.html> (referer: None)
2022-05-30 18:08:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/hold-your-breath-search-and-rescue-1_700/index.html> (referer: https://books.toscrape.com/catalogue/page-16.html)
2022-05-30 18:08:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-17.html> (referer: None)
2022-05-30 18:08:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/code-name-verity-code-name-verity-1_680/index.html> (referer: https://books.toscrape.com/catalogue/page-17.html)
2022-05-30 18:08:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-18.html> (referer: None)
2022-05-30 18:08:33 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/all-the-light-we-cannot-see_660/index.html> (referer: https://books.toscrape.com/catalogue/page-18.html)
2022-05-30 18:08:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-19.html> (referer: None)
2022-05-30 18:08:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-midnight-watch-a-novel-of-the-titanic-and-the-californian_640/index.html> (referer: https://books.toscrape.com/catalogue/page-19.html)
2022-05-30 18:08:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-20.html> (referer: None)
2022-05-30 18:08:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/hide-away-eve-duncan-20_620/index.html> (referer: https://books.toscrape.com/catalogue/page-20.html)
2022-05-30 18:08:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-21.html> (referer: None)
2022-05-30 18:08:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/mothering-sunday_600/index.html> (referer: https://books.toscrape.com/catalogue/page-21.html)
2022-05-30 18:08:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-22.html> (referer: None)
2022-05-30 18:08:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/without-shame_580/index.html> (referer: https://books.toscrape.com/catalogue/page-22.html)
2022-05-30 18:08:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-23.html> (referer: None)
2022-05-30 18:08:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/chernobyl-012340-the-incredible-true-story-of-the-worlds-worst-nuclear-disaster_560/index.html> (referer: https://books.toscrape.com/catalogue/page-23.html)
2022-05-30 18:08:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-24.html> (referer: None)
2022-05-30 18:08:48 [scrapy.extensions.logstats] INFO: Crawled 47 pages (at 47 pages/min), scraped 0 items (at 0 items/min)
2022-05-30 18:08:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/roller-girl_540/index.html> (referer: https://books.toscrape.com/catalogue/page-24.html)
2022-05-30 18:08:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-25.html> (referer: None)
2022-05-30 18:08:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/heaven-is-for-real-a-little-boys-astounding-story-of-his-trip-to-heaven-and-back_520/index.html> (referer: https://books.toscrape.com/catalogue/page-25.html)
2022-05-30 18:08:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-26.html> (referer: None)
2022-05-30 18:08:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-story-of-art_500/index.html> (referer: https://books.toscrape.com/catalogue/page-26.html)
2022-05-30 18:08:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-27.html> (referer: None)
2022-05-30 18:08:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/nightstruck-a-novel_480/index.html> (referer: https://books.toscrape.com/catalogue/page-27.html)
2022-05-30 18:08:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-28.html> (referer: None)
2022-05-30 18:08:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/benjamin-franklin-an-american-life_460/index.html> (referer: https://books.toscrape.com/catalogue/page-28.html)
2022-05-30 18:08:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-29.html> (referer: None)
2022-05-30 18:08:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-barefoot-contessa-cookbook_440/index.html> (referer: https://books.toscrape.com/catalogue/page-29.html)
2022-05-30 18:09:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-30.html> (referer: None)
2022-05-30 18:09:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/world-without-end-the-pillars-of-the-earth-2_420/index.html> (referer: https://books.toscrape.com/catalogue/page-30.html)
2022-05-30 18:09:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/page-31.html> (referer: None)
2022-05-30 18:09:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://books.toscrape.com/catalogue/the-dream-thieves-the-raven-cycle-2_400/index.html> (referer: https://books.toscrape.com/catalogue/page-31.html)

With this configuration, setting SCRAPER_SLOT_MAX_ACTIVE_SIZE to 0 in addition to the other reduced concurrency settings guarantees that the next request from the scheduler is moved to the downloader only after all received responses have been processed.

This configuration may run slower than the default setting value of ~5 MB (especially with low or zero values of the DOWNLOAD_DELAY setting), but it gives more precise control over the order in which requests are sent and processed.
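For reference, a minimal sketch of what such a settings block might look like. Only DOWNLOAD_DELAY and SCRAPER_SLOT_MAX_ACTIVE_SIZE values come from this comment; the two concurrency values of 1 are an assumption standing in for "reduced concurrency settings", and the name ORDERED_CRAWL_SETTINGS is hypothetical.

```python
# Hypothetical settings dict for a strictly ordered crawl.
ORDERED_CRAWL_SETTINGS = {
    # From the test script above: one second between requests.
    "DOWNLOAD_DELAY": 1,
    # Assumed "reduced concurrency" values (not confirmed by the source).
    "CONCURRENT_REQUESTS": 1,
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
    # From the comment: 0 instead of the default of roughly 5 MB, so the
    # engine waits for received responses to be processed first.
    "SCRAPER_SLOT_MAX_ACTIVE_SIZE": 0,
}
```

A dict like this could be assigned to the spider's custom_settings in place of the single DOWNLOAD_DELAY entry used in the script.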

Gallaecio commented, May 31, 2022

I am starting to think maybe we should not make any change code-wise here, and instead make sure the documentation explains clearly what @GeorgeA92 covered above.

On a related note: at the moment, the scheduler handles request feed order, and the downloader handles slots. But slots should be taken into account for proper request ordering, and so we end up with something like DownloaderAwarePriorityQueue for the scheduler. I wonder if we should move slot handling to the scheduler instead.
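To make the idea concrete, here is a toy sketch of a scheduler-side queue that is aware of slots. It is not Scrapy's DownloaderAwarePriorityQueue, and the class and method names (SlotAwareQueue, push, pop, finished) are hypothetical. It assumes one priority heap per slot and picks the slot with the fewest active downloads.

```python
import heapq
import itertools
from collections import defaultdict


class SlotAwareQueue:
    """Toy scheduler queue: one priority heap per downloader slot."""

    def __init__(self):
        self._heaps = defaultdict(list)  # slot -> heap of (-priority, seq, url)
        self._active = defaultdict(int)  # slot -> requests currently downloading
        self._seq = itertools.count()    # tie-breaker: FIFO within a priority

    def push(self, slot, url, priority=0):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heaps[slot], (-priority, next(self._seq), url))

    def pop(self):
        # Choose the non-empty slot with the fewest active downloads.
        candidates = [slot for slot, heap in self._heaps.items() if heap]
        if not candidates:
            return None
        slot = min(candidates, key=lambda s: self._active[s])
        _, _, url = heapq.heappop(self._heaps[slot])
        self._active[slot] += 1
        return slot, url

    def finished(self, slot):
        # Call when a download for this slot completes.
        self._active[slot] -= 1


q = SlotAwareQueue()
q.push("books.toscrape.com", "page-1.html")
q.push("books.toscrape.com", "book-1.html", priority=10)
q.push("example.org", "index.html")
order = [q.pop() for _ in range(3)]
```

In this sketch the high-priority book request leaves the queue first, then the idle example.org slot is served, and only then the low-priority page request, so ordering and slot load are decided in one place.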
