response.follow_all or SelectorList.follow_all shortcut
See original GitHub issue

What do you think about adding a response.follow_all shortcut, which returns a list of requests? This is inspired by this note in the docs:
response.follow(response.css('li.next a'))
is not valid because response.css returns a list-like object with selectors for all results, not a single selector. A for loop like in the example above, or response.follow(response.css('li.next a')[0]), is fine.
So instead of
for href in response.css('li.next a::attr(href)'):
yield response.follow(href, callback=self.parse)
users would be able to write (in Python 3)
yield from response.follow_all(response.css('li.next a::attr(href)'), self.parse)
We can also add 'css' and 'xpath' support to it, as keyword arguments; this would shorten the code to:
yield from response.follow_all(css='li.next a::attr(href)', callback=self.parse)
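Conceptually, the proposed shortcut is just a loop over follow(). A minimal, dependency-free sketch of the idea (using a small stand-in class rather than Scrapy's real Response, so it runs without Scrapy installed):

```python
from urllib.parse import urljoin

class Response:
    # Minimal stand-in for scrapy.http.Response, for illustration only;
    # the real class also handles encodings, Link objects, selectors, etc.
    def __init__(self, url):
        self.url = url

    def follow(self, href, callback=None):
        # Scrapy's follow() returns a Request object; here we return an
        # (absolute_url, callback) tuple to keep the sketch self-contained.
        return (urljoin(self.url, href), callback)

    def follow_all(self, hrefs, callback=None):
        # The proposed shortcut: one request per link, yielded lazily so a
        # spider can write "yield from response.follow_all(...)".
        for href in hrefs:
            yield self.follow(href, callback=callback)

response = Response("http://example.com/list")
requests = list(response.follow_all(["/page/2", "/page/3"], callback="parse"))
# requests == [("http://example.com/page/2", "parse"),
#              ("http://example.com/page/3", "parse")]
```

The css/xpath keyword variant would only add a step that runs the selector against the response body before the loop; everything else stays the same.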
(this is a follow-up to https://github.com/scrapy/scrapy/issues/1940 and https://github.com/scrapy/scrapy/issues/2540)
Issue Analytics
- State:
- Created: 7 years ago
- Reactions: 1
- Comments: 7 (7 by maintainers)
Top GitHub Comments
@kmike, only a rough one.
I think of RequestSet as something that:
- has a DeferredList-like API (asyncio.gather has the drawback of being a function rather than an object)
- has its own Deferred that would fire when the request set is being cleaned up
- has a callback to be run when the request set is dealt with, and probably an errback too, for symmetry
- has a 'meta' dictionary, much like the one shared between Request & Response objects
- sets response.meta['request_set'] = self, so that the callbacks can access the shared data
- copies request_set.meta entries to response.meta if they are unset in request.meta, or maybe even makes request.meta a ChainDict with a fallback to request_set.meta
- supports request.meta['request_set'] = None and request_set.meta['request_set'] = None (if nesting is considered, the request_set metadata key seems redundant and we might consider parent_set instead)

One more thing to consider is cross-referencing RequestSets, i.e. when two requests that should belong to one RequestSet are produced by different callbacks and thus have different scopes. Maybe a simple WeakValueDictionary would suffice to look up the sets and ensure the references are cleaned up as necessary. But then you'd have the usual get-or-create operation, which might be worth creating an etalon implementation for.

@kmike done: #2600