
response.follow_all or SelectorList.follow_all shortcut


What do you think about adding a response.follow_all shortcut which returns a list of requests? This is inspired by this note in the docs:

response.follow(response.css('li.next a')) is not valid because response.css returns a list-like object with selectors for all results, not a single selector. A for loop like in the example above, or response.follow(response.css('li.next a')[0]), is fine.

So instead of

for href in response.css('li.next a::attr(href)'):
    yield response.follow(href, callback=self.parse)

users would be able to write (in Python 3)

yield from response.follow_all(response.css('li.next a::attr(href)'), self.parse)

We can also add 'css' and 'xpath' support to it, as keyword arguments; it would shorten the code to this:

yield from response.follow_all(css='li.next a::attr(href)', callback=self.parse)

(this is a follow-up to https://github.com/scrapy/scrapy/issues/1940 and https://github.com/scrapy/scrapy/issues/2540)
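For context, here is a minimal sketch of how the proposed shortcut would read inside a spider. The spider name, start URL, and selectors are placeholders, and follow_all is used with the proposed signature:

import scrapy


class QuotesSpider(scrapy.Spider):
    # Placeholder spider illustrating the proposed shortcut.
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {"text": quote.css("span.text::text").get()}
        # One call replaces the for loop over pagination links; follow_all
        # extracts the href from each matched <a> element.
        yield from response.follow_all(css="li.next a", callback=self.parse)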

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Reactions: 1
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
immerrr commented, Feb 27, 2017

@kmike, only a rough one.

I think of RequestSet as something that (a rough sketch follows the list):

  • has a DeferredList-like API (asyncio.gather has the drawback of being a function rather than an object)
  • knows that it contains Requests as deferreds
  • has its own callback to be run when the request set is dealt with, and probably an errback too, for symmetry
  • has a Deferred that would fire when the request set is being cleaned up
  • has its own 'meta' dictionary, much like the one shared between Request & Response objects
  • since it knows that it contains Requests, it can piggyback on the first received value and do response.meta['request_set'] = self so that the callbacks can access the shared data
  • (maybe) it should silently copy fields from request_set.meta to response.meta if they are unset in request.meta, or maybe even make request.meta a ChainMap-like dict with fallback to request_set.meta
  • it should wrap requests coming from its respective response callbacks unless specifically asked not to do that, e.g. with request.meta['request_set'] = None
  • (maybe) it should be possible to return other RequestSet from response callbacks
  • (maybe) returned RequestSets should be made nestable, i.e. to keep the parent RequestSet alive during their lifetime if not explicitly asked not to with request_set.meta['request_set'] = None (if nesting is considered, the request_set metadata key seems redundant and we might consider parent_set instead)
  • not sure if it’s worth it to make them nestable, i.e. if a certain response callback produces a different RequestSet, should it be owned by the parent request set?
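To make the first few bullets concrete, here is a rough sketch built on Twisted's DeferredList. The class shape, field names, and wiring are illustrative guesses, not Scrapy API:

from twisted.internet.defer import Deferred, DeferredList


class RequestSet:
    """Groups member deferreds (stand-ins for in-flight Requests), exposes a
    shared meta dict, and runs one callback when every member has finished."""

    def __init__(self, deferreds, callback=None):
        self.meta = {}  # shared data, analogous to Request.meta
        # consumeErrors=True: per-request failures show up in the result list
        # as (False, Failure) pairs instead of failing the whole set.
        self._done = DeferredList(deferreds, consumeErrors=True)
        if callback is not None:
            self._done.addCallback(callback)


# Usage: the set-level callback fires once, after both members complete.
d1, d2 = Deferred(), Deferred()
rs = RequestSet([d1, d2], callback=lambda results: print("set done:", results))
rs.meta["shared"] = "visible to every callback in the set"
d1.callback("response 1")
d2.callback("response 2")

Firing both deferreds runs the set-level callback once with the collected (success, value) pairs, which is the DeferredList behaviour the first bullet refers to.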

One more thing to consider is cross-referencing RequestSets, i.e. when two requests that should belong to one RequestSet are produced by different callbacks and thus have different scopes. Maybe a simple WeakValueDictionary would suffice to look up the sets and ensure the references are cleaned up as necessary. But then you’d have the usual get-or-create operation, which might be worth a reference implementation.
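A minimal sketch of that get-or-create lookup, reusing the RequestSet sketch above with a hypothetical key scheme:

import weakref

_request_sets = weakref.WeakValueDictionary()


def get_or_create_request_set(key, factory):
    """Return the set registered under key, creating and caching it if absent.
    The weak-valued mapping drops an entry on its own once no live request
    references the set, which covers the cleanup concern."""
    rs = _request_sets.get(key)
    if rs is None:
        rs = factory()
        _request_sets[key] = rs
    return rs


# Usage (hypothetical key): callbacks in different scopes resolve to the same set.
rs = get_or_create_request_set(("spider", "listing-42"), lambda: RequestSet([]))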

0 reactions
immerrr commented, Feb 27, 2017

@kmike done: #2600
