
Allow copying existing cookiejar for request.meta['cookiejar']

See original GitHub issue

Hi, scrapy developers!

Since version 0.15, the Scrapy cookies middleware has supported multiple cookie sessions per spider. However, as far as I understand, each new session is initialized with an empty cookiejar.

It could be useful to allow initializing a new session's cookiejar with the contents of a previous one, so that cookies are carried over between two different sessions. I ran into this need in a project where I have to start new cookie sessions after the main session has already acquired cookies identifying the user; subsequent requests in the new sessions need those identification cookies.

To work around this limitation of the current cookies middleware, I wrote a very basic wrapper middleware that copies cookies from an old session into new ones. It is driven by a new (and admittedly ugly) request.meta['copied_cookiejar'] key specifying the cookiejar from which already-stored cookies are copied. I use this middleware in place of scrapy.downloadermiddlewares.cookies.CookiesMiddleware; in my spider, after naming the first cookie session, I initialize a new one with something like this:

yield Request(url, callback=callback, meta={
    'copied_cookiejar': response.meta['cookiejar'],
    'cookiejar': new_cookiejar_id,
})
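For completeness, wiring such a wrapper in place of the stock middleware would look roughly like this in settings.py (a sketch: the module path `myproject.middlewares` and the class name `CopiedCookiejarMiddleware` are hypothetical placeholders; Scrapy's built-in CookiesMiddleware runs at priority 700 by default, so the wrapper takes the same slot):

```python
# settings.py -- disable the built-in cookies middleware and enable the
# wrapper at the same default priority (700). The module path and class
# name below are hypothetical; adjust them to where the class lives.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': None,
    'myproject.middlewares.CopiedCookiejarMiddleware': 700,
}
```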

What do you think about this need, and what implementation might be possible?

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 6
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
awkejiang commented, Aug 22, 2016

It would be useful especially for scrapy-redis, so different spiders can share login-after session.

1 reaction
noviluni commented, Jul 17, 2020

I didn’t test it properly, but I used a custom cookies middleware to handle this scenario:

from copy import deepcopy

from scrapy.downloadermiddlewares.cookies import CookiesMiddleware


class CustomCookiesMiddleware(CookiesMiddleware):
    """
    Allow to use `fork_from_cookiejar` in the meta to fork a cookiejar.

    How to use:
        meta={
            'fork_from_cookiejar': <old-cookiejar-id>,  # commonly: `response.meta['cookiejar']`
            'cookiejar': <new-cookiejar-id>,
        }
    """

    def process_request(self, request, spider):
        if request.meta.get('dont_merge_cookies', False):
            return

        cookiejarkey = request.meta.get('cookiejar')
        old_cookiejarkey = request.meta.get('fork_from_cookiejar')

        if old_cookiejarkey:
            # Replace the target jar with a deep copy of the source jar,
            # so later changes to either session don't affect the other.
            old_jar = self.jars[old_cookiejarkey]
            self.jars[cookiejarkey] = deepcopy(old_jar)

        jar = self.jars[cookiejarkey]

        for cookie in self._get_request_cookies(jar, request):
            jar.set_cookie_if_ok(cookie, request)

        # Set the Cookie header from the jar.
        request.headers.pop('Cookie', None)
        jar.add_cookie_header(request)
        self._debug_cookie(request, spider)
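The essence of the fork is that the new jar starts with a copy of the old jar's cookies and the two evolve independently afterwards. A minimal standalone sketch of those semantics with the stdlib http.cookiejar (no Scrapy involved; note that the stdlib jar holds an internal RLock, so this sketch copies cookies one by one rather than deep-copying the whole jar — `make_cookie` and `fork_jar` are illustrative helpers, not part of any library):

```python
from copy import copy
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value):
    # Illustrative helper: build a Cookie with just enough fields set.
    return Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain='example.com', domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


def fork_jar(old_jar):
    # Start the new session with a copy of every cookie from the old one.
    new_jar = CookieJar()
    for cookie in old_jar:
        new_jar.set_cookie(copy(cookie))
    return new_jar


old_jar = CookieJar()
old_jar.set_cookie(make_cookie('sessionid', 'abc123'))

# Fork: the new jar starts with the old session's cookies...
new_jar = fork_jar(old_jar)
assert {c.name for c in new_jar} == {'sessionid'}

# ...but changes after the fork don't leak between the two sessions.
new_jar.set_cookie(make_cookie('csrftoken', 'xyz789'))
assert len(old_jar) == 1 and len(new_jar) == 2
```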