
Allow copying existing cookiejar for request.meta['cookiejar']

See original GitHub issue

Hi, scrapy developers!

Since version 0.15, the Scrapy cookies middleware has supported multiple cookie sessions per spider. However, as far as I understand, each new session is initialized with an empty cookiejar.

It could be useful to allow initializing a new session's cookiejar with the contents of a previous one, so that cookies are carried over between two different sessions. I ran into this need in a project where I have to start new cookie sessions after the main session has already acquired cookies identifying the user; subsequent requests in the new sessions need those identification cookies.

To work around this limitation of the current cookies middleware, I wrote a very basic wrapper middleware that copies cookies from an old session into new ones. It is driven by a new (and admittedly ugly) request.meta['copied_cookiejar'] key specifying the cookiejar from which already-stored cookies are copied. I use this middleware in place of scrapy.downloadermiddlewares.cookies.CookiesMiddleware; in my spider, after naming the first cookie session, I initialize a new one with something like this:

yield Request(url, callback=callback, meta={
    'copied_cookiejar': response.meta['cookiejar'],
    'cookiejar': new_cookiejar_id,
})
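For completeness, wiring such a wrapper in place of the stock middleware would look roughly like this in settings.py (a sketch: the module path `myproject.middlewares` and the class name `CopiedCookiejarMiddleware` are hypothetical placeholders; Scrapy's built-in CookiesMiddleware runs at priority 700 by default, so the wrapper takes the same slot):

```python
# settings.py -- disable the built-in cookies middleware and enable the
# wrapper at the same default priority (700). The module path and class
# name below are hypothetical; adjust them to where the class lives.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': None,
    'myproject.middlewares.CopiedCookiejarMiddleware': 700,
}
```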

What do you think about this need, and what implementation might be possible?

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 6
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
awkejiang commented, Aug 22, 2016

It would be useful especially for scrapy-redis, so different spiders can share login-after session.

1 reaction
noviluni commented, Jul 17, 2020

I didn’t test it properly, but I used a custom cookies middleware to handle this scenario:

from copy import deepcopy

from scrapy.downloadermiddlewares.cookies import CookiesMiddleware


class CustomCookiesMiddleware(CookiesMiddleware):
    """
    Allow to use `fork_from_cookiejar` in the meta to fork a cookiejar.

    How to use:
        meta={
            'fork_from_cookiejar': <old-cookiejar-id>,  # commonly: `response.meta['cookiejar']`
            'cookiejar': <new-cookiejar-id>,
        }
    """

    def process_request(self, request, spider):
        if request.meta.get('dont_merge_cookies', False):
            return

        cookiejarkey = request.meta.get('cookiejar')
        old_cookiejarkey = request.meta.get('fork_from_cookiejar')

        if old_cookiejarkey:
            # Replace the target jar with a deep copy of the source jar,
            # so later changes to either session don't affect the other.
            old_jar = self.jars[old_cookiejarkey]
            self.jars[cookiejarkey] = deepcopy(old_jar)

        jar = self.jars[cookiejarkey]

        for cookie in self._get_request_cookies(jar, request):
            jar.set_cookie_if_ok(cookie, request)

        # Set the Cookie header from the jar.
        request.headers.pop('Cookie', None)
        jar.add_cookie_header(request)
        self._debug_cookie(request, spider)
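The essence of the fork is that the new jar starts with a copy of the old jar's cookies and the two evolve independently afterwards. A minimal standalone sketch of those semantics with the stdlib http.cookiejar (no Scrapy involved; note that the stdlib jar holds an internal RLock, so this sketch copies cookies one by one rather than deep-copying the whole jar — `make_cookie` and `fork_jar` are illustrative helpers, not part of any library):

```python
from copy import copy
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value):
    # Illustrative helper: build a Cookie with just enough fields set.
    return Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain='example.com', domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


def fork_jar(old_jar):
    # Start the new session with a copy of every cookie from the old one.
    new_jar = CookieJar()
    for cookie in old_jar:
        new_jar.set_cookie(copy(cookie))
    return new_jar


old_jar = CookieJar()
old_jar.set_cookie(make_cookie('sessionid', 'abc123'))

# Fork: the new jar starts with the old session's cookies...
new_jar = fork_jar(old_jar)
assert {c.name for c in new_jar} == {'sessionid'}

# ...but changes after the fork don't leak between the two sessions.
new_jar.set_cookie(make_cookie('csrftoken', 'xyz789'))
assert len(old_jar) == 1 and len(new_jar) == 2
```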