Header order is lost when passing a session to create_scraper

The headers copied from a plain requests Session instance are not ordered, that is, isinstance(session.headers, OrderedDict) is False. If such a session is passed to cfscrape.create_scraper(sess=requests.Session()), the returned scraper will not have its headers attribute defined properly, because the scraper's own ordered headers are overridden by the session's.

https://github.com/Anorov/cloudflare-scrape/blob/2ffeb22b78d64b2b8007a8521ac276b13c0ac306/cfscrape/__init__.py#L336-L358
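
The effect can be reproduced without requests or cfscrape installed. The sketch below is only an illustration: StandInSession, StandInScraper and copy_all_attrs are hypothetical stand-ins, and it assumes the attribute-by-attribute copy that the issue describes rather than quoting the library's code.

```python
from collections import OrderedDict


class StandInSession:
    """Hypothetical stand-in for requests.Session (not the real class)."""

    __attrs__ = ["headers", "cookies"]

    def __init__(self):
        # A plain dict plays the role of the session's non-OrderedDict headers.
        self.headers = {"User-Agent": "python-requests", "Accept": "*/*"}
        self.cookies = {}


class StandInScraper(StandInSession):
    """Hypothetical stand-in for the scraper, which wants ordered headers."""

    def __init__(self):
        super().__init__()
        self.headers = OrderedDict(
            [("Accept", "text/html"), ("User-Agent", "Mozilla/5.0")]
        )


def copy_all_attrs(scraper, sess):
    # Assumed pattern: every attribute, headers included, is copied with setattr.
    for name in scraper.__attrs__:
        if hasattr(sess, name):
            setattr(scraper, name, getattr(sess, name))
    return scraper


scraper = copy_all_attrs(StandInScraper(), StandInSession())
print(isinstance(scraper.headers, OrderedDict))  # False
print(list(scraper.headers))  # ['User-Agent', 'Accept']
```

After the copy, the scraper carries the session's mapping object, so both the OrderedDict type and the scraper's chosen header order are gone.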

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

1 reaction
ghost commented, May 16, 2019

I was thinking something like this:

Code snippet
    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """

        if sess:
            if hasattr(sess, 'headers'):
                # Skip this if headers == requests default headers
                kwargs.setdefault('headers', sess.headers)

        scraper = cls(**kwargs)

        if sess:
            exclude = ('headers',)

            attrs = (x for x in scraper.__attrs__ if x not in exclude)
            for x in attrs:
                if hasattr(sess, x):
                    setattr(scraper, x, getattr(sess, x))

        return scraper

@Anorov @lukele What do you think?

0 reactions
ghost commented, May 18, 2019

I’ve made up my mind on how I think this should be done. The headers should be merged rather than replaced so that their order is retained, and the sess argument should be incompatible with other arguments. If other arguments such as headers/cookies/params/data are allowed alongside sess, then they should be merged with the session’s values.
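
To make "merged in order to retain order" concrete, here is a minimal sketch using only the standard library. merge_headers is a hypothetical helper written for this illustration, not a function from requests or cfscrape, and the case-insensitive matching of header names is an assumption about what a merge would need.

```python
from collections import OrderedDict


def merge_headers(scraper_headers, session_headers):
    """Hypothetical helper: merge session headers into ordered scraper headers.

    Names already present keep their position and spelling; only their value
    changes. Names that are new are appended at the end.
    """
    merged = OrderedDict(scraper_headers)
    # Map lowercase names to the spelling already used in the ordered headers.
    by_lower = {name.lower(): name for name in merged}
    for name, value in session_headers.items():
        target = by_lower.setdefault(name.lower(), name)
        merged[target] = value
    return merged


scraper_headers = OrderedDict(
    [("User-Agent", "Mozilla/5.0"), ("Accept", "text/html")]
)
session_headers = {"X-Token": "abc", "accept": "application/json"}

merged = merge_headers(scraper_headers, session_headers)
print(list(merged.items()))
# [('User-Agent', 'Mozilla/5.0'), ('Accept', 'application/json'), ('X-Token', 'abc')]
```

The session's values win, but the scraper's ordering decides where each header appears.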

There is a helper function to aid in merging those attributes: https://github.com/kennethreitz/requests/blob/a79a63390bc963e5924021086744e53585566307/requests/sessions.py#L49-L77

But at least cookies would require special handling. I’m voting to disallow extra arguments with the sess argument. Whether or not to change the current behavior to make use of Session.__attrs__ is a completely different issue. I don’t plan on including that in a PR.

Code snippet
from requests.utils import default_headers as requests_headers

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        """
        Convenience function for creating a ready-to-go CloudflareScraper object.
        """

        if not sess:
            return cls(**kwargs)

        # Reject any extra keyword argument when a session is supplied
        if kwargs:
            raise ValueError('Passing arguments with "sess" isn\'t currently supported')

        scraper = cls()

        headers = getattr(sess, 'headers', None)
        if headers and headers != requests_headers():
            scraper.headers.update(headers)

        exclude = ('headers',)
        attrs = (x for x in scraper.__attrs__ if x not in exclude)
        for x in attrs:
            if hasattr(sess, x):
                setattr(scraper, x, getattr(sess, x))

        return scraper
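
The behavior of this proposal can be tried without requests or cfscrape installed. What follows is a rough sketch under stated assumptions: StandInSession, StandInScraper and DEFAULT_HEADERS are hypothetical stand-ins (DEFAULT_HEADERS plays the role of requests' default headers), and rejecting any extra keyword argument is one reading of the stated intent.

```python
from collections import OrderedDict

# Hypothetical stand-in for the library's default headers.
DEFAULT_HEADERS = {"User-Agent": "python-requests", "Accept": "*/*"}


class StandInSession:
    """Hypothetical stand-in for requests.Session."""

    __attrs__ = ["headers", "cookies", "params"]

    def __init__(self):
        self.headers = dict(DEFAULT_HEADERS)
        self.cookies = {}
        self.params = {}


class StandInScraper(StandInSession):
    """Hypothetical stand-in for the scraper class."""

    def __init__(self, **kwargs):
        super().__init__()
        self.headers = OrderedDict(
            [("User-Agent", "Mozilla/5.0"), ("Accept", "text/html")]
        )

    @classmethod
    def create_scraper(cls, sess=None, **kwargs):
        if not sess:
            return cls(**kwargs)
        if kwargs:
            raise ValueError('Passing arguments with "sess" isn\'t currently supported')
        scraper = cls()
        # Merge headers only when the session's headers were customised.
        headers = getattr(sess, "headers", None)
        if headers and headers != DEFAULT_HEADERS:
            scraper.headers.update(headers)
        # Copy every other attribute as before, leaving headers alone.
        for name in scraper.__attrs__:
            if name != "headers" and hasattr(sess, name):
                setattr(scraper, name, getattr(sess, name))
        return scraper


untouched = StandInSession()
customised = StandInSession()
customised.headers["X-Token"] = "abc"
customised.cookies["sid"] = "1"

a = StandInScraper.create_scraper(sess=untouched)
b = StandInScraper.create_scraper(sess=customised)
print(list(a.headers.items()))  # [('User-Agent', 'Mozilla/5.0'), ('Accept', 'text/html')]
print(list(b.headers))  # ['User-Agent', 'Accept', 'X-Token']
```

One thing the sketch makes visible: once any session header differs from the defaults, update() also writes the session's default User-Agent and Accept values over the scraper's, even though the order is preserved.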

I think we can release this with a minor bump since passing sess with keyword arguments doesn’t currently make sense and didn’t really work for 99% of use cases anyway.

Alternatively, we can simply keep everything the same and only address the headers, which might be the best option right now.

@Anorov @lukele
