Scrapy capitalizes headers for request

I’m setting the headers the following way:

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'cache-control': 'no-cache',
    ...
}

And making the request like this:

yield scrapy.Request(url='https://myurl.com/', callback=self.parse, headers=headers, cookies=cookies, meta={'proxy': 'http://localhost:8888'})

As a result, Scrapy capitalizes all of these headers, and they go out like this (I’m using Charles proxy for debugging):

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cache-Control: no-cache

This does not work correctly in my case.

If I use curl and send the headers in lowercase:

accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
cache-control: no-cache

everything works like a charm.
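For checking header casing without a proxy such as Charles, one option is to capture the raw bytes a client sends to a throwaway local socket. The following is a minimal sketch using only the Python standard library; `capture_raw_request` is a hypothetical helper name, and `http.client` stands in for the client here because it writes header names as given.

```python
import http.client
import socket
import threading


def capture_raw_request(headers):
    """Send one GET request to a local socket and return the raw bytes received.

    Hypothetical helper for illustration; not part of Scrapy or Twisted.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    captured = {}

    def serve_once():
        conn, _ = server.accept()
        with conn:
            data = b""
            # Read until the blank line that ends the header block.
            while b"\r\n\r\n" not in data:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            captured["raw"] = data
            conn.sendall(
                b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
            )

    thread = threading.Thread(target=serve_once)
    thread.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    client.request("GET", "/", headers=headers)
    client.getresponse().read()
    client.close()

    thread.join()
    server.close()
    return captured["raw"]


raw = capture_raw_request({"accept": "text/html", "cache-control": "no-cache"})
print(raw.decode("latin-1"))
```

The printed request shows the header names exactly as they were put on the wire, which makes it easy to compare the output of different clients.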

Is there any way to disable this capitalizing behavior in Scrapy? Thanks for any help!

Issue Analytics

State:
Created 6 years ago
Reactions: 1
Comments: 9 (6 by maintainers)

Top GitHub Comments

12 reactions

kalessin commented, Feb 21, 2018

A quick workaround for this issue is to add the following to your spider code:

from twisted.web.http_headers import Headers as TwistedHeaders

TwistedHeaders._caseMappings.update({
    b'cache-control': b'cache-control',
    b'accept': b'accept',
})

This prevents Twisted from capitalizing those headers. The problem is entirely on the Twisted side; nothing needs to be done on the Scrapy side.
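To see why updating the mapping has this effect, here is a simplified, self-contained model of the idea: a lookup table of special cases that takes precedence over default dash-capitalization. This is an illustrative sketch, not Twisted's actual code; `dash_capitalize`, `canonical_name`, and `case_mappings` are hypothetical names chosen for this example.

```python
def dash_capitalize(name: bytes) -> bytes:
    """Capitalize each dash-separated part: b'cache-control' -> b'Cache-Control'."""
    return b"-".join(part.capitalize() for part in name.split(b"-"))


# Hypothetical stand-in for an override table of special-case spellings.
case_mappings = {b"etag": b"ETag"}


def canonical_name(name: bytes) -> bytes:
    """Return the override if one exists, otherwise the dash-capitalized name."""
    lowered = name.lower()
    return case_mappings.get(lowered, dash_capitalize(lowered))


# Without an override, the default capitalization applies.
before = canonical_name(b"cache-control")

# Mapping a name to itself makes the lookup return it unchanged.
case_mappings.update({
    b"cache-control": b"cache-control",
    b"accept": b"accept",
})
after = canonical_name(b"cache-control")

print(before, after)
```

Only the names added to the table are affected; any other header still gets the default capitalization. Note that the workaround above changes a private attribute (`_caseMappings`), so it may stop working in other Twisted releases.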

2 reactions

kmike commented, Apr 24, 2017

I think it’d be good to not capitalize header names by default, and pass them as-is.

Top Results From Across the Web

Scrapy capitalizes request headers - python - Stack Overflow

I know that some websites do request header fingerprinting to detect bots, but the capitalized headers generated by scrapy look much more ...

Requests and Responses — Scrapy 2.7.1 documentation

Create a Request object from a string containing a cURL command. It populates the HTTP method, the URL, the headers, the cookies and...

How To Use HEADERS in SCRAPY SHELL, Python Requests ...

See how to add headers in the scrapy shell fetch command and how to use cURL to check a URL via command line. Make...

Scraping Data on the Web with BeautifulSoup

Because Scrapy serves the purpose of mass-scraping, it is much easier to get in trouble with ... import requests from bs4 import BeautifulSoup...

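064 - Truly solving the problem of Scrapy automatically capitalizing request headers - 家的博客

This article mainly explains how to truly solve Scrapy automatically ... request headers ... Request the target website with both request and Scrapy, using the same data for the URL, parameters, form, etc. (excluding things like random ...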

Top Related Reddit Thread

Implementing case sensitive headers in Scrapy (not through `_caseMappings`)
