Scrapy capitalizes headers for request

I’m setting the headers in the following way:

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'cache-control': 'no-cache',
    ...
}

And making the request like this:

yield scrapy.Request(url='https://myurl.com/', callback=self.parse, headers=headers, cookies=cookies, meta={'proxy': 'http://localhost:8888'})

As a result, Scrapy capitalizes all of these headers, and they look like this on the wire (I’m using Charles proxy for debugging):

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cache-Control: no-cache

This does not work correctly in my case.

If I use curl and send the headers in lowercase

accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
cache-control: no-cache

everything works like a charm.

Is there any way to disable this capitalizing behavior in Scrapy? Thanks for any help!

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 1
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

12 reactions
kalessin commented, Feb 21, 2018

A quick workaround for this issue. In your spider code:

from twisted.web.http_headers import Headers as TwistedHeaders

TwistedHeaders._caseMappings.update({
    b'cache-control': b'cache-control',
    b'accept': b'accept',
})

That prevents Twisted from capitalizing those headers. The problem is entirely on the Twisted side; nothing needs to be done on the Scrapy side.
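
To illustrate why updating the mapping has this effect, here is a minimal sketch of the kind of lookup involved. It assumes the header name is lowercased, checked against a table of special cases, and otherwise capitalized part by part. The names CASE_MAPPINGS and canonical_name are made up for this illustration and are not Twisted's API; the real _caseMappings attribute is private and may change between Twisted versions.

```python
# Simplified, hypothetical model of header-name canonicalization.
# This is NOT Twisted's actual code; it only mimics the idea of a
# special-case table that is consulted before default capitalization.

# Special cases: lowercase name -> exact bytes to send on the wire.
CASE_MAPPINGS = {
    b"etag": b"ETag",
    b"www-authenticate": b"WWW-Authenticate",
}


def canonical_name(name: bytes) -> bytes:
    """Return the on-the-wire form of a header name."""
    lowered = name.lower()
    # A registered special case wins over the default rule.
    if lowered in CASE_MAPPINGS:
        return CASE_MAPPINGS[lowered]
    # Default rule: capitalize each dash-separated part.
    return b"-".join(part.capitalize() for part in lowered.split(b"-"))


# Before registering an override, the default rule applies.
print(canonical_name(b"cache-control"))  # b'Cache-Control'

# Registering a lowercase "special case" makes the name pass through as-is,
# which is what the _caseMappings.update(...) call above relies on.
CASE_MAPPINGS.update({b"cache-control": b"cache-control"})
print(canonical_name(b"cache-control"))  # b'cache-control'
```

Only names registered in the table are affected; in this sketch any other header, such as accept, would still be capitalized until it is added too.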

2 reactions
kmike commented, Apr 24, 2017

I think it’d be good to not capitalize header names by default, and pass them as-is.
