504 Gateway Time-out
Hello, I am crawling a website with 10K pages. At first every response is 200 and everything is fine, but after a few minutes 504 Gateway Time-out errors appear, and after 3 retries Scrapy gives up. My settings are:
'CONCURRENT_REQUESTS':10,
'HTTPCACHE_ENABLED':True,
'DOWNLOAD_DELAY':5,
'CONCURRENT_REQUESTS_PER_IP':10,
and the endpoint is render.html:
'splash' : {
'endpoint' : 'render.html',
'args' : {'wait':1},
}
I am using:
* Scrapy version: 1.0.3
* Python: 2.7
* Docker server
How can I optimize my crawler and avoid the 504 errors?
Issue Analytics
- State:
- Created 8 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Following @omkaaa's suggestion, I changed the args to:
It works!
Besides, some websites load very quickly with curl or a browser but very slowly in Splash, because Splash cannot download some of their resources correctly. This can also lead to 504 Gateway Time-out. The right fix is to stop the slow resource downloads. In Splash, you can set resource_timeout in args:
Hey @omkaaa,
Please check http://splash.readthedocs.org/en/stable/faq.html - does it help?
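For readers landing here, the resource_timeout suggestion from the comments above might look roughly like the sketch below. It reuses the 'splash' meta layout from the question. The helper name splash_meta and all numeric values are assumptions for illustration, not taken from the thread. The timeout argument is also an addition: it is Splash's overall render timeout, and Splash rejects values above the server's --max-timeout setting, so check your instance before raising it.

```python
def splash_meta(wait=1.0, resource_timeout=10.0, timeout=60.0):
    """Build the 'splash' request meta used in the question.

    Hypothetical helper; the default values are illustrative guesses.
    """
    return {
        'splash': {
            'endpoint': 'render.html',
            'args': {
                # Seconds to wait after the page has loaded (from the question).
                'wait': wait,
                # Abort any single resource request that takes longer than
                # this many seconds, so one slow asset cannot stall the render.
                'resource_timeout': resource_timeout,
                # Overall render timeout in seconds (an assumption, not from
                # the thread); must not exceed the server's --max-timeout.
                'timeout': timeout,
            },
        }
    }


meta = splash_meta()
print(meta['splash']['args'])
```

In a spider, the resulting dict would be passed as the meta argument of the request, in the same way as the 'splash' dict shown in the question.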