Header transfer-encoding makes Splash API return 504 Gateway Timeout

I was developing a crawler using Splash when I suddenly started to receive a lot of gateway timeouts. While troubleshooting the problem, I discovered that the cause is the header transfer-encoding: chunked. I made a PoC (the URL httpbin.org/headers returns the same headers I sent in the request):

import requests
import json

ENDPOINT_SPLASH = 'http://localhost:8050/execute'


def test_with_custom_headers():
    lua_script = """
    function main(splash, args)
     splash:set_custom_headers({
       ["x-custom-header"] = "splash"
     })
     assert(splash:go(args.url))
     assert(splash:wait(0.5))
     return {
       html = splash:html()
     }
    end
    """

    payload = {
        'lua_source': lua_script,
        'url': 'https://httpbin.org/headers',
        'timeout': 15,
    }

    r = requests.post(url=ENDPOINT_SPLASH,
                      json=payload)

    result = json.loads(r.text)

    return result.get('html', result)


def test_with_content_encoding():
    lua_script = """
    function main(splash, args)
     splash:set_custom_headers({
       ["transfer-encoding"] = "chunked"
     })
     assert(splash:go(args.url))
     assert(splash:wait(0.5))
     return {
       html = splash:html()
     }
    end
    """

    payload = {
        'lua_source': lua_script,
        'url': 'https://httpbin.org/headers',
        'timeout': 15,
    }

    r = requests.post(url=ENDPOINT_SPLASH,
                      json=payload)

    result = json.loads(r.text)

    return result.get('html', result)


print("test_with_custom_headers: \n{}\n".format(test_with_custom_headers()))
print("test_with_content_encoding: \n{}".format(test_with_content_encoding()))

Results:

test_with_custom_headers: 
<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "en,*", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) splash Version/9.0 Safari/602.1", 
    "X-Custom-Header": "splash"
  }
}
</pre></body></html>

test_with_content_encoding: 
{'info': {'timeout': 15.0}, 'type': 'GlobalTimeoutError', 'error': 504, 'description': 'Timeout exceeded rendering page'}
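
A possible mitigation, offered as a minimal sketch and not a confirmed fix: drop hop-by-hop headers such as transfer-encoding before they reach splash:set_custom_headers. The helper names strip_hop_by_hop and build_payload, and the custom_headers argument key, are hypothetical and not part of Splash. The sketch also assumes the Lua script reads the headers from args instead of hard-coding them.

```python
# Hop-by-hop headers (see RFC 2616, section 13.5.1) describe a single
# connection and are generally not meant to be replayed on a new request.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
})

# Same flow as the PoC, but the headers come from args.custom_headers
# (an assumed argument name chosen for this sketch).
LUA_SCRIPT = """
function main(splash, args)
  splash:set_custom_headers(args.custom_headers)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {html = splash:html()}
end
"""


def strip_hop_by_hop(headers):
    """Return a copy of headers without hop-by-hop entries (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in HOP_BY_HOP}


def build_payload(url, headers, timeout=15):
    """Build the JSON body for the /execute endpoint (hypothetical helper)."""
    return {
        "lua_source": LUA_SCRIPT,
        "url": url,
        "custom_headers": strip_hop_by_hop(headers),
        "timeout": timeout,
    }
```

The resulting dictionary can be posted the same way as in the PoC, for example requests.post(url=ENDPOINT_SPLASH, json=build_payload(...)). Whether this avoids the 504 in every setup (for instance with proxies) is not verified here.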

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
Granitosaurus commented, Nov 19, 2019

I’m having the same issue, but weirdly enough only when using proxies via splash:on_request. My Splash is patched with the decompression patch described in this issue: https://github.com/scrapinghub/splash/issues/324. If you aren’t using proxies, this might solve the issue for you.

0 reactions
bpgallagher commented, Jan 2, 2021

I’m having the same issue. Is there a solution for this problem yet? Thank you.
