Got garbled code on some website

With scrapy-playwright, I can fetch data correctly from most websites using code like this:

def start_requests(self):
    # my code
    # ......

def parse(self, response):
    page = response.meta["playwright_page"]
    response_text = await page.content()
    with open(self.config.task_name+'.html', 'w', encoding='utf-8') as f:
        f.write(response_text)
    # my code
    # ......

But I failed to fetch the right data from one other website. (This is actually a rare problem.)

When I use the code above to save page.content(), the output looks like this:

[image: screenshot of the saved page content]

It is completely unreadable, and I don’t know how to solve this problem.

Here is my server info:

system version: centos 7.6
Scrapy: 2.5.1
scrapy-playwright: 0.0.5
playwright: 1.16.0
...

Looking forward to some solutions 😃

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
Alienboyplus commented, May 13, 2022

I have just solved this problem. Here is what I did:

  1. I reinstalled CentOS 7.6 on my server and then used my .sh file to set up the environment for my project.
  2. I checked my .sh file again and found some unneeded Python packages, such as selenium_wire, so this time I didn’t pip install them.
  3. Last year, when I tried to run my project with scrapy-playwright, an exception said I needed to install some dependencies such as at-spi2-atk, libxkbcommon-x11-devel and glibc-2.18, so I added them to my .sh file. This time I didn’t install those dependencies, and it works correctly (scrapy-playwright==0.0.5 still works)!

So I believe it is just a bug related to the server’s environment. I can get the right data from my target website now. Thanks!
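
Since the fix came down to the server's environment, it can help to record which packages are actually installed before and after a rebuild. Here is a minimal sketch using only the standard library. The distribution names are taken from this thread (with selenium-wire assumed to be the PyPI name for selenium_wire), and the `installed_versions` helper is hypothetical, not part of Scrapy or scrapy-playwright.

```python
from importlib import metadata

# Distribution names mentioned in this thread; "selenium-wire" is an assumed
# PyPI name for selenium_wire. Adjust the tuple to match your own project.
PACKAGES = ("Scrapy", "scrapy-playwright", "playwright", "selenium-wire")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Saving this output on the broken and the working server makes it easier to spot which package differs. It does not cover system libraries such as glibc, which would have to be checked with the operating system's package manager.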

0 reactions
elacuesta commented, May 11, 2022
  1. I assumed the absence of the async keyword in your first example was just a typo. If it wasn’t, and you actually did not have it before, I don’t know how the code was running in the first place: using await inside a regular function raises SyntaxError: 'await' outside async function.
  2. You’re using scrapy-playwright==0.0.5. I recommend updating to a more recent version (the latest one is 0.0.15), as there have been some fixes related to the encoding of response bodies.
  3. If the problem persists, please report whether or not you get the same results with the standalone playwright script I posted earlier. If you do, you’re having a problem with upstream Playwright and should report it there.
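
Point 1 can be checked without running a spider. The sketch below compiles two reduced versions of the callback from the question and reports which one is valid Python. The source strings and the `compiles` helper are illustrative and do not come from the issue.

```python
# Hypothetical callback sources, reduced from the example in the question.
SYNC_CALLBACK = '''
def parse(self, response):
    page = response.meta["playwright_page"]
    return await page.content()
'''

ASYNC_CALLBACK = '''
async def parse(self, response):
    page = response.meta["playwright_page"]
    return await page.content()
'''


def compiles(source: str) -> bool:
    """Return True if the source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<callback>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("def parse compiles:", compiles(SYNC_CALLBACK))
    print("async def parse compiles:", compiles(ASYNC_CALLBACK))
```

Only the `async def` version compiles, so a spider whose callback awaits `page.content()` has to declare that callback with `async def`.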