Got garbled code on some website
I can get data correctly using scrapy-playwright on most websites, with code like this:
    def start_requests(self):
        # my code
        # ......

    def parse(self, response):
        page = response.meta["playwright_page"]
        response_text = await page.content()
        with open(self.config.task_name + '.html', 'w', encoding='utf-8') as f:
            f.write(response_text)
        # my code
        # ......
But I failed to fetch the right data from another website (this is actually a rare problem). When I save page.content() with the code above, the result is garbled and completely unreadable. I don't know how to solve this problem.
Here is my server info:
system version: centos 7.6
Scrapy: 2.5.1
scrapy-playwright: 0.0.5
playwright: 1.16.0
...
Looking forward to some solutions 😃
Issue Analytics
- Created a year ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
I have just solved this problem. Here is what I did:
- [...] centos 7.6, and then tried to use my .sh file to set up the environment for my project.
- [...] .sh file again and found some unneeded Python packages such as selenium_wire. So this time, I didn't pip install them.
- [...] at-spi2-atk, libxkbcommon-x11-devel and glibc-2.18. So I added these to my .sh file. This time, I didn't install those dependencies, and it works correctly (scrapy-playwright==0.0.5 still works)!

So I believe it is just a bug related to the server's environment. I can get the right data from my target website now. Thanks!
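
Since the cause turned out to be the server's environment, it can help to record what is installed before and after rebuilding it. Below is a minimal sketch, not something taken from this thread: the helper names `installed_version` and `environment_report` are hypothetical, and the package list simply mirrors the server info given in the question.

```python
import platform
from importlib import metadata

# Distribution names taken from the "server info" in the question.
PACKAGES = ("Scrapy", "scrapy-playwright", "playwright")


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=PACKAGES):
    """Collect OS, Python and package versions into a plain dict."""
    report = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        report[name] = installed_version(name)
    return report


def print_report():
    """Print one 'key: value' line per entry, for pasting into an issue."""
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Calling `print_report()` on a working and a broken server gives two short listings that can be compared line by line. It only covers Python packages, so system libraries such as the ones named above would still have to be checked with the system package manager.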

- [...] async keyword in your first example was just a typo. If it wasn't and you actually did not have it before, I don't know how the code was running in the first place: using await inside a regular function raises SyntaxError: 'await' outside async function.
- [...] scrapy-playwright==0.0.5, I recommend updating to a more recent version (the latest one is 0.0.15), as there have been some fixes related to the encoding of response bodies.
- [...] playwright script I posted earlier. If you do, you're having a problem with upstream Playwright and should report it there.
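
The script referred to above is not preserved in this copy of the thread. As a rough stand-in, here is a minimal sketch of a Playwright-only check that bypasses Scrapy entirely. The function names `fetch_content`, `save_html` and `main` are hypothetical, and the choice of Chromium with default launch options is an assumption.

```python
import asyncio


async def fetch_content(url: str) -> str:
    """Return the rendered HTML of `url` using Playwright alone (no Scrapy)."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        # Assumption: Chromium with default options; swap in p.firefox or
        # p.webkit if your spider is configured for another browser.
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.content()
        finally:
            await browser.close()


def save_html(html: str, path: str) -> None:
    """Write the HTML as UTF-8, the same way the spider in the question does."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


def main(url: str, path: str) -> None:
    """Fetch `url` with Playwright and save the result for inspection."""
    save_html(asyncio.run(fetch_content(url)), path)
```

Running `main(...)` with the problematic URL and an output path, then opening the saved file, shows whether the garbling already appears without Scrapy in the loop. If it does, the problem lies in Playwright or the server environment rather than in scrapy-playwright.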