Can't render JavaScript in requests-html / Can't run multithreading in Pyppeteer

Hi,

I’m trying to render JavaScript from webpages, but requests-html fails every time to do it.

This is my code:

    from requests_html import HTMLSession

    s = HTMLSession()
    r = s.get('https://httpbin.org')
    r.html.render()
    print(r.html.html)

Some important points:

- Searching the output (Ctrl+F) for the version string: the page shows 0.9.2 without JavaScript and 0.9.3 once the JavaScript has run. The output always shows 0.9.2.
- Searching the output for the keyword “cookie” finds 0 matches (even for just “cook”), because that text only appears after the JavaScript has been rendered.

It prints only the HTML as it was before the JavaScript executed. I’ve tried passing a bigger timeout to render: r.html.render(timeout=60)

But it still waits the default 8 seconds.

When I try: r.html.render(sleep=60)

It waits for those 60 seconds and then does nothing; worse, it reports that the connection has been lost.

I thought that maybe it didn’t render the JavaScript because the request had no headers, so I added Chrome’s (first the user-agent only, then all the headers shown in Chrome’s network tab when accessing httpbin.org), but still with no success.

I’ve also tried rendering the JavaScript with Pyppeteer directly, and that works (I don’t understand why, since requests-html uses Pyppeteer itself). The only downside is that I have to scrape lots of links, and I couldn’t find a way to run multiple instances of Pyppeteer.
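For the "lots of links" part, one common approach is to skip threads and run several Pyppeteer tabs concurrently on a single asyncio event loop. The following is a minimal sketch under that assumption, not something taken from the issue: `render_all`, `make_pyppeteer_renderer`, `main`, and the `max_concurrency` parameter are hypothetical names, and the `networkidle0` wait condition is just one possible choice.

```python
import asyncio


async def render_all(urls, render_one, max_concurrency=4):
    """Run render_one(url) for every URL, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await render_one(url)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(guarded(u) for u in urls))


def make_pyppeteer_renderer(browser):
    """Return a coroutine function that renders one URL in its own tab."""
    async def render_one(url):
        page = await browser.newPage()
        try:
            # Wait until the network has gone quiet so scripts have had time to run.
            await page.goto(url, {"waitUntil": "networkidle0"})
            return await page.content()
        finally:
            await page.close()
    return render_one


async def main(urls):
    import pyppeteer  # imported lazily; downloads Chromium on first use

    browser = await pyppeteer.launch(headless=True)
    try:
        return await render_all(urls, make_pyppeteer_renderer(browser))
    finally:
        await browser.close()


# Usage (Python 3.7+): pages = asyncio.run(main(["https://httpbin.org"]))
```

All tabs share one Chromium process here, which is usually lighter than launching one browser per link; the semaphore keeps the number of open tabs bounded.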

By the way, I’m using PyCharm on Windows 10 with Python 3.6.1 (3.6 throws an error regarding a ‘Deque’ thing that can’t be imported) / 3.7; maybe this info helps in solving the issue.

I’ve tried to be as detailed as possible with the problems I’m facing right now and I hope I can get the solutions I’m looking for.

Thanks in advance!

P.S. Chromium is downloaded and it shows in task manager when running the render() function (same happens when running the Pyppeteer code).

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 4
  • Comments: 12

Top GitHub Comments

5 reactions
ryankolter commented, Mar 12, 2021

My solution:

1. Find the browser() function in requests_html.py:

# $python\Lib\site-packages\requests_html.py
async def browser(self):
    if not hasattr(self, "_browser"):
        self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
    return self._browser

2. Replace the headless value in the pyppeteer.launch() call:

headless=False

3. Then, when render() runs, it opens a visible Chromium window and renders the page successfully.
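Editing a file inside site-packages is lost on the next reinstall, so the same change could also be applied at runtime. This is only a sketch, and it assumes requests-html looks up `pyppeteer.launch` on the module at call time, as the snippet quoted above suggests; `force_kwargs` is a hypothetical helper, not part of either library.

```python
import contextlib
import functools


@contextlib.contextmanager
def force_kwargs(module, func_name, **forced):
    """Temporarily wrap module.func_name so the `forced` keyword arguments always win."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        kwargs.update(forced)  # override whatever the caller passed
        return original(*args, **kwargs)

    setattr(module, func_name, wrapper)
    try:
        yield
    finally:
        setattr(module, func_name, original)  # always restore the real function


# Hypothetical usage (needs requests-html and a Chromium download):
#
#   import pyppeteer
#   from requests_html import HTMLSession
#
#   session = HTMLSession()
#   with force_kwargs(pyppeteer, "launch", headless=False):
#       r = session.get("https://httpbin.org")
#       r.html.render()  # the browser is launched on the first render()
#   print(r.html.html)
```

Because the session caches its browser after the first launch, the override only needs to be active around the first render() call.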

3 reactions
chipswithdrips commented, Nov 7, 2019

So I’m guessing that this project is abandoned.
