
[question] How to reuse an existing browser context?


Well into my program, Scrapy starts at a particular point. By then I’ve already launched a browser and logged into an application, and I would like to reuse that existing browser context to crawl through the application.

I have access to the following objects and could pass any of them to the scrapy-playwright configuration:

  • the browser instance: <Browser type=<BrowserType name=chromium executable_path=/.../chrome> version=104.0.5112.20>
  • its active context: <BrowserContext browser=<Browser type=<BrowserType name=chromium executable_path=/.../chrome> version=104.0.5112.20>>
  • and even the Page object: <Page url='https://demo.testfire.net/index.jsp'>

from platform import system

meta = {
    'dont_merge_cookies': True,
    'handle_httpstatus_list': [404, 302],
    'playwright': system().lower() in {'linux', 'darwin'},
    'playwright_context': None,  # ??? not sure what goes here
}

But after reading the #supported-settings and #browser-contexts sections in the README, I’m not sure exactly how to express this as a dictionary.
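For context: scrapy-playwright launches and manages its own browser, so, as far as I know, live Browser, BrowserContext or Page objects cannot be handed to it. Contexts are declared by name in the PLAYWRIGHT_CONTEXTS setting (each value is a dict of keyword arguments for Playwright's new_context()), and a request picks one through the string in meta['playwright_context']. The sketch below assumes the logged-in session was first saved to disk with Playwright's context.storage_state(path=...); the file name state.json is hypothetical.

```python
from platform import system

# settings.py (or Spider.custom_settings): contexts are declared by name.
# Each value holds keyword arguments for Playwright's browser.new_context().
PLAYWRIGHT_CONTEXTS = {
    "default": {
        # Hypothetical file, written earlier by the logged-in context via
        # context.storage_state(path="state.json")
        "storage_state": "state.json",
        "ignore_https_errors": True,
    },
}

# Request meta refers to a context by its name (a string), not by object.
meta = {
    "dont_merge_cookies": True,
    "handle_httpstatus_list": [404, 302],
    "playwright": system().lower() in {"linux", "darwin"},
    "playwright_context": "default",
}
```

The login is carried over through the saved cookies and local storage rather than by sharing the original browser process.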

Please help. Thanks!

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
elacuesta commented, Jul 27, 2022

Glad to hear, thanks for the update.

1 reaction
joe733 commented, Jul 27, 2022

Hi, yes. I resolved it yesterday evening. Here’s what I did:

  • Cleaned up the (persistent) browser cache, which was causing loading issues, before and after each program run.
  • Removed my manual Page-object management flags and let the scrapy-playwright library handle pages.
  • Used the “default” browser context and passed storage_state to it (I had a typo there earlier).
  • Set the HTTP 'Referer' header to response.url for request continuity.
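The first step can be sketched as a small context manager that wipes the cache directory on entry and on exit. This is a minimal illustration, not the poster's code: the helper name clean_browser_cache and the cache location are hypothetical, and it assumes the cache is a plain directory on disk that is safe to delete.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def clean_browser_cache(cache_dir):
    """Delete a persistent browser cache directory before and after a run."""
    path = Path(cache_dir)
    shutil.rmtree(path, ignore_errors=True)  # start the run with no stale cache
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)  # leave nothing behind afterwards


# Demo with a throwaway directory standing in for the real cache location.
demo_root = Path(tempfile.mkdtemp())
cache = demo_root / "browser-cache"  # hypothetical cache directory
cache.mkdir()
(cache / "stale.bin").write_bytes(b"old data")

with clean_browser_cache(cache) as path:
    existed_during_run = path.exists()  # wiped on entry
    path.mkdir()  # the browser would repopulate the cache during the run
    (path / "new.bin").write_bytes(b"fresh data")

existed_after_run = cache.exists()  # wiped again on exit
demo_root.rmdir()  # remove the now-empty demo directory
```

Using try/finally means the cache is also cleared when the crawl raises, which matches "before and after each program run".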

For anyone who might stumble upon this, here is the updated code:

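from platform import system

# Presumably inside LarvaeSpider.__init__(self, *args, **kw): meta for every request
self.rq_meta = {
    'dont_merge_cookies': True,
    'handle_httpstatus_list': [302, 403, 404],
    'playwright': system().lower() in {'linux', 'darwin'},
    # Used by scrapy-playwright when it creates the "default" context
    'playwright_context_kwargs': {
        'storage_state': kw.get('file_store'),  # saved login state from the earlier session
        'ignore_https_errors': True,
    },
}

# scrapy_ps is presumably a scrapy.crawler.CrawlerProcess; the other names
# (n_loc, s_url, header_, f_pth, brw_cxt_store) are defined elsewhere in the program
scrapy_ps.crawl(
    LarvaeSpider,
    allowed_domains=[n_loc],
    start_urls=[s_url],
    user_agent=header_['User-Agent'],
    url_dump_path=f'dump/{f_pth}.txt',
    file_store=brw_cxt_store,  # reaches the spider as kw['file_store']
)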
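One way the spider side might consume these pieces is sketched below, assuming LarvaeSpider receives file_store as a spider argument (Scrapy passes crawl() keyword arguments to the spider's constructor) and builds its request meta from it. The helper names build_request_meta and follow_kwargs are hypothetical; only the meta keys come from the code above, and the follow-up kwargs are meant to be unpacked into Scrapy's response.follow().

```python
from platform import system


def build_request_meta(file_store):
    """Hypothetical helper mirroring the rq_meta dict above."""
    return {
        "dont_merge_cookies": True,
        "handle_httpstatus_list": [302, 403, 404],
        "playwright": system().lower() in {"linux", "darwin"},
        # Only used when scrapy-playwright has to create the context,
        # i.e. on the first request that asks for the "default" context.
        "playwright_context_kwargs": {
            "storage_state": file_store,
            "ignore_https_errors": True,
        },
    }


def follow_kwargs(response_url, meta):
    """Keyword arguments for response.follow(): send the current page as Referer."""
    return {"headers": {"Referer": response_url}, "meta": dict(meta)}


meta = build_request_meta("state.json")  # hypothetical storage-state file
kwargs = follow_kwargs("https://demo.testfire.net/index.jsp", meta)
# Inside a spider callback: yield response.follow(next_url, callback=self.parse, **kwargs)
```

Copying the meta dict per request keeps Scrapy's per-request meta updates from leaking back into the shared template.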

Key takeaway: Diligently read the docs!

Feel free to (re)close this as complete. Thanks for the help!
