
[Bug]: Unreliable page closing in the presence of a prompt


Playwright version

1.17.0

Operating system

Linux

What browsers are you seeing the problem on?

Chromium

Other information

OS: Pop!_OS 21.04, Python: 3.9.7

What happened? / Describe the bug

Hi,

I am trying to use the Chromium browser with each tab controlled asynchronously. I simplified the code as much as possible; my actual project is a fully async crawler that is fed URLs from a PostgreSQL queue using the new SQLAlchemy async engines.

I noticed that when I land on a page that shows a prompt (user/password), the page sometimes does not get closed.

Code snippet to reproduce your bug

# # Description
#
# This shows a potential problem in the Playwright library: tabs remain open forever even though they are closed.
# This seems to happen somewhat randomly, and only on pages that show a prompt.
import asyncio
import traceback
from playwright.async_api import async_playwright

# # Config

MAXIMUM_CONCURRENCY = 7 # urls in parallel
TIMEOUT_URL = 10 # seconds
NB_REPEATS = 5 # we will call the same URLs multiple times as errors are a bit random
HEADLESS = False

# # Data

urls = ['https://en-master.skoda-auto.com/models/scala',
        'https://cs-cz-260-22222.cpv3prod.skoda-auto.com/prodej-novych-vozu/modely',
        'https://en-master-v2.skoda-auto.com/_doc/cfb6fbbd-3a21-4cd6-a52d-e81cbb7f5fa9',
        'https://en-master-v2.skoda-auto.com/_doc/eea2f7ce-ee89-42cf-959c-21fe73a133ac',
        'https://en-master.skoda-auto.com/company/myskoda-iv-app',
        'https://en-master.skoda-auto.com/models/fabia/fabia-combi-scoutline/fabia-combi-scoutline']*NB_REPEATS

# # Helpers and workers

semaphor = asyncio.Semaphore(MAXIMUM_CONCURRENCY) # limit concurrency

async def get_browser_context():
    p = await async_playwright().start()
    playwright_browser = await p.chromium.launch(headless=HEADLESS)
    return await playwright_browser.new_context()

async def feeder(queue):
    """
    Fetches URLs from some source and puts them into a queue for our other workers
    """
    for url in urls:
        queue.put_nowait(url) # put an item into the queue without blocking
        await asyncio.sleep(0.05) # don't spam the queue

async def downloader(queue, context):
    """
    Fetches the content of a webpage
    """
    while True:
        # no more jobs? try again later
        if queue.empty():
            await asyncio.sleep(0.05)
            continue

        # take one slot and one URL before entering try, so finally only
        # releases/acknowledges what was actually acquired
        await semaphor.acquire()
        url = await queue.get()
        # set page to a default in case we cannot create one for some reason
        page = None
        try:
            page = await context.new_page()
            # use timeout in playwright and then another longer timeout for the coroutine
            # in case something gets stuck so we are sure nothing keeps running forever
            # and pages are always closed
            coro = page.goto(url, timeout=TIMEOUT_URL*1000)
            response = await asyncio.wait_for(coro, timeout=20)
            # goto() returns None for about:blank or same-URL hash navigations
            if response is not None:
                content = await response.body()
        except Exception:
            traceback.print_exc()
        finally:
            if page is not None:
                await page.close()
            semaphor.release()
            queue.task_done()

# # Main

async def main():
    queue = asyncio.PriorityQueue()
    context = await get_browser_context()
    await asyncio.gather(feeder(queue),
                         asyncio.gather(*[downloader(queue=queue, context=context) for i in range(MAXIMUM_CONCURRENCY)]),
                         return_exceptions=False)

# # Run

if __name__ == '__main__':
    asyncio.run(main(), debug=True)
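The reproducer deliberately keeps its downloaders in a `while True` loop that polls `queue.empty()`, so `main()` never returns; that is convenient for watching leftover tabs. For a crawler that should finish, a common standard-library pattern is to block on `await queue.get()`, wait for `queue.join()`, then cancel the workers. Because the reproducer starts exactly `MAXIMUM_CONCURRENCY` downloaders, the worker count already caps concurrency, so this sketch drops the semaphore. `fetch` is a hypothetical placeholder for the Playwright calls (`new_page`, `goto`, `close`), and the sketch does nothing about the Chromium bug itself.

```python
import asyncio


async def fetch(url: str) -> str:
    # Hypothetical placeholder for the Playwright work (new_page, goto, close).
    await asyncio.sleep(0.01)
    return url.upper()


async def worker(queue: asyncio.Queue, results: list) -> None:
    while True:
        url = await queue.get()  # suspends until an item is available, no polling
        try:
            results.append(await fetch(url))
        except Exception as exc:  # keep the worker alive on per-URL failures
            results.append(exc)
        finally:
            queue.task_done()


async def crawl(urls, concurrency: int = 7) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: list = []
    # The number of workers is the concurrency limit, so no semaphore is needed.
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # returns once task_done() was called for every item
    for task in workers:
        task.cancel()  # workers are now idle in queue.get(); stop them
    await asyncio.gather(*workers, return_exceptions=True)  # wait for cancellation
    return results


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl(["a", "b", "c"], concurrency=2))))  # ['A', 'B', 'C']
```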

Relevant log output

No response

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

1 reaction
mxschmitt commented, Jan 26, 2022

This should be fixed in master by https://github.com/microsoft/playwright/pull/11614.

1 reaction
mxschmitt commented, Nov 25, 2021

We definitely won’t close it. We mostly prioritize issues by demand, which means we count upvotes. Also, in this particular case it’s an upstream Chromium bug, which usually takes us more time to address than a Playwright bug.
