Architecture: Archived JS executes in a context shared with all other archived content (and the admin UI!)

Describe the bug

Hi there! There’s an XSS vulnerability when you open your index.html if you saved a page with a title containing an XSS vector.

Steps to reproduce

Save this page, for example: [Twitter of @garethheyes](https://twitter.com/garethheyes/status/1126526480614416395)
Open your index.html
Get XSS’d by sir @garethheyes

Source code:

<a href="archive/1557816881/twitter.com/garethheyes/status/1126526480614416395.html" title="\u2028\u2029 op Twitter: "Another way to use throw without a semi-colon:
<script>{onerror=alert}throw 1</script>"">
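
The markup above breaks because the page title is written into the title attribute without escaping, so the quote inside the title closes the attribute early. A minimal sketch of the usual fix, using Python's standard-library `html.escape`; the helper name `render_index_link` is hypothetical and is not ArchiveBox's actual template code:

```python
from html import escape


def render_index_link(path: str, title: str) -> str:
    """Build an <a> tag with both attribute values escaped (hypothetical helper)."""
    # quote=True also escapes " and ', so the value cannot close the attribute.
    return '<a href="{}" title="{}">'.format(
        escape(path, quote=True), escape(title, quote=True)
    )


# Title taken from the report above (shortened).
malicious_title = 'op Twitter: "<script>{onerror=alert}throw 1</script>"'
link = render_index_link(
    "archive/1557816881/twitter.com/garethheyes/status/1126526480614416395.html",
    malicious_title,
)
print(link)
```

Escaping at output time only fixes the index page; it does not address archived JS running on the same origin, which is what the comments below discuss.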

Software versions

OS: ArchLinux
ArchiveBox version: 903.59da482-1
Python version: python3.7
Chrome version: Chromium 74.0.3729.131 Arch Linux

Issue Analytics

Created 4 years ago
Reactions: 2
Comments: 8 (5 by maintainers)

Top GitHub Comments

7 reactions

FiloSottile commented, Jun 21, 2021

I talked about the ArchiveBox scenario with a couple of experts, and we came up with a better option than <iframe sandbox>: the Content-Security-Policy: sandbox header, which instructs the browser to treat the load as its own unique origin.

This is much more robust and convenient than detecting iframe loads.

We also went through the list of security headers to pick the ones that would protect ArchiveBox pages from Spectre, too. They should involve no maintenance.

Content-Security-Policy: sandbox
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Resource-Policy: same-origin [FOR HTML] / cross-origin [FOR NOT HTML]
Vary: Sec-Fetch-Site
X-Content-Type-Options: nosniff
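
As a rough illustration of the header list above, here is a framework-agnostic sketch that picks the headers per response. The function name `archive_response_headers` and the choice to key the HTML/non-HTML split on the response's content type are assumptions for the example, not something specified in this thread:

```python
def archive_response_headers(content_type: str) -> dict:
    """Return the suggested security headers for one archived response.

    Hypothetical helper: treats a response as HTML when its media type
    is text/html, ignoring parameters such as charset.
    """
    is_html = content_type.split(";")[0].strip().lower() == "text/html"
    return {
        "Content-Security-Policy": "sandbox",
        "Cross-Origin-Opener-Policy": "same-origin",
        "Cross-Origin-Embedder-Policy": "require-corp",
        # same-origin for HTML, cross-origin for everything else
        "Cross-Origin-Resource-Policy": "same-origin" if is_html else "cross-origin",
        "Vary": "Sec-Fetch-Site",
        "X-Content-Type-Options": "nosniff",
    }


html_headers = archive_response_headers("text/html; charset=utf-8")
image_headers = archive_response_headers("image/png")
```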

On top of that, it would still be a good idea to have the admin API on a different origin (a different subdomain is enough), and make its cookie SameSite=Strict.
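
To make the cookie suggestion concrete, this sketch builds such a Set-Cookie value with Python's standard library (the `samesite` attribute needs Python 3.8 or newer). The cookie name `sessionid` and the helper `admin_session_cookie` are placeholders chosen for the example:

```python
from http.cookies import SimpleCookie


def admin_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie header value for the admin session (illustrative)."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    # Strict: the browser omits the cookie on every cross-site request.
    cookie["sessionid"]["samesite"] = "Strict"
    cookie["sessionid"]["secure"] = True
    cookie["sessionid"]["httponly"] = True
    cookie["sessionid"]["path"] = "/"
    return cookie["sessionid"].OutputString()


header_value = admin_session_cookie("abc123")
print(header_value)
```

In a Django project the same effect is normally achieved through settings rather than by hand, for example `SESSION_COOKIE_SAMESITE = "Strict"`.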

This should stop any cross-contamination between archived pages, but it won’t stop them from detecting other archived pages. That might be possible, but it will require more complex server logic.

4 reactions

pirate commented, May 18, 2021

Idea (h/t @FiloSottile for the encouragement), similar to how Wikimedia and many other services do it:

serve all “dirty” archived content from one port, e.g. 9595, including static archive/<timestamp>/index.html indexes, archived content with live JS, and anything else that could be dangerous
serve the Django admin interface from another port, e.g. 9594; the login screen and the ability to add new snapshots, remove URLs, etc. should not be on the same origin as the risky archived content

These can be mapped to separate domains/ports by the user (subdomains may be dangerous; full domains are likely required), but this will require adding some new config options to set which port/domain the admin and the dirty content listen on, e.g. HTTP_DIRTY_LISTEN=https://demousercontent.archivebox.io HTTP_ADMIN_LISTEN=https://demo.archivebox.io
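
One way such options could be sanity-checked at startup is sketched below. The two URLs are the example values from this comment; the helper names `origin` and `check_separate_origins` are made up for illustration, and an origin is taken here to be the (scheme, host, port) triple:

```python
from urllib.parse import urlsplit


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def check_separate_origins(admin_url: str, dirty_url: str) -> None:
    """Raise if the admin UI and the archived content would share an origin."""
    if origin(admin_url) == origin(dirty_url):
        raise ValueError("admin UI and archived content must not share an origin")


check_separate_origins(
    admin_url="https://demo.archivebox.io",
    dirty_url="https://demousercontent.archivebox.io",
)
```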

This would close a pretty crucial security hole where archived content can mess with the execution of extractors (and potentially run arbitrary shell scripts if an attacker chains together a series of injection attacks).

Extractor methods that replay JS:

wget
singlefile? (TODO check whether it replays JS)

Proposed behavior:

if dirty content is loaded from within an iframe (with sandbox protections): allow JS, because iframe sandboxes protect us (verify this first)
if dirty content is loaded outside an iframe (e.g. if someone visits the URL directly): serve strict CSP/CORS headers to prevent JS execution entirely
prevent right-clicking the iframe to get the unsafe URL and opening it in a new tab directly? or detect server-side whether the dirty URL is visited outside an iframe and block it?
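
A sketch of how the server-side detection might look, assuming the browser sends the `Sec-Fetch-Dest` fetch-metadata request header (older browsers do not, so the sketch falls back to the strict policy). The function name `dirty_content_csp` and the two policy strings are illustrative choices, not a decided design:

```python
def dirty_content_csp(request_headers: dict) -> str:
    """Pick a Content-Security-Policy value for one archived-content request."""
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    dest = headers.get("sec-fetch-dest", "").lower()
    if dest == "iframe":
        # Loaded inside an iframe: allow scripts, but keep the sandbox.
        return "sandbox allow-scripts"
    # Direct visit, or a browser that sent no metadata: block all scripts.
    return "script-src 'none'"


framed = dirty_content_csp({"Sec-Fetch-Dest": "iframe"})
direct = dirty_content_csp({"Sec-Fetch-Dest": "document"})
unknown = dirty_content_csp({})
```

Failing closed when the header is missing means a client that omits it gets the script-free version rather than the permissive one.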

config option to enable bypassing sandboxing:

DANGER_ALLOW_BYPASSING_SANDBOX=True/False
once enabled, a checkbox appears on a per-snapshot basis that allows disabling the iframe/CSP sandbox protections when replaying that snapshot
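
The two-level opt-out described above reduces to a small predicate. In this sketch, `allow_bypass` stands for the global config option and `snapshot_opted_out` for the per-snapshot checkbox; both parameter names are hypothetical:

```python
def sandbox_enabled(allow_bypass: bool, snapshot_opted_out: bool) -> bool:
    """Return True when sandbox protections should apply to a replay.

    Protections are dropped only when the global bypass option is on
    AND the individual snapshot has opted out.
    """
    return not (allow_bypass and snapshot_opted_out)
```

With the global option off, a per-snapshot opt-out has no effect, so the protections stay on by default.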