Architecture: Archived JS executes in a context shared with all other archived content (and the admin UI!)

Describe the bug

Hi there! There’s an XSS vulnerability when you open your index.html after saving a page whose title contains an XSS vector.

Steps to reproduce

  1. Save this page, for example: https://twitter.com/garethheyes/status/1126526480614416395 (a tweet by @garethheyes)
  2. Open your index.html
  3. Get XSS’d by sir @garethheyes

Resulting index.html source:

<a href="archive/1557816881/twitter.com/garethheyes/status/1126526480614416395.html" title="\u2028\u2029 op Twitter: "Another way to use throw without a semi-colon:
<script>{onerror=alert}throw 1</script>"">
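The snippet shows the saved page title interpolated into the title attribute without escaping, so its double quotes can terminate the attribute early and let markup from the title reach the HTML parser. A minimal sketch of the usual fix, assuming the index rows are rendered from Python: escape every untrusted value with the standard library's html.escape before inserting it into HTML. The function name render_index_link is hypothetical and is not ArchiveBox's actual template code.

```python
import html


def render_index_link(href: str, title: str) -> str:
    """Build one <a> row for a static index, escaping untrusted values.

    html.escape(..., quote=True) turns &, <, >, " and ' into character
    references, so a hostile page title can neither close the attribute
    early nor inject a <script> element.
    Note: escaping does NOT make a javascript: URL in href safe; href is
    assumed here to be a server-generated relative path.
    """
    return '<a href="{}" title="{}">{}</a>'.format(
        html.escape(href, quote=True),
        html.escape(title, quote=True),
        html.escape(title, quote=True),
    )


hostile_title = 'op Twitter: "Another way to use throw without a semi-colon: <script>{onerror=alert}throw 1</script>"'
print(render_index_link(
    "archive/1557816881/twitter.com/garethheyes/status/1126526480614416395.html",
    hostile_title,
))
```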

Software versions

  • OS: ArchLinux
  • ArchiveBox version: 903.59da482-1
  • Python version: python3.7
  • Chrome version: Chromium 74.0.3729.131 Arch Linux

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

7 reactions
FiloSottile commented, Jun 21, 2021

I talked about the ArchiveBox scenario with a couple of experts, and we came up with a better option than <iframe sandbox>: Content-Security-Policy: sandbox, which instructs the browser to treat the load as its own unique origin.

This is much more robust and convenient than detecting iframe loads.

We also went through the list of security headers to pick the ones that would protect ArchiveBox pages from Spectre, too. They should require no maintenance.

Content-Security-Policy: sandbox
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Resource-Policy: same-origin [for HTML] / cross-origin [for non-HTML]
Vary: Sec-Fetch-Site
X-Content-Type-Options: nosniff
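A minimal sketch of attaching the header set above to every archived response, written as generic WSGI middleware in Python. Treating a response as HTML when its Content-Type is text/html is an assumption, as are the helper and class names; this is an illustration, not ArchiveBox's actual server code.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def security_headers(content_type: str) -> Headers:
    """Return the hardening headers for one archived response.

    Cross-Origin-Resource-Policy differs by type: HTML stays same-origin,
    while other assets (images, CSS, ...) are marked cross-origin.
    """
    is_html = content_type.split(";")[0].strip().lower() == "text/html"
    return [
        ("Content-Security-Policy", "sandbox"),
        ("Cross-Origin-Opener-Policy", "same-origin"),
        ("Cross-Origin-Embedder-Policy", "require-corp"),
        ("Cross-Origin-Resource-Policy", "same-origin" if is_html else "cross-origin"),
        ("Vary", "Sec-Fetch-Site"),
        ("X-Content-Type-Options", "nosniff"),
    ]


class SecurityHeadersMiddleware:
    """Hypothetical WSGI middleware that appends security_headers() to every response."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        def _start_response(status: str, headers: Headers, exc_info=None):
            # Find the response's Content-Type (header names are case-insensitive).
            content_type = next(
                (value for name, value in headers if name.lower() == "content-type"), ""
            )
            return start_response(status, headers + security_headers(content_type), exc_info)

        return self.app(environ, _start_response)
```

Wrapping the application once (app = SecurityHeadersMiddleware(app)) keeps the headers in one place, in line with the "no maintenance" goal.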

On top of that, it would still be a good idea to have the admin API on a different origin (a different subdomain is enough), and make its cookie SameSite=Strict.
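For the admin cookie, a sketch with the standard library's http.cookies shows what a SameSite=Strict session cookie looks like on the wire (the samesite attribute needs Python 3.8+). In a Django deployment, the equivalent knobs are the SESSION_COOKIE_SAMESITE and CSRF_COOKIE_SAMESITE settings. The cookie name, value, and helper name are placeholders.

```python
from http.cookies import SimpleCookie


def admin_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie header line for a hypothetical admin session."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    morsel = cookie["sessionid"]
    morsel["path"] = "/"
    morsel["httponly"] = True       # not readable from document.cookie
    morsel["secure"] = True         # only sent over HTTPS
    morsel["samesite"] = "Strict"   # not sent on cross-site requests
    return cookie.output(header="Set-Cookie:")


print(admin_session_cookie("abc123"))
```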

This should stop any cross-contamination between archived pages, but it won’t stop them from detecting other archived pages. Preventing that might be possible, but it would require more complex server logic.

4 reactions
pirate commented, May 18, 2021

An idea (h/t @FiloSottile for the encouragement), similar to how Wikimedia and many other services do it:

  • serve all potentially dangerous “dirty” archived content from one port, e.g. 9595: static archive/<timestamp>/index.html indexes, archived content with live JS, etc.
  • serve the Django admin interface from 9594: the login screen, the ability to add new snapshots, remove URLs, etc. should not be on the same origin as the risky archived content

The user can map these to separate domains/ports (subdomains may be dangerous; full domains are likely required), but this will require adding some new config options to set which port/domain the admin UI and the dirty content listen on, e.g. HTTP_DIRTY_LISTEN=https://demousercontent.archivebox.io HTTP_ADMIN_LISTEN=https://demo.archivebox.io
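The two proposed options could be sanity-checked at startup. A minimal sketch, assuming HTTP_DIRTY_LISTEN and HTTP_ADMIN_LISTEN hold full URLs as in the example: browsers isolate by origin, the (scheme, host, port) triple, so the check rejects configurations where both resolve to the same origin. It also warns when both share a site, because SameSite cookies treat subdomains of one registrable domain as the same site, and cookies are not isolated by port at all (RFC 6265). The naive_site helper is a rough approximation that ignores the Public Suffix List real browsers use; all helper names are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, int]:
    """Return the (scheme, host, port) triple browsers compare for same-origin checks."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, parts.hostname or "", parts.port or DEFAULT_PORTS.get(scheme, 0))


def naive_site(url: str) -> str:
    """Rough registrable domain: the last two host labels.

    Approximation only: browsers use the Public Suffix List, so this is
    wrong for hosts like example.co.uk or for IP addresses.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def check_listen_config(dirty: str, admin: str) -> List[str]:
    """Raise if dirty content and the admin UI share an origin; return warnings."""
    if origin(dirty) == origin(admin):
        raise ValueError("HTTP_DIRTY_LISTEN and HTTP_ADMIN_LISTEN must be different origins")
    warnings = []
    if naive_site(dirty) == naive_site(admin):
        # Same host on different ports, or sibling subdomains: cookies can leak across.
        warnings.append("same site: SameSite cookies alone will not separate these")
    return warnings


print(check_listen_config("https://demousercontent.archivebox.io", "https://demo.archivebox.io"))
```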

This would close a pretty crucial security hole where archived content can mess with the execution of extractors (and potentially run arbitrary shell scripts by chaining together a series of injection attacks).

Semi-related, on using sandboxed iframes for replay: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Sec-Fetch-Mode
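A minimal sketch of the sandboxed-iframe replay idea, assuming the replay page is generated server-side in Python. With allow-scripts but without allow-same-origin, the archived page's JavaScript runs in a unique opaque origin, so it cannot read the embedding page's cookies or DOM. The helper name replay_iframe and the example path are hypothetical.

```python
import html

# allow-scripts WITHOUT allow-same-origin: archived JS may run, but in a unique
# opaque origin. Adding allow-same-origin next to allow-scripts would let a
# same-origin framed page remove its own sandbox, so it is deliberately omitted.
REPLAY_SANDBOX = "allow-scripts"


def replay_iframe(snapshot_url: str) -> str:
    """Return an <iframe> tag that replays one archived snapshot in a sandbox."""
    return '<iframe src="{}" sandbox="{}"></iframe>'.format(
        html.escape(snapshot_url, quote=True), REPLAY_SANDBOX
    )


print(replay_iframe("archive/1557816881/index.html"))
```

The sandbox attribute only applies when the page is loaded through this iframe; opening the src URL directly in a new tab bypasses it, which is the gap the server-side headers need to cover.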

Extractor methods that replay JS:

  • wget
  • singlefile? (TODO check whether it replays JS)

Proposed behavior:

  • if dirty content is loaded from within an iframe (with sandbox protections): allow JS, because the iframe sandbox protects us (verify this first)
  • if dirty content is loaded outside an iframe (e.g. someone visits the URL directly): serve strict CSP/CORS headers to prevent JS execution entirely
  • prevent right-clicking the iframe to copy the unsafe URL and open it directly in a new tab? Or detect server-side whether a dirty URL is visited outside an iframe and block it?
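A minimal sketch of the server-side check in the last bullet, assuming the request headers are available as a dict. It keys off the Fetch Metadata header Sec-Fetch-Dest (iframe for iframe loads, document for a top-level visit). Browsers send Fetch Metadata headers only to secure origins (HTTPS or localhost), and older browsers omit them, so a missing header is treated as a direct visit. Relying on a CSP sandbox with allow-scripts for framed loads, rather than trusting the embedding page's sandbox attribute, is a choice borrowed from the Content-Security-Policy: sandbox suggestion in the other comment; the function name is hypothetical.

```python
from typing import Dict

# Framed loads: scripts allowed, but the CSP sandbox still forces an opaque origin.
FRAMED_CSP = "sandbox allow-scripts"
# Direct visits (or unknown): sandbox with no flags, plus an explicit script ban.
DIRECT_CSP = "sandbox; script-src 'none'"


def replay_csp(request_headers: Dict[str, str]) -> str:
    """Pick a Content-Security-Policy value for a hypothetical dirty-content response."""
    # Header names are case-insensitive, so normalise before looking them up.
    headers = {name.lower(): value for name, value in request_headers.items()}
    dest = headers.get("sec-fetch-dest", "").strip().lower()
    if dest in ("iframe", "frame"):
        return FRAMED_CSP
    # "document" (top-level visit), anything else, or a missing header: no JS.
    return DIRECT_CSP
```

Because the response now depends on a request header, it should also carry Vary: Sec-Fetch-Dest so caches do not serve one variant in place of the other.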

config option to enable bypassing sandboxing:

  • DANGER_ALLOW_BYPASSING_SANDBOX=True/False
  • once enabled, a per-snapshot checkbox appears that allows disabling the iframe/CSP sandbox protections when replaying that snapshot
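A minimal sketch of the two-level opt-out, assuming the global flag is read from an environment variable and each snapshot stores a boolean set by the proposed checkbox. The snapshot_bypass parameter and both helper names are hypothetical. The per-snapshot checkbox only takes effect when the global DANGER flag is on, and sandboxing stays enabled by default.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


def danger_bypass_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the proposed DANGER_ALLOW_BYPASSING_SANDBOX flag (default: off)."""
    env = os.environ if env is None else env
    return env.get("DANGER_ALLOW_BYPASSING_SANDBOX", "False").strip().lower() in TRUTHY


def sandbox_enabled(snapshot_bypass: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Keep sandboxing on unless BOTH the global flag and the snapshot's checkbox allow bypassing."""
    return not (danger_bypass_enabled(env) and snapshot_bypass)
```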