🎓 Graduate to the default version of HTTP Archive
Roadmap
The goal is for beta.httparchive.org to become the new httparchive.org, targeting some time in Q1 2018. The following tasks are launch blocking:
- automate the report generation after new tables are added to BigQuery
- resolve all placeholder graphics (eg report images)
- write descriptions for all reports/metrics
- rewrite FAQ section
- fix UX issues with data visualization
- implement a redirect solution for legacy URLs
- redirect to HTTPS
- reorganize GitHub repositories
Report automation
The reports are manually generated with scripts. These scripts should be automated to run whenever a crawl is complete. Some subtasks for this issue:
- Queries may depend on convenience datasets (eg httparchive.lighthouse) which are manually copied from catch-all datasets (eg httparchive.har). New tables in these datasets must all be created automatically.
- Use pubsub or similar tools to trigger report jobs after the dataflow job has completed successfully.
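As a rough sketch of the trigger step, the snippet below decodes a Pub/Sub push envelope (whose `message.data` field is base64-encoded) and plans the follow-up jobs. The event fields (`status`, `dataset`, `table`), the `CONVENIENCE_DATASETS` mapping, and the table name in the usage note are assumptions for illustration, not the project's actual schema.

```python
import base64
import json

# Hypothetical mapping: catch-all dataset -> convenience datasets copied from it.
CONVENIENCE_DATASETS = {
    "har": ["lighthouse"],
}


def parse_push_message(envelope):
    """Decode the JSON payload carried in a Pub/Sub push envelope."""
    data = envelope["message"]["data"]
    return json.loads(base64.b64decode(data).decode("utf-8"))


def plan_jobs(event):
    """Return the ordered jobs to run for a crawl event.

    Assumed event shape: {"status": ..., "dataset": ..., "table": ...}.
    Nothing is scheduled unless the dataflow job reported success.
    """
    if event.get("status") != "done":
        return []
    jobs = []
    # Create the convenience tables first, since report queries may depend on them.
    for target in CONVENIENCE_DATASETS.get(event["dataset"], []):
        jobs.append(("copy", event["dataset"], target, event["table"]))
    jobs.append(("report", event["table"]))
    return jobs
```

Under these assumptions, a successful event for a hypothetical table `2018_01_01_mobile` in `har` would plan a copy into `lighthouse` followed by a report job, while a failed dataflow job plans nothing.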
Graphics
Concept graphics for the JS report:

We would need similar graphics for each report. It would also be nice to have a default graphic for new reports until a permanent one could be made.
Written descriptions
For example, “Total KB” should have a description like “The sum of transfer size kilobytes of all resources requested by the page.”
Reports should describe their contents and maybe even include a brief analysis of the overall trends.
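To make the “Total KB” description concrete, here is a minimal sketch of the calculation. The `transfer_size` field name and the 1 KB = 1024 bytes convention are assumptions; the real queries may use different names and units.

```python
def total_kb(requests):
    """Sum the transfer size of every request on a page, in kilobytes.

    Each request is assumed to be a dict with a `transfer_size` in bytes.
    """
    total_bytes = sum(r["transfer_size"] for r in requests)
    return total_bytes / 1024
```

For a page with two requests of 2048 and 1024 bytes, this returns 3.0.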
Rewrite FAQs
The legacy FAQs are somewhat outdated. The new FAQ page should contain updated information including any new content related to the new reports/metrics/visualizations.
Data viz UX
Some feedback on the charts includes:
- what the heck is a CDF/PDF?
- make the tooltips more descriptive
- unclear what the outlier bin is in the histograms
- not obvious how to switch between timeseries/histogram modes or that separate modes exist
- collapse desktop and mobile tables into one with both histograms side by side
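On the CDF/PDF question, a short worked example may help when writing the chart descriptions. Assuming the histograms use the usual definitions, the PDF is the share of pages that fall in each bin and the CDF is the running share of pages at or below each bin. The function name and input shape here are illustrative only.

```python
def pdf_and_cdf(bin_counts):
    """Convert per-bin page counts into PDF and CDF fractions."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("at least one bin must be non-empty")
    pdf = [count / total for count in bin_counts]
    cdf = []
    running = 0
    for count in bin_counts:
        running += count
        cdf.append(running / total)
    return pdf, cdf
```

With bin counts `[1, 1, 2]`, the PDF is `[0.25, 0.25, 0.5]` and the CDF is `[0.25, 0.5, 1.0]`, so the last CDF value is always 1.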
Legacy redirects
Not all legacy features will be supported by the beta site at launch. Since the beta site will assume the root domain, it will start receiving requests from legacy URLs. At launch, the legacy site will still be accessible at http://legacy.httparchive.org. Known legacy URLs should be redirected to this subdomain unless the feature is also available on the beta site, in which case the URL should be mapped. Whether to use a permanent or temporary (301 vs 302) redirect depends on whether the feature is expected to be supported by the beta site.
For example, one simple case is the About page, which is http://httparchive.org/about.php. This has a corresponding page on the beta site at https://beta.httparchive.org/about. A more complicated example is http://httparchive.org/viewsite.php?pageid=84263714 which may be supported in the future.
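One possible reading of these rules, sketched as a lookup function. The two tables contain only the examples from this issue. The choice of 301 for mapped pages, 302 for features that may be supported later, and 301 for everything else is an interpretation, as is the assumption that mapped pages end up on https://httparchive.org after launch.

```python
from urllib.parse import urlsplit

LEGACY_HOST = "legacy.httparchive.org"

# Legacy paths with an equivalent page on the new site (from the About example).
MAPPED_PATHS = {"/about.php": "/about"}

# Legacy paths that may be supported by the new site in the future.
PLANNED_PATHS = {"/viewsite.php"}


def legacy_redirect(url, root="https://httparchive.org"):
    """Return (status_code, target_url) for a request to a legacy URL."""
    parts = urlsplit(url)
    if parts.path in MAPPED_PATHS:
        # Feature already exists on the new site: permanent redirect to it.
        return 301, root + MAPPED_PATHS[parts.path]
    target = "http://" + LEGACY_HOST + parts.path
    if parts.query:
        target += "?" + parts.query
    if parts.path in PLANNED_PATHS:
        # May move to the new site later, so keep the redirect temporary.
        return 302, target
    # Not expected to be supported: permanent redirect to the legacy subdomain.
    return 301, target
```

Under this reading, http://httparchive.org/about.php maps permanently to the new About page, while the viewsite.php example is temporarily redirected to the legacy subdomain with its query string preserved.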
HTTPS Redirects
Any HTTP request should automatically redirect to HTTPS. This is a simple feature but had a subtle infinite redirect bug when I last attempted a fix in the Flask layer.
Related: ensure the Let’s Encrypt certificate automatically renews. Same for the https://cdn.httparchive.org certificate.
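The cause of the earlier redirect loop is not recorded here. One common cause, offered only as a guess, is TLS being terminated at a proxy or load balancer: the app then sees every request as plain HTTP and keeps redirecting. The sketch below is framework-independent and checks the `X-Forwarded-Proto` header before deciding; the function name and arguments are hypothetical, and a Flask hook would pass in the request's URL, scheme, and headers.

```python
from urllib.parse import urlsplit


def https_redirect_target(url, scheme, headers):
    """Return the HTTPS URL to redirect to, or None if no redirect is needed.

    `headers` is a plain dict here, so the lookup is case-sensitive;
    Flask's own header object is case-insensitive.
    """
    # A proxy that terminates TLS reports the original scheme in this header.
    forwarded = headers.get("X-Forwarded-Proto", "").split(",")[0].strip().lower()
    effective_scheme = forwarded or scheme
    if effective_scheme == "https":
        return None  # Already secure from the client's point of view.
    return urlsplit(url)._replace(scheme="https").geturl()
```

The key design choice is trusting the forwarded scheme over the scheme the app observes, which is only safe when the app is reachable solely through the proxy.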
Reorg GitHub
This code base should become the canonical “httparchive” project on GitHub. The legacy code base should be renamed to “legacy.httparchive.org” or similar. Consider the careful dance of moving code around in such a way that the primary project maintains the same stars/watchers. However, this may screw up the commit history.
It’s happening!
The roadmap is 100% complete and we’ll begin the graduation ceremony tomorrow. See the doc for more info about the rollout.
Renamed GitHub repositories.
Old name: beta.httparchive.org New name: httparchive.org
Old name: httparchive New name: legacy.httparchive.org
I was unable to rename beta to httparchive because GitHub keeps it around to 301 redirect old URLs. Stars/forks/watchers are unable to be carried over to the new repository, but this is ok and we can more easily track the actual growth/usage/popularity of the new repo.