Large scale considerations
I would like to open this issue to list the points that are important to keep in mind when developing Quetz
with large-scale use in mind.
What I have in mind:
Language or dependencies
- what is the max load that FastAPI could handle
- choice of Python as a base language for backend operations (extracting tarballs, generation of JSON patches, etc.)
- context of providing views depending on the user's authorizations, partially handled by database requests
- multi-threading for CPU-bound ops
- etc.
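On the CPU-bound point: in CPython the GIL limits how much plain threads help, so CPU-heavy work is usually pushed to worker processes or to C code that releases the GIL. A minimal sketch, assuming a hypothetical `checksum_all` helper (not part of Quetz) that hashes package blobs; `hashlib` releases the GIL while hashing larger buffers, so a thread pool can give some real parallelism here:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(blob: bytes) -> str:
    """Return the hex SHA-256 digest of one blob."""
    return hashlib.sha256(blob).hexdigest()


def checksum_all(blobs, max_workers=4):
    """Hash every blob in a thread pool; results keep the input order.

    `checksum_all` and `max_workers=4` are illustrative assumptions,
    not names or defaults taken from Quetz.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sha256_hex, blobs))
```

For CPU work done in pure Python (for example building JSON patches), `ProcessPoolExecutor` offers the same interface and sidesteps the GIL, at the cost of pickling inputs and outputs.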
Database/storage
- even using PGSQL, projections of the data volume and ops/s we need to be able to handle
- do we expect the need to implement machinery to speed up requests (caching) in other databases? On the filesystem?
- impact of the filesystem, best choice for the read/write operations
- need for distributed filesystems?
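To make the caching question concrete, here is a minimal sketch of an in-process cache with time-based expiry. The `TTLCache` class and its `get_or_compute` method are hypothetical names for illustration only; a real deployment with several workers would more likely need a shared cache, which is exactly the open question above.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for `key`, calling `compute()` on a miss."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        # Miss or expired entry: recompute and store with a fresh deadline.
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value
```

The cache lives in one process, so each worker would keep its own copy and entries can go stale for up to `ttl` seconds after the underlying data changes.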
Others
- role-based vs attribute-based control?
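As a rough illustration of the two models, the sketch below contrasts a role lookup with an attribute comparison. Everything in it (the `User` class, the `ROLE_PERMISSIONS` table, the `org` and `private` attributes) is a made-up example and does not describe how Quetz handles authorization.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)


# Hypothetical role -> permissions table (RBAC).
ROLE_PERMISSIONS = {
    "owner": {"read", "upload", "delete"},
    "member": {"read"},
}


def rbac_allows(user, action):
    """RBAC: the decision depends only on the roles the user holds."""
    return any(action in ROLE_PERMISSIONS.get(r, set()) for r in user.roles)


def abac_allows(user, action, resource):
    """ABAC: the decision compares user and resource attributes.

    Illustrative rule: anyone may read a public resource; anything else
    requires the user to belong to the resource's organization.
    """
    if action == "read" and not resource.get("private", False):
        return True
    org = user.attributes.get("org")
    return org is not None and org == resource.get("org")
```

The role check maps naturally onto a simple join in the database, while attribute rules tend to need per-resource evaluation, which matters for the view-filtering concern raised earlier.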
This is just a draft to be updated with contributions (concerns, solutions, links to pr, etc.)!
Issue Analytics
- State:
- Created 3 years ago
- Comments:10 (2 by maintainers)
Top GitHub Comments
BTW, we have done some load testing using locust, and we can process around 100 rps (requests per second) on a standard laptop using a single quetz worker (for the download endpoint, which generates a redirect to an S3 file).
Thanks for bringing these up! A couple thoughts:
python/fastapi perf: yeah, sure, python isn’t rust or c++. fastapi is down around 250 in the benchmark game, so there are plenty of other things to choose. PyPy could potentially jump it up a hair, though I don’t think all the deps are there yet. but man, I’d sure like a conda package repo that spoke graphql! anyhow, the variant that does best also uses orjson, but who knows, maybe simdjson, or one of the others has even more to say. aside: hadn’t heard of apidaora (current leading python framework)… learn some new web junk every day!
distributed filesystem: perhaps not what @adriendelsalle had in mind, but ipfs is a very interesting beast, as it theoretically has no single point of failure. I’ve almost got it built for conda-forge, which is cute, but what’s more interesting is it can handle netflix-level volume/velocity. If a community (say conda-forge) can fiat a peer-of-last-resort (seems like 2tb of conda-forge would be ~$100/mo from a pinning service), cloudflare will foot the bill (for now) for CDN, and quetz would be none the wiser when replicating it… or some deeper integration would be possible. an ipfs-native client hardly seems infeasible at this point.
database: this is one of the places where the go-to fastapi/sqlalchemy orm strategy can be a bear. if specifically talking pg, it’s possible to use the binary protocol (even with orm) with asyncpg, which handles a number of issues on the database and app server by doing less work.