Large scale considerations

I would like to open this issue to list the points that are important to keep in mind when developing Quetz for large-scale use.

What I have in mind:

Language or dependencies

- what is the maximum load that FastAPI can handle?
- choice of Python as the base language for backend operations (extracting tarballs, generating JSON patches, etc.)
  - context of providing views depending on the users' authorizations, partially handled by database requests
  - multi-threading for CPU-bound ops
  - etc.
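
To make the CPU-bound concern concrete: in standard CPython builds, pure-Python CPU-bound code running in several threads is serialized by the GIL, so a process pool is a common alternative. The sketch below is only an illustration. `checksum` is a hypothetical stand-in for backend work such as processing a package, and nothing here is taken from the Quetz code base. The function accepts any `concurrent.futures` executor, so the same call can be tried with threads or processes.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def checksum(payload: bytes) -> int:
    """Hypothetical CPU-bound task: a pure-Python rolling checksum."""
    total = 0
    for byte in payload:
        total = (total * 31 + byte) % 1_000_003
    return total


def checksum_all(payloads: list[bytes], executor: Executor) -> list[int]:
    """Run `checksum` over every payload on the given executor, keeping input order."""
    return list(executor.map(checksum, payloads))


def demo() -> None:
    # Threads are used here only to keep the demo simple; for CPU-bound work,
    # passing a ProcessPoolExecutor instead is what sidesteps the GIL.
    payloads = [bytes([i]) * 100_000 for i in range(4)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(checksum_all(payloads, pool))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` requires the task function to be importable (picklable), which is why `checksum` is defined at module level.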

Database/storage

- even with PostgreSQL, we need projections of the data volume and ops/s to be handled
- do we expect to need machinery to speed up requests (caching) in other databases? On the filesystem?
- impact of the filesystem, and the best choice for read/write operations
- need for distributed filesystems?
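
As one possible shape for the caching question above, here is a minimal in-process cache with a time-to-live. It is a sketch under assumptions: the class name `TTLCache` and its methods are hypothetical, it is not part of Quetz, and a real deployment with several workers would more likely need a shared cache than a per-process dictionary.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Hypothetical per-process cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], V]) -> V:
        """Return the cached value for `key`, calling `compute` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop `key`, for example after an upload changes the underlying data."""
        self._entries.pop(key, None)
```

The explicit `invalidate` call matters as much as the expiry: cached views have to be dropped when the data they were built from changes.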

Others

- role-based vs attribute-based access control?
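
To illustrate the difference between the two models, the sketch below contrasts a role-only check with a check that also looks at attributes of the user and the resource. All names here (`ROLE_PERMISSIONS`, `User`, `Channel`, the role names, the `team` attribute) are hypothetical and chosen for illustration; this is not a description of how Quetz handles authorization.

```python
from dataclasses import dataclass

# Hypothetical role table: role name -> actions that role may perform.
ROLE_PERMISSIONS = {
    "owner": {"read", "upload", "delete"},
    "maintainer": {"read", "upload"},
    "member": {"read"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    team: str


@dataclass(frozen=True)
class Channel:
    name: str
    team: str
    private: bool


def rbac_allows(user: User, action: str) -> bool:
    """Role-based: the decision depends only on the user's role."""
    return action in ROLE_PERMISSIONS.get(user.role, set())


def abac_allows(user: User, action: str, channel: Channel) -> bool:
    """Attribute-based: the decision also depends on attributes of the resource."""
    if action == "read" and not channel.private:
        return True  # public channels are readable by anyone
    same_team = user.team == channel.team
    return same_team and action in ROLE_PERMISSIONS.get(user.role, set())
```

The role-based check is a single lookup and is easy to express as a database filter, while the attribute-based check can encode finer rules at the cost of evaluating more state per request.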

This is just a draft, to be updated with contributions (concerns, solutions, links to PRs, etc.)!

Issue Analytics

Created 3 years ago
Comments: 10 (2 by maintainers)

Top GitHub Comments

btel commented, Feb 11, 2021 (2 reactions)

BTW we have done some load testing using Locust, and we can process around 100 rps (requests per second) on a standard laptop using a single Quetz worker (for the download endpoint, which generates a redirect to the S3 file).
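
For a rough sense of what that figure means, here is a back-of-the-envelope sketch. It assumes that throughput scales linearly with the number of workers and that workers should run at only a fraction of their measured capacity; both are simplifying assumptions, not measurements, and the helper names and the 50% headroom default are hypothetical.

```python
import math


def requests_per_day(rps: float) -> int:
    """Requests served in 24 hours at a sustained rate of `rps`."""
    return round(rps * 86_400)


def workers_needed(
    target_rps: float, rps_per_worker: float = 100.0, headroom: float = 0.5
) -> int:
    """Workers needed if each one runs at `headroom` of its measured capacity.

    Assumes linear scaling across workers, which is optimistic once the
    database or storage becomes the bottleneck.
    """
    if target_rps < 0 or rps_per_worker <= 0 or not 0 < headroom <= 1:
        raise ValueError("invalid load parameters")
    return max(1, math.ceil(target_rps / (rps_per_worker * headroom)))


print(requests_per_day(100))  # 8640000
print(workers_needed(1000))   # 20
```

At the reported rate, a single worker would serve about 8.6 million redirects per day if the load were sustained, which is why the shared database and storage are the more likely limits to examine.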

bollwyvl commented, Feb 11, 2021 (2 reactions)

Thanks for bringing these up! A couple thoughts:

python/fastapi perf: yeah, sure, python isn’t rust or c++. fastapi is down around 250 in the benchmark game, so there are plenty of other things to choose. PyPy could potentially jump it up a hair, though I don’t think all the deps are there yet. but man, I’d sure like a conda package repo that spoke graphql! anyhow, the variant that does best also uses orjson, but who knows, maybe simdjson, or one of the others has even more to say. aside: hadn’t heard of apidaora (current leading python framework)… learn some new web junk every day!

distributed filesystem: perhaps not what @adriendelsalle had in mind, but ipfs is a very interesting beast, as it theoretically has no single point of failure. I’ve almost got it built for conda-forge, which is cute, but what’s more interesting is it can handle netflix-level volume/velocity. If a community (say conda-forge) can fiat a peer-of-last-resort (seems like 2tb of conda-forge would be ~$100/mo from a pinning service), cloudflare will foot the bill (for now) for CDN, and quetz would be none the wiser when replicating it… or some deeper integration would be possible. an ipfs-native client hardly seems infeasible at this point.

database: this is one of the places where the go-to fastapi/sqlalchemy orm strategy can be a bear. if specifically talking pg, it’s possible to use the binary protocol (even with orm) with asyncpg, which handles a number of issues on the database and app server by doing less work.
