Large scale considerations

I would like to open this issue to list the points that are important to keep in mind when developing Quetz for large-scale use.

What I have in mind:

Language or dependencies

  • what is the maximum load that FastAPI can handle?
  • choice of Python as the base language for backend operations (extracting tarballs, generating JSON patches, etc.)
    • serving views that depend on each user's authorizations, partially handled by database queries
    • multi-threading for CPU-bound operations
    • etc.
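
To make the CPU-bound concern above concrete, here is a minimal sketch (not Quetz's actual code) of fanning tarball reads out over an executor. It assumes packages are `.tar.bz2` archives containing `info/index.json`, as conda packages do; `read_index`, `read_all`, and `make_package` are hypothetical names introduced for illustration.

```python
import io
import json
import tarfile
from concurrent.futures import Executor, ThreadPoolExecutor


def read_index(package_bytes: bytes) -> dict:
    """Decompress one in-memory .tar.bz2 archive and parse info/index.json."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as tar:
        member = tar.extractfile("info/index.json")
        if member is None:
            raise ValueError("info/index.json is not a regular file")
        return json.load(member)


def read_all(packages: list[bytes], executor: Executor) -> list[dict]:
    """Run read_index over every archive on the given executor, keeping input order."""
    return list(executor.map(read_index, packages))


def make_package(name: str, version: str) -> bytes:
    """Build a tiny in-memory archive (demo data only, not a real package)."""
    payload = json.dumps({"name": name, "version": version}).encode()
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as tar:
        info = tarfile.TarInfo("info/index.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    packages = [make_package("demo", f"1.{i}") for i in range(4)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for index in read_all(packages, pool):
            print(index["name"], index["version"])
```

Because `read_all` only depends on the `Executor` interface, a `ProcessPoolExecutor` can be passed instead when the work is dominated by pure-Python code that holds the GIL; which one pays off in practice would need to be measured.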

Database/storage

  • even with PostgreSQL, projections of the data volume and operations per second we need to handle
  • do we expect to need machinery to speed up requests (caching) in other databases? On the filesystem?
  • impact of the filesystem, and the best choice for read/write operations
  • need for distributed filesystems?
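
As one possible shape for the caching question above, a minimal in-process sketch with a time-to-live is shown below. `TTLCache` and its `loader` callback are hypothetical names, not part of Quetz; a real deployment with several workers would more likely need a shared cache.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep loaded values in memory and reload them once they are older than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], bytes]) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: skip the database or filesystem
        value = loader(key)  # miss or expired: go to the slow backend
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    def slow_backend(key: str) -> bytes:
        # Stand-in for a database query or a file read.
        return f"payload for {key}".encode()

    cache = TTLCache(ttl_seconds=30)
    print(cache.get_or_load("channel/repodata.json", slow_backend))
    print(cache.get_or_load("channel/repodata.json", slow_backend))  # served from memory
```

The trade-off is staleness: a value can be up to `ttl_seconds` out of date, so the TTL has to be chosen against how quickly new uploads must become visible.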

Others

  • role-based vs attribute-based access control?
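
To illustrate the difference between the two models, here is a small sketch. The role names, permission table, and attribute rules are all made up for the example and do not reflect Quetz's actual authorization model.

```python
from dataclasses import dataclass, field

# Hypothetical role table: each role maps to a fixed set of allowed actions.
ROLE_PERMISSIONS = {
    "owner": {"upload", "delete", "download"},
    "member": {"download"},
}


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)


def rbac_allows(user: User, action: str) -> bool:
    """Role-based: the decision depends only on the roles the user holds."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


def abac_allows(user: User, action: str, resource: dict) -> bool:
    """Attribute-based: the decision compares user and resource attributes."""
    same_team = user.attributes.get("team") == resource.get("team")
    if action == "download":
        return same_team or not resource.get("private", False)
    if action == "upload":
        return same_team
    return False


if __name__ == "__main__":
    alice = User("alice", roles={"member"}, attributes={"team": "core"})
    print(rbac_allows(alice, "upload"))
    print(abac_allows(alice, "upload", {"team": "core"}))
```

Role checks are cheap and easy to express as a database filter; attribute rules are more flexible but have to be evaluated against each resource, which matters when views are filtered per user.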

This is just a draft to be updated with contributions (concerns, solutions, links to PRs, etc.)!

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

btel commented, Feb 11, 2021 (2 reactions)

BTW, we have done some load testing using Locust, and we can process around 100 rps (requests per second) on a standard laptop using a single Quetz worker (for the download endpoint, which generates a redirect to an S3 file).

[Image: locust_quetz]
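
As a rough back-of-the-envelope reading of the 100 rps figure, the sketch below converts a per-worker rate into a daily request count and a worker estimate. It assumes throughput scales linearly with the number of workers and that half of each worker's capacity is kept in reserve; both are assumptions for illustration, not measured properties of Quetz.

```python
import math


def requests_per_day(rps: float) -> int:
    """Sustained requests per second converted to requests per day."""
    return int(rps * 86_400)


def workers_needed(target_rps: float, rps_per_worker: float = 100.0,
                   headroom: float = 0.5) -> int:
    """Estimate the worker count, assuming linear scaling (an assumption).

    headroom is the fraction of each worker's capacity kept in reserve.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = rps_per_worker * (1.0 - headroom)
    return math.ceil(target_rps / usable_rps)


if __name__ == "__main__":
    print(requests_per_day(100))  # one worker running flat out for a day
    print(workers_needed(1_000))  # workers for 1000 rps at 50% headroom
```

Since the measured endpoint only issues a redirect, endpoints that touch the database or the filesystem would likely need their own measurements before applying this kind of estimate.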

bollwyvl commented, Feb 11, 2021 (2 reactions)

Thanks for bringing these up! A couple thoughts:

python/fastapi perf: yeah, sure, python isn’t rust or c++. fastapi is down around 250 in the benchmark game, so there are plenty of other things to choose. PyPy could potentially jump it up a hair, though I don’t think all the deps are there yet. but man, I’d sure like a conda package repo that spoke graphql! anyhow, the variant that does best also uses orjson, but who knows, maybe simdjson, or one of the others has even more to say. aside: hadn’t heard of apidaora (current leading python framework)… learn some new web junk every day!

distributed filesystem: perhaps not what @adriendelsalle had in mind, but ipfs is a very interesting beast, as it theoretically has no single point of failure. I’ve almost got it built for conda-forge, which is cute, but what’s more interesting is it can handle netflix-level volume/velocity. If a community (say conda-forge) can fiat a peer-of-last-resort (seems like 2tb of conda-forge would be ~$100/mo from a pinning service), cloudflare will foot the bill (for now) for CDN, and quetz would be none the wiser when replicating it… or some deeper integration would be possible. an ipfs-native client hardly seems infeasible at this point.

database: this is one of the places where the go-to fastapi/sqlalchemy orm strategy can be a bear. if specifically talking pg, it’s possible to use the binary protocol (even with orm) with asyncpg, which handles a number of issues on the database and app server by doing less work.
