Community discussion
Although I’ve not yet done much in the way of documentation or branding this package up, it’s quickly narrowing in on a slice of functionality that’s lacking in the Python HTTP landscape.
Specifically:
- A requests-compatible API.
- With HTTP/2 and HTTP/1.1 support (and an eye on HTTP/3)
- A standard sync client by default, but the option of an async client, if you need it.
- An API for making requests in parallel. The standard case has a regular sync API on the front, but using async under the hood. Also an equivalent for the fully async case. (Both draw on Trio a little for lessons on how to manage branching, exceptions & cancellations.)
- Ability to plug directly into WSGI or ASGI apps, instead of network dispatch. (For usage as a test client, or implementing stub services during test or CI)
- Ability to plug into alternate concurrency backends.
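As a rough illustration of the "sync API on the front, async under the hood" idea for parallel requests, here is a minimal sketch using only the standard library. The names `fake_fetch` and `fetch_all` are hypothetical and not part of this package; `fake_fetch` stands in for a real request and does no network I/O.

```python
import asyncio
from typing import Dict, List


async def fake_fetch(url: str) -> Dict[str, object]:
    # Hypothetical stand-in for a real async HTTP request; no network I/O.
    await asyncio.sleep(0.01)
    return {"url": url, "status": 200}


async def _fetch_all(urls: List[str]) -> List[Dict[str, object]]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def fetch_all(urls: List[str]) -> List[Dict[str, object]]:
    # Regular sync entry point: the event loop is an implementation detail.
    return asyncio.run(_fetch_all(urls))


results = fetch_all(["https://example.org/a", "https://example.org/b"])
```

One limitation of this particular approach is that `asyncio.run()` cannot be called from code that is already running inside an event loop, which is part of why a separate, fully async equivalent is useful.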
I’ve also some thoughts on allowing the Request and Response models to provide werkzeug-like interfaces, allowing them to be used either client-side or server-side. One of the killer-apps of the new async+HTTP/2 functionality is allowing high-throughput proxy and gateway services to be easily built in Python. Having a “requests”-like package that can also use the models on the server side is something I may want to explore once all the other functionality is sufficiently nailed down.
Since “requests” is an essential & necessary part of the Python ecosystem, and since this package is aiming to be the next steps on from that, I think it’s worth opening up a bit of community discussion here, even if it’s early days.
I’d originally started out expecting httpcore to be a silent-partner dependency of any requests3 package, but it progressed fairly quickly from there into “actually I’ve got a good handle on this, I think I need to implement this all the way through”. My biggest questions now are around what’s going to be the most valuable ways to deliver this work to the community.
Ownership, Funding & Maintenance
Given how critical a requests-like HTTP client is to the Python ecosystem as a whole I’d be amenable to community discussions around ownership & funding options.
I guess that I need to start out by documenting & pitching this package in its own right, releasing it under the same banner and model as all the other Encode work, and then take things from there if and when it starts to gain any adoption.
I’m open to ideas from the urllib3 or requests teams, if there’s alternatives that need to be explored early on.
Requests
The functionality that this package is homing in on meets the requirements for the proposed “Requests III”. Perhaps there’s something to be explored there, if the requests team is interested, and if we can find a good community-focused arrangement around funding & ownership.
urllib3
The urllib3 team obvs. have a vast stack of real-world usage expertise that’d be important for us to make use of. There’s bits of work that urllib3 does, that httpcore likely needs to do, including fuzziness around how content decoding actually ends up looking on the real, messy web. Or, for example, pulling early responses before necessarily having fully sent the outgoing request.
Something else that could well be valuable would be implementing a urllib3 dispatch class alongside the existing h11/h2/async dispatch. Any urllib3 dispatch class would still be built on top of the underlying async structure, but would dispatch the urllib3 calls within a threadpool.
Doing so would allow a couple of useful things, such as being able to isolate behavioral differences between the two implementations, or perhaps allowing a more gradual switchover for critical services that need to take a cautious approach to upgrading to a new HTTP client implementation.
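To make the threadpool idea concrete, here is a minimal sketch of an async dispatcher that hands a blocking call to a thread pool. `ThreadedDispatcher` and `blocking_send` are hypothetical names used for illustration; `blocking_send` stands in for a real urllib3 call and is not httpcore's or urllib3's actual interface.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_send(method: str, url: str) -> dict:
    # Hypothetical stand-in for a blocking urllib3 request.
    return {"method": method, "url": url, "status": 200}


class ThreadedDispatcher:
    """Async-facing dispatcher that runs blocking calls in worker threads."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    async def send(self, method: str, url: str) -> dict:
        # run_in_executor() keeps the event loop free while the thread blocks.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, blocking_send, method, url)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


async def main() -> dict:
    dispatcher = ThreadedDispatcher()
    try:
        return await dispatcher.send("GET", "https://example.org/")
    finally:
        dispatcher.close()


response = asyncio.run(main())
```

Because the rest of the client only sees an awaitable `send()`, swapping between this kind of dispatcher and a native async one would be one way to compare the two implementations side by side.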
Trio, Curio
I think httpcore as currently delivered makes it fairly easy to deliver a trio-based concurrency backend. It’s unclear to me if supporting that in the package itself is a good balance, or if it would be more maintainable to ensure that the trio team have the interfaces they need, but that any implementation there would live within their ecosystem.
(I’d probably tend towards the latter case there.)
Twisted
I guess that an HTTP/2 client would probably be useful to the Twisted team. I don’t really know enough about Twisted’s style of concurrency API to take a call on if there’s work here that could end up being valuable to them.
HTTP/3
It’ll be worth us keeping an eye on https://github.com/aiortc/aioquic
Having a QUIC implementation isn’t the only thing that we’d need in order to add HTTP/3 support, but it is a really big first step.
We currently have connect/reader/writer interfaces. If we added QUIC support then we’d want our protocol interfaces to additionally support operations like “give me a new stream”, “set the flow control”, and “set the priority level”.
For standard TCP-based HTTP/2 connections, “give me a new stream” would always just return the existing reader/writer pair. For QUIC connections it’d return a new reader/writer pair for a protocol-level stream.
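A minimal sketch of that “give me a new stream” distinction follows. All of the names here (`StreamPair`, `TCPConnection`, `QUICConnection`, `new_stream`) are hypothetical and only illustrate the shape of the interface; the stream ids are plain counters rather than real QUIC stream ids.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamPair:
    # Hypothetical handle standing in for a reader/writer pair.
    stream_id: int


class TCPConnection:
    """A TCP connection has one byte stream, so the same pair is reused."""

    def __init__(self) -> None:
        self._pair = StreamPair(stream_id=0)

    def new_stream(self) -> StreamPair:
        return self._pair


class QUICConnection:
    """A QUIC connection can open a fresh transport-level stream each time."""

    def __init__(self) -> None:
        self._ids = itertools.count()  # illustrative ids only

    def new_stream(self) -> StreamPair:
        return StreamPair(stream_id=next(self._ids))


tcp = TCPConnection()
quic = QUICConnection()
tcp_same = tcp.new_stream() is tcp.new_stream()
quic_distinct = quic.new_stream() != quic.new_stream()
```

With an interface of this shape, the HTTP/2 code path would keep multiplexing at the HTTP level over the single pair, while an HTTP/3 path could lean on the transport for stream separation.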
This is getting way ahead of ourselves, but I think we’ve probably got a good basis here to be able to later support HTTP/3.
One big blocker would probably be whatever HTTP-level changes are required between HTTP/2 and HTTP/3. The differences between QPACK and HPACK are one case here, but there are likely also differences given that the stream framing in HTTP/2 is at the HTTP level, whereas the stream framing in HTTP/3 is at the transport level.
It’s unclear to me if these differences are sufficiently incremental that they could fall into the scope of a future hyper/h2 package or not, or what the division of responsibilities would look like.
One important point to draw out here is that the growing complexities from HTTP/1.1, to HTTP/2, to HTTP/3, mean that the Python community is absolutely going to need to tackle work in this space as a team effort - the layers in the stack need expertise in various differing areas.
Certificates
Right now we’re using certifi for certificate checking. Christian Heimes has been doing some work in this space around accessing interfaces to the Operating System’s certificate store. I might try to collar him at PyLondinium.
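For context, the two options look roughly like this with the standard `ssl` module. This is a sketch, not the package's actual code: `make_ssl_context` is a hypothetical helper, and it falls back to the system store if certifi is not installed.

```python
import ssl


def make_ssl_context(use_system_store: bool = False) -> ssl.SSLContext:
    # Hypothetical helper: pick between certifi's CA bundle and the OS store.
    if not use_system_store:
        try:
            import certifi  # third-party; optional in this sketch
        except ImportError:
            use_system_store = True
    if use_system_store:
        # With no cafile/capath given, the system's default CA certs are loaded.
        return ssl.create_default_context()
    return ssl.create_default_context(cafile=certifi.where())


bundle_context = make_ssl_context()
system_context = make_ssl_context(use_system_store=True)
```

Both contexts verify certificates and check hostnames by default; the difference is only in where the trusted roots come from.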
Any other feedback?
I’m aware that much of this might look like it’s a bit premature, but the work is pretty progressed, even if I’ve not yet started focusing on any branding and documentation around it.
Are there other invested areas of the Python community that I’m not yet considering here?
Where are the urllib3, trio, requests, aiohttp teams heading in their own work in this space? Is there good scope for collaboration, and how do you think that could/should work?
What else am I missing?
Issue Analytics
- Created 4 years ago
- Reactions:13
- Comments:12 (9 by maintainers)
Top GitHub Comments
I believe beyond technical details we should establish a new PSF Work Group for HTTP to better acquire resources and funding to pay all of you to solve this problem.
There is no reason why we should be unorganized or alone in this. A PSF Work Group would allow us to better leverage fiscal sponsorship, governance, and cross-maintenance of projects.
From a technical perspective, my ideal world would be:
- … python-https project using the best from all of our experiences and the resources granted by the work group.
- … python-https project.
CC’ing some folks where I’m not sure if they’ve seen this or not: @pquentin @RatanShreshtha @nateprewitt @shazow @asvetlov @dstufft
@tomchristie: this is super cool, and thanks for starting the conversation.
I’ll start by summarizing what’s happening with the async-urllib3 work and what we’ve been thinking about there, so we can start figuring out how these different initiatives relate.
The async-urllib3 fork
For the last few years, me & @pquentin & @RatanShreshtha have been slowly working on adding async support to urllib3 (also incorporating some older work by @lukasa). The repo and issue tracker is here, and the basic approach is described here: https://github.com/urllib3/urllib3/issues/1323
What we’ve done so far
- http.client has been ripped out and replaced by h11 + our own networking code.
- … git merge to pull in their ongoing work (which there’s quite a bit of, to fix all kinds of exotic edge cases: https://github.com/urllib3/urllib3/pulls?q=is%3Apr+is%3Aclosed)
- … urllib3.contrib, which means: SOCKS, alternative TLS backends (pyopenssl, securetransport), appengine support. It might make sense to port some of these too; we’re not sure.
- … urllib3.contrib features, our sync API is passing the entire urllib3 test suite
What’s left to do
In general, my feeling is that the core HTTP functionality here is really solid. I think I heard @lukasa say once that it’s easy to write 90% of an HTTP client; the last 10% is where all the work is. (I guess this is true of everything, but even more so for HTTP.) The async-urllib3 branch doubtless has exciting new bugs we haven’t found yet, but overall this is not a quick proof of concept, it’s a serious attempt at a production library that handles almost all the edge cases I know about, including things that urllib3 has only figured out within the last few months. It even handles early server responses (which is a known problem with classic urllib3, and required multiple iterations to figure out how to make it supportable across multiple networking backends). Though, we do still need to figure out what to do about header casing – https://github.com/python-hyper/h11/issues/31.
There are a bunch of minor things we need to do (e.g. docs, asyncio backend), and also two major ones:
- … PoolManager/HTTPConnectionPool/HTTPSConnectionPool/etc. These expose a bunch of details that should be handled internally, and the exposed details don’t necessarily make much sense (why does each host get its own session object?). This also makes writing generic tests unnecessarily tiresome, because we need to test all these classes separately, pick the sync/async version as appropriate, etc. Ideally there would just be a single session type that all calls start with, and we could pass in the appropriate version as a test fixture. I think this part of urllib3 is problematic enough that it’s worth reworking in any kind of “v2” project.
urllib3 vs async-urllib3 vs httpcore vs requests vs request3 vs idek
OK so that’s what we’ve been working on and what issues we’ve found. What about the larger strategy? First, just to lay out my general assumptions:
- If it’s at all possible, our goal should be to converge on a single implementation of the core code for making HTTP requests, that almost everyone uses (either directly or via wrappers like requests is currently). HTTP clients have endless edge cases, so the more eyeballs we have on a single library, the more we can all benefit from each other’s experiences. Right now in our urllib3 branch, the Trio-specific code is ~2% of the total library (not counting tests, contrib, . It’s ridiculous that we can’t share the other ~98%.
- Right now urllib3 is kinda that, except that it doesn’t handle async, hence the proliferation of async libraries.
- Unfortunately urllib3 can’t add async without at least some backcompat breakage, because of all the exposed internals. (The public API exposes that it’s using http.client under the hood, it has dict-like interfaces that need to become async, etc.) And urllib3 is stupendously widely used, so our new library to rule them all is going to need a different name, and be parallel-installable to let people migrate gradually.
- I still have hope that we can switch requests over to a new async-capable backend without breaking the world. The requests API is much smaller, and if we could pull it off this would (a) save a lot of migration work for people around the world, and (b) make the overall migration go much faster – which in turn means the folk here will get to (eventually) waste less energy on maintaining the old LTS releases of everything. In my perfect world, there’s no requests3 package because we don’t need it.
- I don’t have a strong opinion on Python 2 support right now. It’s obviously getting less important every day. But the last stragglers are going to be projects like pip and botocore, which need a HTTP client, and would really like to have access to async support. Maybe they’ll be happy with using different clients on py2 and py3 (and in pip’s case, vendoring multiple clients)? I’m assuming requests itself will need to support py2 for another year+, and if py2 support is the difference between being able to switch requests vs having to convince everyone to migrate off requests, then that might be enough to make py2 support worth it. I don’t really want to keep caring about py2, but my overriding goal is to minimize the number of HTTP libraries we all have to support, and if py2 makes a difference there I’m willing to hold my nose and do it. …Depending on how hard it is to support py2, which we don’t know yet either.
- I’m not super interested in ASGI/WSGI integration – it’s a neat feature that people will like, but not my main focus (and Trio will have the ability to mock out the network itself for testing, so you don’t necessarily need this kind of support inside individual libraries). I do wonder how you’ll provide an async API to WSGI apps or a sync API to ASGI apps, though?
I think talking about HTTP/2 is kinda premature, honestly. I looked at httpcore/dispatch/http2.py, and AFAICT it doesn’t support outgoing flow control or PING handling (both of which are protocol violations), and it doesn’t support multiplexing (which pretty much makes HTTP/2 support useless). And fixing these will require some substantial architectural changes, because they require background tasks and shared state across multiple connections. Which in turn will make it significantly more complicated to support multiple concurrency backends, and means you need to somehow disable HTTP/2 entirely when running in sync mode… it’s a lot of extra complexity. I think we should be strategizing on the shortest path to something shippable, and HTTP/2 is not on the critical path for that. We definitely want to get there eventually, and we need to keep an eye on it to make sure we don’t do anything that rules it out, but we don’t want to get people excited about something that we can’t deliver yet…
(BTW, we might also want to think about websocket client support eventually too – with HTTP/2 you can have HTTP and WS traffic over a single connection.)
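For readers unfamiliar with the outgoing flow control point above, the core bookkeeping can be sketched as follows. `SendWindow` is a hypothetical class, not code from httpcore or h2, and it is heavily simplified: real HTTP/2 tracks a window per stream as well as one for the whole connection, and has to wait (hence the background tasks) when the window is exhausted. The 65,535 default is the initial window size defined by the HTTP/2 specification.

```python
class SendWindow:
    """Tracks how many DATA bytes the peer currently allows us to send."""

    def __init__(self, initial: int = 65_535) -> None:
        self.available = initial

    def take(self, wanted: int) -> int:
        """Reserve up to `wanted` bytes; returns how many may be sent now."""
        granted = max(0, min(wanted, self.available))
        self.available -= granted
        return granted

    def window_update(self, increment: int) -> None:
        # Called when the peer sends a WINDOW_UPDATE frame.
        if increment <= 0:
            raise ValueError("WINDOW_UPDATE increment must be positive")
        self.available += increment


window = SendWindow(initial=10)
first = window.take(8)     # the whole chunk fits in the window
second = window.take(8)    # only the remaining window is granted
window.window_update(6)    # the peer opens the window again
third = window.take(8)
```

A sender that ignores this accounting and writes more than the granted amount is violating the protocol, which is the concern raised in the comment.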
Anyway. Looking at httpcore, my overall impression is … surprisingly complementary to the async-urllib3 work? The async-urllib3 stuff is really strong on low-level protocol stuff, but the public API has a decade of accumulated cruft. httpcore feels like it’s a few years away from handling all the gnarly edge cases, but the overall API and structure seem way more thought-through. I wonder if there’s any way to combine forces on that basis?