Proposal: AsyncClient API unification
To fully support HTTP/2, httpx will want to support multiplexing, arguably the most important HTTP/2 feature:
> For both SPDY and HTTP/2 the killer feature is arbitrary multiplexing on a single well congestion controlled channel. It amazes me how important this is and how well it works. One great metric around that which I enjoy is the fraction of connections created that carry just a single HTTP transaction (and thus make that transaction bear all the overhead). For HTTP/1 74% of our active connections carry just a single transaction - persistent connections just aren't as helpful as we all want. But in HTTP/2 that number plummets to 25%. That's a huge win for overhead reduction. Let's build the web around that.
>
> (*HTTP/2 is Live in Firefox*, Patrick McManus)
Existing parallel() API

Since HTTP/2 is multiplexed, clients such as httpx should only open one connection per origin, even (especially?) for concurrent requests. This is indeed possible with the proposed parallel() API:
```python
client = httpx.AsyncClient()

async with client.parallel() as parallel:
    pending_one = await parallel.get('https://example.com/1')
    pending_two = await parallel.get('https://example.com/2')
    response_one = await pending_one.get_response()
    response_two = await pending_two.get_response()
```
In that case, the TCP connection to example.com can live in an async task owned by the parallel object, receive orders via await parallel.get(), and return responses via the await pending.get_response() calls. That works well, but I believe four things could be improved here:
- The APIs for performing serial requests and for performing parallel requests are different, so one must choose carefully, and evolving one’s code from one API to the other requires work.
- And since most people don’t read the docs, they’re likely to launch requests in parallel using the client directly, and likely won’t notice that their code does not take advantage of HTTP/2 multiplexing.
- Launching tasks is not the job of the HTTP client: that should be left to each async framework, since they all have a preferred style.
- The parallel API increases the API surface of httpx.
Proposed unified async API
Based on those observations, I believe a better API would be to only allow instantiating the client via a context manager, e.g. async with httpx.AsyncClient() as client. This then allows different styles. I'm not very familiar with asyncio, but I believe the above example would become:
```python
async with httpx.AsyncClient() as client:
    task_one = asyncio.create_task(client.get('https://example.com/1'))
    task_two = asyncio.create_task(client.get('https://example.com/2'))
    response_one = await task_one
    response_two = await task_two
```
But you can also use other primitives, such as asyncio.gather:
```python
async with httpx.AsyncClient() as client:
    for response in await asyncio.gather(
        client.get('https://example.com/1'),
        client.get('https://example.com/2'),
    ):
        ...
```
And this fits more easily with other async frameworks, such as trio:
```python
async with httpx.AsyncClient("trio") as client:
    async with trio.open_nursery() as nursery:
        nursery.start_soon(client.get, 'https://example.com/1')
        nursery.start_soon(client.get, 'https://example.com/2')
```
And it would also be the preferred way to launch single requests:
```python
async with httpx.AsyncClient() as client:
    response = await client.get('https://example.com/1')
```
What about the sync client?
The same logic applies to the sync client, and I would personally reuse what Python already offers for running tasks in parallel: concurrent.futures. I guess an executor specific to httpx, using asyncio behind the scenes, could work.
Of course, it makes sense to keep httpx.get(url) for backwards compatibility with requests, and to handle the common case where users only need to make a single sync request.
Conclusion
I believe this new proposed API has the following advantages:
- it’s more unified (only one way to do it)
- it’s natural to go from serial requests to concurrent requests
- it delegates parallel task creation to the async framework
- it reduces the API surface
The drawbacks are that the sync and async cases no longer share the same parallel() API, and that the simple common async case is slightly more cumbersome to type, but I personally believe this proposal is a better compromise.
What do you think?
(Disclaimer: this idea is originally from @njsmith, I took the time to turn it into a proposal and probably add errors of my own.)
Issue Analytics
- Created: 4 years ago
- Reactions: 3
- Comments: 14 (9 by maintainers)
Top GitHub Comments
I dunno, I’m mostly relying on @lukasa here, but he spent some effort pounding it into my head that HTTP/2 definitely needed a task reading from the underlying connection at all times.
I don’t think letting whichever task happens to be interacting with the connection do the driving is going to work in any case, because of cancellation. If your data send operation gets cancelled in the middle, it’s extremely difficult to recover in any reasonable way. If it’s in a background task, where cancellation means that the whole async with open_session block is getting closed down, then that’s OK; but if cancelling a get call can cause the underlying connection to get corrupted, then that’s no good.

We’ve spent a lot of time trying to figure out these patterns over the last few years, with simpler protocols like TLS and websocket. I originally thought like you are now, but I discovered I was wrong 😃. Trio’s TLS code does use the pattern you describe. That’s a much simpler protocol flow-control wise – basically just two unidirectional streams that barely interact – and I think it’s about at the limit of what you can handle that way; it took a ton of effort to get working, and it still has some edge cases around cancellation that I’m not quite sure we’re handling right.
We do.
If you’re using multiple tasks, or multithreading, with a single client, then any HTTP/2 connections will already be multiplexed. In e.g. a web app environment you don’t really see that, because the server is handling the concurrency aspect for you, so each individual code path reads as a single sequential request – but you’ll actually end up having multiplexed requests across the same client. And yes, the AsyncClient can be used with any standard concurrency primitives (for whichever backend).
(Proviso: I think we may have some niggly races that aren’t yet resolved in the threaded case, but that’s an “we’re in alpha” buglette, rather than an interface issue.)
Also “instantiating the client using a context manager, eg. async with httpx.AsyncClient() as client” isn’t the right level here. E.g. supposing we’re in a web app, then opening a fresh client inside each endpoint is the wrong thing to do, because you want to make sure you’re using shared connection pooling across all the incoming requests, rather than just within the context of a single endpoint during a single request/response cycle. So what you actually want is a client scoped to the application’s lifespan.
What this issue actually reduces to is “let’s not introduce the parallel requests API”.
That’s feasible, although there are two primary reasons why we might want the parallel requests API…
I think there’s also a broader issue here, around context managed vs. non context managed APIs. In particular:
Summary: I think that we should close the issue off, but not start or consider #52 until we’ve fully addressed the prerequisites of making sure that we’ve got support for both an asyncio and a trio backend, and that we’re all happy with whatever ConcurrencyBackend interfaces are necessary in order to adequately support that.