
Community discussion


Although I’ve not yet done much in the way of documentation or branding this package up, it’s quickly narrowing in on a slice of functionality that’s lacking in the Python HTTP landscape.

Specifically:

  • A requests-compatible API.
  • HTTP/2 and HTTP/1.1 support (with an eye on HTTP/3).
  • A standard sync client by default, but the option of an async client if you need it.
  • An API for making requests in parallel. The standard case has a regular sync API on the front, but uses async under the hood. There’s also an equivalent for the fully async case. (Both draw on Trio a little for lessons on how to manage branching, exceptions & cancellations.)
  • The ability to plug directly into WSGI or ASGI apps instead of network dispatch (for usage as a test client, or for implementing stub services during test or CI).
  • The ability to plug into alternate concurrency backends.
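The "sync API on the front, async under the hood" idea can be sketched with the standard library alone. Every name below is illustrative, not the package's actual API:

```python
import asyncio

async def _fetch(url: str) -> str:
    # Stand-in for an async HTTP call; a real client would open a
    # connection and speak HTTP/1.1 or HTTP/2 here.
    await asyncio.sleep(0)
    return f"response for {url}"

async def _fetch_all(urls):
    # Issue the requests concurrently and gather the results in order.
    return await asyncio.gather(*(_fetch(u) for u in urls))

def fetch_all(urls):
    """Synchronous entry point that drives the async implementation."""
    return asyncio.run(_fetch_all(urls))

print(fetch_all(["https://example.org/a", "https://example.org/b"]))
```

The caller never sees a coroutine or an event loop; the parallelism lives entirely behind the sync facade.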

I’ve also some thoughts on allowing the Request and Response models to provide werkzeug-like interfaces, allowing them to be used either client-side or server-side. One of the killer-apps of the new async+HTTP/2 functionality is allowing high-throughput proxy and gateway services to be easily built in Python. Having a “requests”-like package that can also use the models on the server side is something I may want to explore once all the other functionality is sufficiently nailed down.

Since “requests” is an essential & necessary part of the Python ecosystem, and since this package is aiming to be the next step on from that, I think it’s worth opening up a bit of community discussion here, even if it’s early days.

I’d originally started out expecting httpcore to be a silent-partner dependency of any requests3 package, but it progressed fairly quickly from there into “actually I’ve got a good handle on this, I think I need to implement this all the way through”. My biggest questions now are around what’s going to be the most valuable ways to deliver this work to the community.

Ownership, Funding & Maintenance

Given how critical a requests-like HTTP client is to the Python ecosystem as a whole, I’d be amenable to community discussions around ownership & funding options.

I guess that I need to start out by documenting & pitching this package in its own right, releasing it under the same banner and model as all the other Encode work, and then take things from there if and when it starts to gain any adoption.

I’m open to ideas from the urllib3 or requests teams, if there’s alternatives that need to be explored early on.

Requests

The functionality that this package is homing in on meets the requirements for the proposed “Requests III”. Perhaps there’s something to be explored there, if the requests team is interested, and if we can find a good community-focused arrangement around funding & ownership.

urllib3

The urllib3 team obviously have a vast stack of real-world usage expertise that it’d be important for us to make use of. There are bits of work that urllib3 does that httpcore likely needs to do, including fuzziness around how content decoding actually ends up looking on the real, messy web. Or, for example, pulling early responses before necessarily having fully sent the outgoing request.

Something else that could well be valuable would be implementing a urllib3 dispatch class alongside the existing h11/h2/async dispatch. Any urllib3 dispatch class would still be built on top of the underlying async structure, but would dispatch the urllib3 calls within a threadpool.

Doing so would allow a couple of useful things, such as being able to isolate behavioral differences between the two implementations, or perhaps allowing a more gradual switchover for critical services that need to take a cautious approach to upgrading to a new HTTP client implementation.
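A rough sketch of the idea: wrap a blocking, urllib3-style call so it presents the same async interface as the native dispatchers. The class name and the stand-in function here are hypothetical, not httpcore's API; a real implementation would call into urllib3 where the placeholder function is.

```python
import asyncio

def blocking_request(url: str) -> str:
    # A real dispatch class would call urllib3's PoolManager.request here.
    return f"200 OK for {url}"

class ThreadpoolDispatch:
    async def request(self, url: str) -> str:
        loop = asyncio.get_running_loop()
        # Run the blocking call in a worker thread, so the event loop
        # stays free to service other tasks while urllib3 does its work.
        return await loop.run_in_executor(None, blocking_request, url)

async def main():
    return await ThreadpoolDispatch().request("https://example.org/")

print(asyncio.run(main()))
```

Because the threadpool dispatcher satisfies the same async interface, it can be swapped in behind the client without the caller knowing which implementation handled the request.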

Trio, Curio

I think httpcore as currently delivered makes it fairly easy to deliver a trio-based concurrency backend. It’s unclear to me if supporting that in the package itself is a good balance, or if it would be more maintainable to ensure that the trio team have the interfaces they need, but that any implementation there would live within their ecosystem.

(I’d probably tend towards the latter case there.)

Twisted

I guess that an HTTP/2 client would probably be useful to the Twisted team. I don’t really know enough about Twisted’s style of concurrency API to make a call on whether there’s work here that could end up being valuable to them.

HTTP/3

It’ll be worth us keeping an eye on https://github.com/aiortc/aioquic

Having a QUIC implementation isn’t the only thing that we’d need in order to add HTTP/3 support, but it is a really big first step.

We currently have connect/reader/writer interfaces. If we added QUIC support then we’d want our protocol interfaces to additionally support operations like “give me a new stream”, “set the flow control”, and “set the priority level”.

For standard TCP-based HTTP/2 connections, “give me a new stream” would always just return the existing reader/writer pair. For QUIC connections it’d return a new reader/writer pair for a protocol-level stream.
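An illustrative sketch (not httpcore's real interface) of how "give me a new stream" would differ between the two transports:

```python
from dataclasses import dataclass

@dataclass
class Stream:
    stream_id: int
    reader: object
    writer: object

class TCPConnection:
    """HTTP/2 over TCP: one transport, so every stream shares it."""
    def __init__(self, reader, writer):
        self._stream = Stream(0, reader, writer)

    def create_stream(self) -> Stream:
        # Multiplexing happens at the HTTP/2 framing layer, so the
        # existing reader/writer pair is always handed back.
        return self._stream

class QUICConnection:
    """HTTP/3 over QUIC: the transport itself provides streams."""
    def __init__(self):
        self._next_stream_id = 0

    def create_stream(self) -> Stream:
        # Each call yields a genuinely new protocol-level stream, modelled
        # here with fresh placeholder reader/writer objects.
        stream = Stream(self._next_stream_id, object(), object())
        self._next_stream_id += 4  # client-initiated bidirectional streams
        return stream

tcp = TCPConnection(reader=object(), writer=object())
assert tcp.create_stream() is tcp.create_stream()

quic = QUICConnection()
assert quic.create_stream() is not quic.create_stream()
```

The point is that the higher layers see one `create_stream` operation either way; only the connection class knows whether a stream is a framing-level fiction or a transport-level object.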

This is getting way ahead of ourselves, but I think we’ve probably got a good basis here to be able to later support HTTP/3.

One big blocker would probably be whatever HTTP-level changes are required between HTTP/2 and HTTP/3. The difference between QPACK and HPACK is one case here, but there are likely also differences given that the stream framing in HTTP/2 is at the HTTP level, whereas the stream framing in HTTP/3 is at the transport level.

It’s unclear to me whether these differences are sufficiently incremental that they could fall within the scope of a future hyper/h2 package, or what the division of responsibilities would look like.

One important point to draw out here is that the growing complexities from HTTP/1.1, to HTTP/2, to HTTP/3, mean that the Python community is absolutely going to need to tackle work in this space as a team effort - the layers in the stack need expertise in various differing areas.

Certificates

Right now we’re using certifi for certificate checking. Christian Heimes has been doing some work in this space around accessing interfaces to the Operating System’s certificate store. I might try to collar him at PyLondinium.

Any other feedback?

I’m aware that much of this might look like it’s a bit premature, but the work is fairly far along, even if I’ve not yet started focusing on any branding and documentation around it.

Are there other invested areas of the Python community that I’m not yet considering here?

Where are the urllib3, trio, requests, aiohttp teams heading in their own work in this space? Is there good scope for collaboration, and how do you think that could/should work?

What else am I missing?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 13
  • Comments: 12 (9 by maintainers)

Top GitHub Comments

13 reactions
theacodes commented, May 27, 2019

I believe beyond technical details we should establish a new PSF Work Group for HTTP to better acquire resources and funding to pay all of you to solve this problem.

There is no reason why we should be unorganized or alone in this. A PSF Work Group would allow us to better leverage fiscal sponsorship, governance, and cross-maintenance of projects.

From a technical perspective, my ideal world would be:

  1. Leave urllib3 and requests more or less alone. Every change is a breaking change; they have far too many users. Use Tidelift to pay maintainers indefinitely.
  2. Establish shared, sans-io libraries that we can all build upon for http, http2, and http3/quic.
  3. Work together towards a brand new python-https project using the best from all of our experiences and the resources granted by the work group.
  4. Urllib3’s value is in its exhaustive test suite - mine what we can and move into shared libraries and into the python-https project.
  5. Request’s value is in its UX - borrow what makes sense.

9 reactions
njsmith commented, May 26, 2019

CC’ing some folks where I’m not sure if they’ve seen this or not: @pquentin @RatanShreshtha @nateprewitt @shazow @asvetlov @dstufft

@tomchristie: this is super cool, and thanks for starting the conversation.

I’ll start by summarizing what’s happening with the async-urllib3 work and what we’ve been thinking about there, so we can start figuring out how these different initiatives relate.

The async-urllib3 fork

For the last few years, me & @pquentin & @RatanShreshtha have been slowly working on adding async support to urllib3 (also incorporating some older work by @lukasa). The repo and issue tracker is here, and the basic approach is described here: https://github.com/urllib3/urllib3/issues/1323

What we’ve done so far

  • The core HTTP/1.1 support has been totally replaced – http.client has been ripped out and replaced by h11 + our own networking code.
  • We have both sync + async versions of the public APIs
  • The sync API works on python 2.7 and python 3.4+
  • The async API works on python 3.5+, on Trio and Twisted. Adding other backends is quite easy – at the low end, the Trio backend is 55 lines of code; at the high end, the Twisted backend is 184 lines of code, because we have to implement our own flow-controlled stream layer and error unwinding.
    • We’ve been leaning towards keeping these internal, because they’re small enough that I think the benefits from being able to refactor the interface with the rest of the library will outweigh the costs of maintaining them.
  • We’re still close enough to urllib3 upstream that we can use git merge to pull in their ongoing work (which there’s quite a bit of, to fix all kinds of exotic edge cases: https://github.com/urllib3/urllib3/pulls?q=is%3Apr+is%3Aclosed)
  • We support basically everything upstream urllib3 does except for urllib3.contrib, which means: SOCKS, alternative TLS backends (pyopenssl, securetransport), appengine support. It might make sense to port some of these too; we’re not sure.
  • Apart from the missing urllib3.contrib features, our sync API is passing the entire urllib3 test suite

What’s left to do

In general, my feeling is that the core HTTP functionality here is really solid. I think I heard @lukasa say once that it’s easy to write 90% of an HTTP client; the last 10% is where all the work is. (I guess this is true of everything, but even more so for HTTP.) The async-urllib3 branch doubtless has exciting new bugs we haven’t found yet, but overall this is not a quick proof of concept, it’s a serious attempt at a production library that handles almost all the edge cases I know about, including things that urllib3 has only figured out within the last few months. It even handles early server responses (which is a known problem with classic urllib3, and required multiple iterations to figure out how to make it supportable across multiple networking backends). Though, we do still need to figure out what to do about header casing – https://github.com/python-hyper/h11/issues/31.

There are a bunch of minor things we need to do (e.g. docs, asyncio backend), and also two major ones:

  • We currently aren’t doing more than minimal smoke testing for the async versions of our interfaces. We need to async’ify the tests, to run against all the async backends too. This should be mostly mechanical, EXCEPT…
  • …One thing I’ve learned over the last few years is that urllib3’s public interfaces are too big – there’s a bunch of random bits and pieces that shouldn’t be exposed. Some of this is pretty trivial to fix (just add underscores). But the trickier part is the session APIs, PoolManager/HTTPConnectionPool/HTTPSConnectionPool/etc. These expose a bunch of details that should be handled internally, and the exposed details don’t necessarily make much sense (why does each host get its own session object?). This also makes writing generic tests unnecessarily tiresome, because we need to test all these classes separately, pick the sync/async version as appropriate, etc. Ideally there would just be a single session type that all calls start with, and we could pass in the appropriate version as a test fixture. I think this part of urllib3 is problematic enough that it’s worth reworking in any kind of “v2” project.
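The "single session type" idea above might look roughly like this; the classes and the small adapter are purely illustrative, but they show how one generic test body can accept either a sync or an async session as a fixture, instead of duplicating the test per class:

```python
import asyncio

class SyncSession:
    def request(self, method, url):
        return (method, url, "sync")

class AsyncSession:
    async def request(self, method, url):
        return (method, url, "async")

def check_session(session):
    # One generic test body: if the session hands back a coroutine,
    # drive it through a tiny adapter rather than writing a second test.
    result = session.request("GET", "https://example.org/")
    if asyncio.iscoroutine(result):
        result = asyncio.run(result)
    assert result[0] == "GET"
    return result

for session in (SyncSession(), AsyncSession()):
    print(check_session(session))
```

In a real suite the sessions would be pytest fixtures parametrized over the backends, so every test runs against each implementation for free.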

urllib3 vs async-urllib3 vs httpcore vs requests vs requests3 vs idek

OK, so that’s what we’ve been working on and the issues we’ve found. What about the larger strategy? First, just to lay out my general assumptions:

If it’s at all possible, our goal should be to converge on a single implementation of the core code for making HTTP requests, that almost everyone uses (either directly or via wrappers, as requests is currently). HTTP clients have endless edge cases, so the more eyeballs we have on a single library, the more we can all benefit from each other’s experiences. Right now in our urllib3 branch, the Trio-specific code is ~2% of the total library (not counting tests, contrib, etc.). It’s ridiculous that we can’t share the other ~98%.

Right now urllib3 is kinda that, except that it doesn’t handle async, hence the proliferation of async libraries.

Unfortunately urllib3 can’t add async without at least some backcompat breakage, because of all the exposed internals. (The public API exposes that it’s using http.client under the hood, it has dict-like interfaces that need to become async, etc.) And urllib3 is stupendously widely used, so our new library to rule them all is going to need a different name, and be parallel-installable to let people migrate gradually.

I still have hope that we can switch requests over to a new async-capable backend without breaking the world. The requests API is much smaller, and if we could pull it off this would (a) save a lot of migration work for people around the world, and (b) make the overall migration go much faster – which in turn means the folk here will get to (eventually) waste less energy on maintaining the old LTS releases of everything. In my perfect world, there’s no requests3 package because we don’t need it.

I don’t have a strong opinion on Python 2 support right now. It’s obviously getting less important every day. But the last stragglers are going to be projects like pip and botocore, which need an HTTP client, and would really like to have access to async support. Maybe they’ll be happy with using different clients on py2 and py3 (and in pip’s case, vendoring multiple clients)? I’m assuming requests itself will need to support py2 for another year+, and if py2 support is the difference between being able to switch requests vs having to convince everyone to migrate off requests, then that might be enough to make py2 support worth it. I don’t really want to keep caring about py2, but my overriding goal is to minimize the number of HTTP libraries we all have to support, and if py2 makes a difference there I’m willing to hold my nose and do it. …Depending on how hard it is to support py2, which we don’t know yet either.

I’m not super interested in ASGI/WSGI integration – it’s a neat feature that people will like, but not my main focus (and Trio will have the ability to mock out the network itself for testing, so you don’t necessarily need this kind of support inside individual libraries). I do wonder how you’ll provide an async API to WSGI apps or a sync API to ASGI apps, though?
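For context on the WSGI case: dispatching directly into a WSGI app, with no network involved, boils down to hand-building an environ dict and invoking the callable. A minimal sketch (hand-rolled, not any particular library's API, and with a deliberately incomplete environ):

```python
import io

def app(environ, start_response):
    # A trivial WSGI application standing in for the app under test.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from wsgi"]

def wsgi_request(app, method="GET", path="/"):
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "SERVER_NAME": "testserver",
        "SERVER_PORT": "80",
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(b""),
    }
    body = b"".join(app(environ, start_response))
    return captured["status"], body

status, body = wsgi_request(app)
print(status, body)
```

The client's request/response models sit on top of exactly this kind of call, which is what makes the test-client and stub-service use cases cheap to support.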

I think talking about HTTP/2 is kinda premature, honestly. I looked at httpcore/dispatch/http2.py, and AFAICT it doesn’t support outgoing flow control or PING handling (both of which are protocol violations), and it doesn’t support multiplexing (which pretty much makes HTTP/2 support useless). And fixing these will require some substantial architectural changes, because they require background tasks and shared state across multiple connections. Which in turn will make it significantly more complicated to support multiple concurrency backends, and means you need to somehow disable HTTP/2 entirely when running in sync mode… it’s a lot of extra complexity. I think we should be strategizing on the shortest path to something shippable, and HTTP/2 is not on the critical path for that. We definitely want to get there eventually, and we need to keep an eye on it to make sure we don’t do anything that rules it out, but we don’t want to get people excited about something that we can’t deliver yet…
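One way to see why background tasks get forced on the design: something has to answer PINGs and release flow-control credit even while no request is in flight. A toy model of that lifecycle, with entirely illustrative names and an internal queue standing in for the socket:

```python
import asyncio

class Http2LikeConnection:
    def __init__(self):
        self._events = asyncio.Queue()  # stands in for frames off the wire
        self._pings_answered = 0
        self._task = None

    async def __aenter__(self):
        # The background task lives for the whole connection lifetime,
        # independently of any individual request.
        self._task = asyncio.create_task(self._reader_loop())
        return self

    async def __aexit__(self, *exc):
        await self._events.put(("close", None))
        await self._task

    async def _reader_loop(self):
        while True:
            kind, payload = await self._events.get()
            if kind == "ping":
                self._pings_answered += 1  # a real client sends a PING ACK
            elif kind == "close":
                return

async def main():
    async with Http2LikeConnection() as conn:
        await conn._events.put(("ping", b"x"))
        await asyncio.sleep(0)  # let the background task run
    return conn._pings_answered

print(asyncio.run(main()))
```

A sync-only client has no place to run that reader loop, which is the structural problem being pointed at here.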

(BTW, we might also want to think about websocket client support eventually too – with HTTP/2 you can have HTTP and WS traffic over a single connection.)

Anyway. Looking at httpcore, my overall impression is … surprisingly complementary to the async-urllib3 work? The async-urllib3 stuff is really strong on low-level protocol stuff, but the public API has a decade of accumulated cruft. httpcore feels like it’s a few years away from handling all the gnarly edge cases, but the overall API and structure seem way more thought-through. I wonder if there’s any way to combine forces on that basis?
