Ambassador does not properly handle web browser connection coalescing for HTTP/2 connections
When multiple domains use the same certificate (e.g. the server has a certificate that is valid for domain.com and the subdomains a.domain.com and b.domain.com) and the server supports HTTP/2, the browser will reuse the same connection for requests to domain.com, a.domain.com, and b.domain.com. See this blog post for more info.
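As a rough illustration of when a browser decides it may reuse an open HTTP/2 connection, here is a minimal Python sketch. It is a simplification, not any browser's actual code: it assumes the only two conditions are that the certificate presented on the open connection covers the new host name and that the new host resolves to the IP address the connection is already using. The function names and the IP address are hypothetical.

```python
def san_matches(san: str, host: str) -> bool:
    """Return True if one certificate SAN entry covers the host name."""
    san, host = san.lower(), host.lower()
    if san.startswith("*."):
        # A wildcard covers exactly one left-most label: *.domain.com
        # matches a.domain.com but not domain.com or x.a.domain.com.
        first_label, _, rest = host.partition(".")
        return bool(first_label) and rest == san[2:]
    return san == host


def can_coalesce(cert_sans: list[str], open_ip: str,
                 new_host: str, new_host_ips: list[str]) -> bool:
    """Simplified check for reusing an open HTTP/2 connection for new_host."""
    covered = any(san_matches(san, new_host) for san in cert_sans)
    return covered and open_ip in new_host_ips


# Hypothetical example: one certificate shared by all three host names.
shared_sans = ["domain.com", "*.domain.com"]
reuse = can_coalesce(shared_sans, "203.0.113.10", "b.domain.com", ["203.0.113.10"])
```

With a shared certificate `reuse` is true, which is the situation this issue is about. With a certificate that only lists a.domain.com, the same call would return false and the browser would open a new connection.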
If you have created an individual virtual_host for each of these domains in Ambassador (via Host resources or TLSContexts), Ambassador will reuse the same virtual_host for the coalesced connection, even though the request now carries a different host, and you will get a 404.
In more detail, if you create a TLSContext like the one below:
---
apiVersion: getambassador.io/v2
kind: TLSContext
metadata:
  name: context
spec:
  alpn_protocols: h2,http/1.1
  hosts:
  - domain.com
  - a.domain.com
  - b.domain.com
  secret: ambassador-cert
You will get an Ambassador configured where:
- Ambassador will create three virtual_hosts, one for each of the hosts.
- ambassador-cert is pointing at a certificate that works for domain.com, a.domain.com, and b.domain.com, so Ambassador is able to use the same certificate for each of these virtual_hosts.
- alpn_protocols: h2,http/1.1 is set, so the browser will use HTTP/2 for the connection to Ambassador.
Now, when you send a request to https://a.domain.com/ambassador/v0/diag/ in a web browser, it opens a single HTTP/2 connection to Ambassador with :authority: a.domain.com. Ambassador then looks for a route in virtual_host: a.domain.com, finds the route to /ambassador/v0, and correctly sends the request to the diagnostics page.
Now if you change the URL to https://b.domain.com/ambassador/v0/diag/, the browser will reuse this same HTTP/2 connection to Ambassador, but with :authority: b.domain.com. Ambassador, reusing the same connection to virtual_host: a.domain.com, then looks for a route in virtual_host: a.domain.com, but since the :authority header does not match any routes there, it returns a 404.
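The sequence above can be sketched as a toy model in Python. This is not Ambassador or Envoy code. It assumes, as described in this issue, that the virtual_host is chosen once per connection from the SNI sent in the TLS handshake and that later requests on that connection are only matched against that virtual_host. The `VIRTUAL_HOSTS` table and the `Connection` class are hypothetical.

```python
# Hypothetical stand-in for the generated config: each TLSContext host gets
# its own virtual host, and every virtual host carries the diagnostics route.
VIRTUAL_HOSTS = {
    "domain.com": ["/ambassador/v0"],
    "a.domain.com": ["/ambassador/v0"],
    "b.domain.com": ["/ambassador/v0"],
}


class Connection:
    """One TLS connection; the virtual host is fixed by the SNI at handshake."""

    def __init__(self, sni: str):
        self.pinned_host = sni

    def request(self, authority: str, path: str) -> int:
        # Only the virtual host pinned at handshake time is consulted, so a
        # different :authority on a reused connection finds no route.
        if authority != self.pinned_host:
            return 404
        prefixes = VIRTUAL_HOSTS.get(self.pinned_host, [])
        return 200 if any(path.startswith(p) for p in prefixes) else 404


conn = Connection(sni="a.domain.com")
first = conn.request("a.domain.com", "/ambassador/v0/diag/")   # 200
second = conn.request("b.domain.com", "/ambassador/v0/diag/")  # 404, reused connection
fresh = Connection(sni="b.domain.com").request("b.domain.com", "/ambassador/v0/diag/")  # 200
```

The last line shows why the problem only appears with coalescing: a fresh connection to b.domain.com sends its own SNI and lands on the right virtual_host.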
To Reproduce
Reproduction is pretty simple.
- Deploy Ambassador
- Get a certificate for *.domain.com
- Create a TLSContext that uses that certificate and sets
  - alpn_protocols: h2,http/1.1
  - hosts: [a.domain.com, b.domain.com]
- Send a request to https://a.domain.com/ambassador/v0/diag/ in a browser and get the diag page
- Change the URL to https://b.domain.com/ambassador/v0/diag/ and get a 404
Workaround
Since this issue revolves around how Ambassador creates virtual_hosts while using the same certificate, a couple of workarounds can be used until this is resolved.
- Create a different certificate and TLSContext for each domain
  ---
  apiVersion: getambassador.io/v2
  kind: TLSContext
  metadata:
    name: domain-context
  spec:
    alpn_protocols: h2,http/1.1
    hosts:
    - domain.com
    secret: domain-cert
  ---
  apiVersion: getambassador.io/v2
  kind: TLSContext
  metadata:
    name: a-domain-context
  spec:
    alpn_protocols: h2,http/1.1
    hosts:
    - a.domain.com
    secret: a-domain-cert
  ---
  apiVersion: getambassador.io/v2
  kind: TLSContext
  metadata:
    name: b-domain-context
  spec:
    alpn_protocols: h2,http/1.1
    hosts:
    - b.domain.com
    secret: b-domain-cert
  This will make it so the browser does not reuse the same connection for a.domain.com and b.domain.com, since they cannot use the same certificate.
- Use a wildcard in the TLSContext so that domain.com, a.domain.com, and b.domain.com use the same virtual_host
  ---
  apiVersion: getambassador.io/v2
  kind: TLSContext
  metadata:
    name: wild-context
  spec:
    alpn_protocols: h2,http/1.1
    hosts:
    - "*"
    secret: wild-cert
  Now, when the browser reuses the connection, Ambassador will use the same virtual_host, which will match for all :authority values.
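To see why the wildcard workaround helps, here is a small Python sketch of host matching, loosely modeled on the exact, "*.suffix", and "*" forms of virtual host domain patterns. It is a simplified illustration with a hypothetical function name, not the matching code Ambassador or Envoy actually runs.

```python
def domain_matches(pattern: str, authority: str) -> bool:
    """Simplified virtual host domain match: exact, "*suffix", or "*"."""
    if pattern == "*":
        return True
    if pattern.startswith("*"):
        suffix = pattern[1:]
        # The wildcard must stand for at least one character.
        return authority.endswith(suffix) and len(authority) > len(suffix)
    return pattern == authority


hosts = ["domain.com", "a.domain.com", "b.domain.com"]
wildcard_ok = all(domain_matches("*", h) for h in hosts)
pinned_ok = domain_matches("a.domain.com", "b.domain.com")
```

Here `wildcard_ok` is true because a single "*" virtual_host accepts every :authority, while `pinned_ok` is false, which corresponds to the 404 seen when the connection is tied to the a.domain.com virtual_host.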
Top GitHub Comments
@LukeShu - as we discussed offline, this fix didn’t cover all the cases so I’m going to reopen this and outline what we discussed so we have a record of it.
Case 1: Coalesce wildcard sub-domains (i.e. a.example.com, b.example.com) - ✅
Case 2: Coalesce wildcard sub-domains with parent domain (i.e. a.example.com, b.example.com and example.com) - ⛔
The first case was resolved per the fix that you referenced, which means we will coalesce all the wild-card subdomains into a single Envoy Filter Chain that does SNI matching on *.example.com.
In the second case, when a TLS certificate has SAN names registered for both wild-card domains and the parent domain, the browser will try to re-use the connection.
We currently generate Envoy configuration so that we have two Filter Chains that do the L4 SNI matching: one for *.example.com and one for example.com. Navigating to a wild-card domain first will open a connection, and the TLS Handshake will use the *.example.com domain for SNI. The browser will re-use the open connection when navigating directly to the parent domain. Since SNI is negotiated at TLS Handshake time, Envoy will re-use the connection and look in the Filter Chain for *.example.com, and then when it tries to do the L7 matching on :authority == example.com, there is no route available, causing the 404 NR.
Chrome: net-internals shows the same connection being used for the wildcard and parent domain.
Workaround: A non-code solution is for the user to use two different TLS Certs: one that has a SAN for the wild-card domain and another one for the parent domain. By doing this the browser will re-use the existing connection for all wild-card domains (i.e. a.example.com, b.example.com) but will open a new connection for requests to the parent domain (example.com), since they no longer share a cert and cannot re-use the connection.
Chrome: net-internals using different connections when using the workaround.
Potential Fix: Emissary will need to take into account the TLS Certs and the SANs registered within the cert along with the host matching to ensure that when the browser re-uses the connection that both the wild-card domains and parent domain can be matched in a single Filter Chain.
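One way to picture the idea in the potential fix is the Python sketch below. It is only an illustration under stated assumptions, not how Emissary is implemented: it assumes the SANs of each certificate are already known, and it simply reports which configured hosts each certificate covers, since those are the hosts a browser may coalesce onto one connection. The function names and certificate names are hypothetical.

```python
def covers(san: str, host: str) -> bool:
    """Return True if a certificate SAN entry covers the configured host."""
    if san.startswith("*."):
        # A wildcard SAN covers exactly one left-most label.
        first_label, _, rest = host.partition(".")
        return bool(first_label) and rest == san[2:]
    return san == host


def coalescable_groups(certs: dict[str, list[str]],
                       hosts: list[str]) -> dict[str, list[str]]:
    """Map each certificate name to the configured hosts its SANs cover."""
    return {
        name: [h for h in hosts if any(covers(san, h) for san in sans)]
        for name, sans in certs.items()
    }


hosts = ["*.example.com", "example.com"]

# One certificate with both SANs: both hosts fall into the same group.
shared = coalescable_groups({"shared-cert": ["*.example.com", "example.com"]}, hosts)

# Two certificates (the workaround): each host ends up in its own group.
split = coalescable_groups(
    {"wild-cert": ["*.example.com"], "apex-cert": ["example.com"]}, hosts
)
```

In this sketch, every group with more than one host marks a set of hosts that would need to be matchable from a single Filter Chain, because a browser holding that certificate may send any of them over the same connection.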
FYI… @ddymko @haq204 @AliceProxy I think this is a good one to be aware of.
This should have been closed along with https://github.com/datawire/apro/issues/1167 (which is just a mirror of the issue) by the PR https://github.com/datawire/apro/pull/1716 (seen in Emissary as https://github.com/emissary-ingress/emissary/commit/df926097c872e09851525de7ffeaa6e8577670d0) (which did close that mirror issue), which was included in v1.7.0 on 2020-08-27.
Though https://github.com/datawire/apro/pull/1907 (seen in Emissary as https://github.com/emissary-ingress/emissary/commit/7835f087056f5ba589ec19c51ff274107988b907) (for inclusion in v1.7.4) reverted a bunch of changes to v2listener.py, it specifically did not revert the changes from datawire/apro#1716 because (as the commit message says) it “is a fix that EPO cares about.”