Connection Pinning with AWS RDS Proxy

I am using AWS RDS Proxy with Postgres. While investigating the logs for the proxy I saw many warnings with the same message:

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-proxy.html#rds-proxy-pinning

The client session was pinned to the database connection [dbConnection=3377128289] for the remainder of the session. The proxy can't reuse this connection until the session ends. Reason: SQL changed session settings that the proxy doesn't track. Consider moving session configuration to the proxy's initialization query. Digest: "set search_path = $1; set names $2;".

It looks like Prisma sets these whenever it creates a new connection, which causes the proxy to pin every connection until it is released, rendering the proxy useless. As suggested in the log message above, I would like to set the search_path and NAMES when initialising the proxy, not each time a new connection is created.

Is there any configuration parameter or any other way through which I can tell Prisma not to set these per connection?

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 16
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

pimeys commented, May 6, 2021 (19 reactions)

So, let me lay out some of the problems with serverless functions and stateful database connections, so you understand where we stand right now.

What is a stateful connection

In many relational databases, opening a connection is relatively expensive. In PostgreSQL, each connection costs about 14 megabytes of RAM, so opening a thousand connections uses roughly 14 gigabytes of memory. Creating a new connection, with all the handshakes and TLS, also takes time, so we want to reuse connections.
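As a quick back-of-the-envelope check of that claim, the sketch below multiplies the 14 MB figure quoted above by a few connection counts. The helper name `pool_memory_gb` is made up for illustration, and the per-connection cost is only the rough number from this comment; real usage depends on the workload and server settings.

```python
# Rough per-connection cost quoted in the comment above (an approximation).
MB_PER_CONNECTION = 14


def pool_memory_gb(connections, mb_per_connection=MB_PER_CONNECTION):
    """Approximate server RAM held by open connections, in decimal gigabytes."""
    return connections * mb_per_connection / 1000


for n in (100, 1000, 5000):
    print(f"{n:>5} connections ~ {pool_memory_gb(n):.1f} GB")
```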

In a serverless environment, each function wants its own database connection, but functions are short-lived and paused a lot. That means we keep connections open doing nothing, and those connections eat up RAM on your database server.

Before the serverless trend started, there were already tools for sharing these connections: you reserve a fixed number of them in a pool and, from the other side, open much cheaper connections to the pool. When people started wanting to use databases with serverless platforms, these were the tools of choice.

Pgbouncer

The classic PostgreSQL pool has three modes of operation:

  • Session mode offers a way of dividing your company’s connections per account. One pool connection is always pinned to one application connection.
  • Statement mode offers one connection per statement. No transactions or prepared statements are allowed.
  • Transaction mode is the same as statement mode, but gives one connection per transaction.

In Prisma, we can support the session and transaction modes. With serverless, the transaction mode might interest you, but you should also understand the tradeoffs of this approach.

Prepared statements are stored in a database connection. If we share connections between different functions, we leak the statements from one function to another, which eventually causes undefined behavior that is very hard to debug.

The by-the-book solution here is to not prepare, but use the text protocol instead:

INSERT INTO a (id, title) VALUES ($1, $2)

Turns into:

INSERT INTO a (id, title) VALUES (123, 'asdf')

The classic prepared statement approach compiles the statement on the connection, which speeds up all further queries. Even more importantly, it very effectively prevents SQL injection attacks. Without it, naive interpolation turns the innocent query into a table-dropping monster if a user sends the param '); DROP TABLE a; --:

INSERT INTO a (id, title) VALUES (123, ''); DROP TABLE a; --')

This is only a simple example, but there are many more. Understanding them all, and keeping up whenever attackers find new ways of breaking the database, is something we decided is a bit too much for us to take on, especially when a wrong decision puts user data at risk.
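The difference between interpolating a value into the SQL text and binding it as a parameter can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL (an assumption made so the example runs anywhere), so the placeholder syntax is `?` rather than `$1`. The function names are hypothetical.

```python
import sqlite3

PAYLOAD = "'); DROP TABLE a; --"


def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id INTEGER, title TEXT)")
    return conn


def naive_insert(conn, row_id, title):
    # UNSAFE: the value is pasted into the SQL text.
    sql = f"INSERT INTO a (id, title) VALUES ({row_id}, '{title}');"
    # executescript runs every statement found in the string.
    conn.executescript(sql)


def bound_insert(conn, row_id, title):
    # Safe: the value travels separately from the SQL text.
    conn.execute("INSERT INTO a (id, title) VALUES (?, ?)", (row_id, title))


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


safe = fresh_db()
bound_insert(safe, 123, PAYLOAD)
safe_rows = safe.execute("SELECT title FROM a").fetchall()

unsafe = fresh_db()
naive_insert(unsafe, 123, PAYLOAD)

print("bound parameter kept the table:", table_exists(safe, "a"))
print("interpolation kept the table:", table_exists(unsafe, "a"))
```

With the bound parameter the payload is stored as an ordinary string. With interpolation the same payload closes the INSERT early and the DROP TABLE runs as a second statement.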

Now, when using prepared statements in Pgbouncer’s transaction mode, the pool will eventually produce interesting errors. Let’s imagine we have a pool of one connection and two clients are connected to it:

The first one queries:

SELECT id FROM a WHERE title = $1

And the second user does:

UPDATE a SET title = $1 WHERE id = $2

A PostgreSQL client handles this by first preparing the statement:

PREPARE s1 (text) AS SELECT id FROM a WHERE title = $1

The identifier s1 is chosen by the client and is incremented by one for subsequent queries. The user here then runs:

EXECUTE s1('meow');

And gets their result correctly.

Now, the second user comes from a different client, has a different connection to the bouncer and doesn’t know about the other client. It runs:

PREPARE s1 (text, int) AS UPDATE a SET title = $1 WHERE id = $2

This will fail, because the pooled connection behind Pgbouncer still holds the previous statement s1, but our second client doesn’t know about it and tries to save a new statement under the same name. And we’re in trouble.
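The collision can be modelled without a database. The sketch below is a toy simulation, not Pgbouncer or a PostgreSQL driver: `ServerConnection` and `Client` are hypothetical classes, and the only behavior modelled is that a statement name must be unique per server connection while each client numbers its own statements from s1.

```python
class ServerConnection:
    """Toy stand-in for one PostgreSQL connection held by the pool."""

    def __init__(self):
        # Statement name -> SQL text; lives as long as the connection does.
        self.prepared = {}

    def prepare(self, name, sql):
        if name in self.prepared:
            # A duplicate name in the same session is rejected.
            raise RuntimeError(f'prepared statement "{name}" already exists')
        self.prepared[name] = sql


class Client:
    """Toy client that names its statements s1, s2, ... independently."""

    def __init__(self):
        self.counter = 0

    def prepare_on(self, conn, sql):
        self.counter += 1
        name = f"s{self.counter}"
        conn.prepare(name, sql)
        return name


shared = ServerConnection()  # a pool of exactly one connection
first, second = Client(), Client()

first.prepare_on(shared, "SELECT id FROM a WHERE title = $1")
try:
    second.prepare_on(shared, "UPDATE a SET title = $1 WHERE id = $2")
    collision = None
except RuntimeError as err:
    collision = str(err)

print(collision)
```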

What we could do is force a randomized statement name per connection. This would not crash, but would also pollute the connection with lots of statements that are hard to clean, leading to database server memory leaks.

We use a somewhat unusual trick to get around this. We must:

  • Clean the statements before doing anything else.
  • Run everything in a transaction.

BEGIN; -- #1
DEALLOCATE ALL; -- #2
PREPARE s1 (text) AS SELECT id FROM a WHERE title = $1; -- #3
EXECUTE s1('meow'); -- #4
COMMIT; -- #5

Five roundtrips. But at least we don’t crash anymore! This is what you get when you set pgbouncer=true in the connection string.
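To make the cost of this workaround concrete, here is a toy model of the five-step sequence. It is a sketch under stated assumptions, not Prisma's implementation: `ServerConnection.send` and `run_query` are hypothetical, and each call to `send` simply counts as one roundtrip.

```python
class ServerConnection:
    """Toy stand-in for one pooled PostgreSQL connection."""

    def __init__(self):
        self.prepared = {}
        self.roundtrips = 0

    def send(self, command, name=None, sql=None):
        self.roundtrips += 1  # every command is modelled as one roundtrip
        if command == "DEALLOCATE ALL":
            self.prepared.clear()
        elif command == "PREPARE":
            if name in self.prepared:
                raise RuntimeError(f'prepared statement "{name}" already exists')
            self.prepared[name] = sql
        elif command == "EXECUTE":
            if name not in self.prepared:
                raise RuntimeError(f'prepared statement "{name}" does not exist')
        # BEGIN and COMMIT carry no extra state in this model.


def run_query(conn, sql):
    """Clean leftover statements first, all inside one transaction."""
    conn.send("BEGIN")
    conn.send("DEALLOCATE ALL")
    conn.send("PREPARE", name="s1", sql=sql)
    conn.send("EXECUTE", name="s1")
    conn.send("COMMIT")


shared = ServerConnection()
run_query(shared, "SELECT id FROM a WHERE title = $1")
# A second client reuses the name s1 on the same connection without failing.
run_query(shared, "UPDATE a SET title = $1 WHERE id = $2")

print(shared.roundtrips)
```

Both queries succeed on the shared connection, at the price of five roundtrips each instead of one or two.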

RDS Proxy

Amazon RDS Proxy works a bit differently from Pgbouncer. When you prepare a statement, it pins the pool connection to the user’s connection. This makes it easier to use things like prepared statements without crashing your software, but much harder to hack around the limitations.

Here we immediately lose our advantage of a transaction mode when we:

  • PREPARE
  • Do any connection-level settings.
  • Run large queries (e.g. an IN statement that has many ids)

And we have no way of getting around this without actually reducing our security by getting rid of prepared statements in our code.
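The three triggers listed above can be turned into a rough pre-flight check for your own SQL. This is only an illustrative heuristic, not how RDS Proxy decides: the function `pinning_reasons` is hypothetical, the keyword patterns are a simplification, the statement splitting is naive (it breaks on semicolons inside string literals), and the 16 KB size threshold is an assumption based on my reading of the AWS documentation.

```python
import re

# Assumption: statements larger than this pin the session (verify against AWS docs).
MAX_STATEMENT_BYTES = 16 * 1024

# Simplified patterns for the triggers named in the comment above.
_PINNING_PATTERNS = {
    "prepared statement": re.compile(r"^\s*(PREPARE|EXECUTE|DEALLOCATE)\b", re.IGNORECASE),
    "session setting": re.compile(r"^\s*SET\b", re.IGNORECASE),
}


def pinning_reasons(sql):
    """Return the reasons a query would probably pin, in a naive model."""
    reasons = []
    if len(sql.encode("utf-8")) > MAX_STATEMENT_BYTES:
        reasons.append("large statement")
    # Naive split: good enough for a sketch, wrong for ';' inside literals.
    for statement in sql.split(";"):
        for reason, pattern in _PINNING_PATTERNS.items():
            if pattern.match(statement) and reason not in reasons:
                reasons.append(reason)
    return reasons


print(pinning_reasons("set search_path = $1; set names $2;"))
print(pinning_reasons("SELECT id FROM a WHERE title = $1"))
```

The digest from the proxy warning in the question is flagged as a session setting, while a plain SELECT is not.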

Stateless connections

Now we get to the real solution: how serverless database connections should work. If we don’t store any state in the connection, connections suddenly become very cheap and scale up to the numbers a normal mid-sized serverless application might need. This of course means there are certain things you can’t do anymore; one of them is long-running transactions.

We are currently investigating our possibilities here. Please be patient and follow us later this year for more news.

dpetrick commented, May 5, 2021 (3 reactions)

After assessing what would be necessary, I’m sorry to say that the engineering effort needed to comply with RDS Proxy’s rules for avoiding pinned connections is far too large for us to take on in the near future. We may revisit the topic later, but unfortunately we can’t offer a solution right now without spending weeks of dedicated engineering time.

The limitation is documented here: https://www.prisma.io/docs/guides/deployment/deployment-guides/caveats-when-deploying-to-aws-platforms#aws-rds-proxy

We’ll keep this issue around as a feature request to track interest in the topic.
