
Slow SELECT query on large table

See original GitHub issue
  • asyncpg version: 0.22.0
  • PostgreSQL version: 11.10
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: No, and I was able to reproduce it locally.
  • Python version: 3.7.9
  • Platform: Fedora 31
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: Yes
  • If you built asyncpg locally, which version of Cython did you use?:
  • Can the issue be reproduced under both asyncio and uvloop?: We use Starlette for serving API requests.

We have a large database table (~30 million rows). I have noticed that the SQL below runs terribly slowly when executed via asyncpg but very fast via psql.

Schema for table:

CREATE TABLE addresses(
    zipcode VARCHAR(12),
    line1 text,
    -- 13x more fields of text type
);
CREATE INDEX z_idx ON addresses(zipcode, line1);

Query to execute:

SELECT line1, ..., zipcode FROM addresses WHERE REPLACE(zipcode, ' ', '') = $1 GROUP BY A, B, line1 ORDER BY A, B

This query never returns more than 100 rows (out of the ~30 million mentioned above).

Time to execute via psql: less than 1 ms
Same query via asyncpg: more than 6 seconds (!)

I have not looked at the source code for asyncpg, so I'm not sure what's going on here.

Can someone tell me why asyncpg runs this query so slowly? Thanks!
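
For anyone reproducing the comparison, a minimal sketch of the asyncpg side looks roughly like this (the DSN and the sample zipcode value are placeholders, and zipcode/line1 stand in for the full column list and the A/B grouping columns of the real query):

import asyncio
import time

import asyncpg


async def main() -> None:
    # Placeholder DSN - point this at the database that holds the addresses table.
    conn = await asyncpg.connect("postgresql://user:password@localhost/mydb")
    try:
        start = time.perf_counter()
        # Same shape as the slow query, trimmed to columns from the schema above.
        rows = await conn.fetch(
            "SELECT line1, zipcode FROM addresses "
            "WHERE REPLACE(zipcode, ' ', '') = $1 "
            "GROUP BY zipcode, line1 ORDER BY zipcode, line1",
            "SW1A1AA",  # placeholder zipcode value
        )
        print(f"{len(rows)} rows in {time.perf_counter() - start:.3f}s")
    finally:
        await conn.close()


asyncio.run(main())

Note that asyncpg sends the $1 query as a parameterised (prepared) statement, while pasting the literal value into psql plans it with the constant inlined, so the two runs are not executing byte-for-byte identical statements.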

FYI, I have fixed this “temporarily” with some quick data normalisation: I added a zipcode_x column that contains no spaces, so I could get rid of the SQL REPLACE function.

SELECT line1, ..., zipcode FROM addresses WHERE zipcode = $1 GROUP BY A, B, line1 ORDER BY A, B

and now asyncpg is very fast (as it should be).
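
For reference, the normalisation step described above could be done via asyncpg roughly like this (a sketch only: the zx_idx index name and the IF NOT EXISTS guards are my own additions, and backfilling ~30 million rows in a single UPDATE may need batching in practice):

import asyncpg


async def add_normalised_zipcode(conn: asyncpg.Connection) -> None:
    # Keep a space-free copy of zipcode so the WHERE clause can compare the
    # column directly instead of wrapping it in REPLACE().
    await conn.execute(
        "ALTER TABLE addresses ADD COLUMN IF NOT EXISTS zipcode_x VARCHAR(12)"
    )
    await conn.execute("UPDATE addresses SET zipcode_x = REPLACE(zipcode, ' ', '')")
    await conn.execute(
        "CREATE INDEX IF NOT EXISTS zx_idx ON addresses (zipcode_x, line1)"
    )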

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
Tomcat-Engineering commented, Jun 17, 2021

I just had very similar symptoms, with a query that was fast in psql taking 6 seconds via asyncpg.

After a lot of testing I narrowed it down to having ENUM columns in the query results. My query returning 16 columns took 15 ms; adding an enum column pushed that to 1.5 seconds, and adding a second enum column pushed it to 6 seconds!

The fix discussed at https://github.com/MagicStack/asyncpg/issues/530#issuecomment-577183867 seemed to work and brought the time back down to 20 ms, so I guess this is related to the Postgres JIT rather than to something in asyncpg. Perhaps that's the cause in your query too?
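
For context, here is a sketch of that kind of fix with asyncpg, assuming the linked workaround amounts to turning off the PostgreSQL JIT for the session (server_settings is asyncpg's hook for passing session settings at connect time):

import asyncpg


async def connect_without_jit(dsn: str) -> asyncpg.Connection:
    # "jit": "off" is sent as a session setting, so the PostgreSQL JIT is
    # disabled for every query executed on this connection.
    return await asyncpg.connect(dsn, server_settings={"jit": "off"})

The same effect can be had on an existing connection with await conn.execute("SET jit = off").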

0 reactions
ahasoftware commented, Jun 17, 2021

Thanks @Tomcat-Engineering - the JIT issue sounds like a good rabbit hole to follow 😃

Unfortunately, I no longer have the means to test the above using asyncpg, since I switched to SQLite: our huge database is generated once a month and is read-only 100% of the time, so the little DB engine is better suited for now.
