Slow SELECT query on large table
- asyncpg version: 0.22.0
- PostgreSQL version: 11.10
- Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: No, and I was able to reproduce it locally.
- Python version: 3.7.9
- Platform: Fedora 31
- Do you use pgbouncer?: No
- Did you install asyncpg with pip?: Yes
- If you built asyncpg locally, which version of Cython did you use?:
- Can the issue be reproduced under both asyncio and uvloop?: We use Starlette for serving API requests.
We have a large database table (~30 million rows). I have noticed that a query runs terribly slowly when executed via asyncpg but very fast via psql.
Schema for table:
CREATE TABLE addresses(
zipcode VARCHAR(12),
line1 text,
-- 13x more fields of text type
);
CREATE INDEX z_idx ON addresses(zipcode, line1);
Query to execute:
SELECT line_1, ..., zipcode FROM addresses WHERE REPLACE(zipcode, ' ', '')=$1 GROUP BY A, B, line1 ORDER BY A, B
This query never returns more than 100 rows (out of the 30 million mentioned above).
Time to execute via psql: less than 1 ms
Same query via asyncpg: more than 6 seconds (!)
I have not looked at the source code for asyncpg, so I am not sure what’s going on here.
Can someone tell me why asyncpg runs this query so slowly? Thanks!
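For anyone trying to reproduce the timing difference, here is a minimal sketch of how the asyncpg side could be measured. The DSN, the sample zipcode value and the simplified query are placeholders of mine (the real query has more columns plus the GROUP BY / ORDER BY), and `time_fetch` is a hypothetical helper, not part of asyncpg.

```python
import asyncio
import time


async def time_fetch(conn, query, *args):
    """Run conn.fetch(query, *args) and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = await conn.fetch(query, *args)
    return rows, time.perf_counter() - start


async def main():
    # Imported here so that time_fetch stays usable without asyncpg installed.
    import asyncpg

    # Hypothetical DSN: replace with your own connection string.
    conn = await asyncpg.connect("postgresql://user:password@localhost/mydb")
    try:
        # Simplified stand-in for the query from the issue.
        query = (
            "SELECT zipcode, line1 FROM addresses "
            "WHERE REPLACE(zipcode, ' ', '') = $1"
        )
        # The first call includes preparing the statement; the second one
        # can reuse asyncpg's per-connection statement cache.
        for attempt in (1, 2):
            rows, elapsed = await time_fetch(conn, query, "AB12CD")
            print(f"attempt {attempt}: {len(rows)} rows in {elapsed:.3f}s")
    finally:
        await conn.close()
```

Running `asyncio.run(main())` against the real table and comparing both attempts with psql's `\timing` output should show whether the extra seconds are spent on the first call only or on every call.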
FYI, I have fixed this “temporarily” with some quick data normalisation: I added a zipcode_x column that contains no spaces, so I could get rid of the REPLACE call in the SQL.
SELECT line_1, ..., zipcode FROM addresses WHERE zipcode=$1 GROUP BY A, B, line1 ORDER BY A, B
and now asyncpg is very fast (as it should be).
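As a rough sketch of the normalisation step, assuming it only has to mirror what `REPLACE(zipcode, ' ', '')` did in SQL: the same space-stripping can be applied in Python to the value passed as `$1`, so it matches the space-free column. `normalise_zipcode` is a hypothetical helper name.

```python
def normalise_zipcode(zipcode: str) -> str:
    """Remove ASCII spaces, mirroring REPLACE(zipcode, ' ', '') in SQL."""
    return zipcode.replace(" ", "")


# Example: normalise user input before binding it as the $1 parameter.
param = normalise_zipcode("AB1 2CD")
print(param)
```

With the stored value and the parameter normalised the same way, the WHERE clause becomes a plain equality comparison on a column instead of a function call evaluated per row.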
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (2 by maintainers)

I just had very similar symptoms, with a query that was fast in psql taking 6 seconds via asyncpg.
After a lot of testing I narrowed it down to having ENUM columns in the query results. My query returning 16 columns took 15ms… adding an enum column made that into 1.5 seconds, and adding a second enum column made that into 6 seconds!
The fix discussed at https://github.com/MagicStack/asyncpg/issues/530#issuecomment-577183867 seemed to work, and reduced the time back down to 20ms, so I guess this is related to the postgres JIT rather than something in asyncpg. Perhaps that might be the cause in your query too?
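In case it helps, here is a minimal sketch of one way to switch the PostgreSQL JIT off for asyncpg connections, assuming that is the fix meant above. It relies on the `server_settings` argument of `asyncpg.connect()` and on the `jit` server setting; `connect_kwargs` and `connect_without_jit` are hypothetical helper names, and the DSN is whatever you already use.

```python
import asyncio

# Server-side settings applied when the connection is established.
JIT_OFF_SETTINGS = {"jit": "off"}


def connect_kwargs(dsn):
    """Build keyword arguments for asyncpg.connect() with JIT disabled."""
    return {"dsn": dsn, "server_settings": dict(JIT_OFF_SETTINGS)}


async def connect_without_jit(dsn):
    # Imported here so that connect_kwargs stays usable without asyncpg.
    import asyncpg

    return await asyncpg.connect(**connect_kwargs(dsn))
```

The same `server_settings` dictionary can also be passed to `asyncpg.create_pool()`, which forwards connection arguments to every connection in the pool.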
Thanks @Tomcat-Engineering - the JIT issue sounds like a good rabbit hole to follow 😃
Unfortunately, I no longer have the means to test the above using asyncpg, since I switched to SQLite: our huge database is generated once a month and is read-only 100% of the time, so the smaller DB engine is better suited for now.