
Slow SELECT query on large table

See original GitHub issue
  • asyncpg version: 0.22.0
  • PostgreSQL version: 11.10
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce the issue with a local PostgreSQL install?: No, and I was able to reproduce it locally.
  • Python version: 3.7.9
  • Platform: Fedora 31
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: Yes
  • If you built asyncpg locally, which version of Cython did you use?:
  • Can the issue be reproduced under both asyncio and uvloop?: We use Starlette for serving API requests.

We have a large database table (~30 million rows). I have noticed that the SQL below runs terribly slowly when executed via asyncpg but very fast via psql.

Schema for table:

CREATE TABLE addresses(
    zipcode VARCHAR(12),
    line1 text,
    -- 13x more fields of text type
);
CREATE INDEX z_idx ON addresses(zipcode, line1);

Query to execute:

SELECT line1, ..., zipcode FROM addresses WHERE REPLACE(zipcode, ' ', '') = $1 GROUP BY A, B, line1 ORDER BY A, B

This query never returns more than 100 rows (out of the ~30 million mentioned above).

Time to execute via psql: less than 1 ms
Same query via asyncpg: more than 6 seconds (!)

I have not looked at the source code for asyncpg, so I'm not sure what's going on here.

Can someone tell me why asyncpg runs this query so slowly? Thanks!
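
For anyone reproducing the comparison, a minimal sketch of the asyncpg side looks roughly like this (the DSN and the sample zipcode value are placeholders, and zipcode/line1 stand in for the full column list and the A/B grouping columns of the real query):

import asyncio
import time

import asyncpg


async def main() -> None:
    # Placeholder DSN - point this at the database that holds the addresses table.
    conn = await asyncpg.connect("postgresql://user:password@localhost/mydb")
    try:
        start = time.perf_counter()
        # Same shape as the slow query, trimmed to columns from the schema above.
        rows = await conn.fetch(
            "SELECT line1, zipcode FROM addresses "
            "WHERE REPLACE(zipcode, ' ', '') = $1 "
            "GROUP BY zipcode, line1 ORDER BY zipcode, line1",
            "SW1A1AA",  # placeholder zipcode value
        )
        print(f"{len(rows)} rows in {time.perf_counter() - start:.3f}s")
    finally:
        await conn.close()


asyncio.run(main())

Note that asyncpg sends the $1 query as a parameterised (prepared) statement, while pasting the literal value into psql plans it with the constant inlined, so the two runs are not executing byte-for-byte identical statements.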

FYI, I have fixed this “temporarily” with some quick data normalisation: I added a zipcode_x column that contains no spaces, so I could get rid of the SQL REPLACE function.

SELECT line1, ..., zipcode FROM addresses WHERE zipcode = $1 GROUP BY A, B, line1 ORDER BY A, B

and now asyncpg is very fast (as it should be).
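
For reference, the normalisation step described above could be done via asyncpg roughly like this (a sketch only: the zx_idx index name and the IF NOT EXISTS guards are my own additions, and backfilling ~30 million rows in a single UPDATE may need batching in practice):

import asyncpg


async def add_normalised_zipcode(conn: asyncpg.Connection) -> None:
    # Keep a space-free copy of zipcode so the WHERE clause can compare the
    # column directly instead of wrapping it in REPLACE().
    await conn.execute(
        "ALTER TABLE addresses ADD COLUMN IF NOT EXISTS zipcode_x VARCHAR(12)"
    )
    await conn.execute("UPDATE addresses SET zipcode_x = REPLACE(zipcode, ' ', '')")
    await conn.execute(
        "CREATE INDEX IF NOT EXISTS zx_idx ON addresses (zipcode_x, line1)"
    )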

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
Tomcat-Engineering commented, Jun 17, 2021

I just had very similar symptoms, with a query that was fast in psql taking 6 seconds via asyncpg.

After a lot of testing I narrowed it down to having ENUM columns in the query results. My query returning 16 columns took 15 ms; adding an enum column pushed that to 1.5 seconds, and adding a second enum column pushed it to 6 seconds!

The fix discussed at https://github.com/MagicStack/asyncpg/issues/530#issuecomment-577183867 seemed to work and brought the time back down to 20 ms, so I guess this is related to the Postgres JIT rather than to something in asyncpg. Perhaps that's the cause in your query too?
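
For context, here is a sketch of that kind of fix with asyncpg, assuming the linked workaround amounts to turning off the PostgreSQL JIT for the session (server_settings is asyncpg's hook for passing session settings at connect time):

import asyncpg


async def connect_without_jit(dsn: str) -> asyncpg.Connection:
    # "jit": "off" is sent as a session setting, so the PostgreSQL JIT is
    # disabled for every query executed on this connection.
    return await asyncpg.connect(dsn, server_settings={"jit": "off"})

The same effect can be had on an existing connection with await conn.execute("SET jit = off").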

0 reactions
ahasoftware commented, Jun 17, 2021

Thanks @Tomcat-Engineering - the JIT issue sounds like a good rabbit hole to follow 😃

Unfortunately, I no longer have the means to test the above using asyncpg, since I switched to SQLite: our huge database is generated once a month and is read-only 100% of the time, so the little DB engine is better suited for now.
