
Massive overhead when iterating over 1k+ rows in postgres even with server side cursors

See original GitHub issue

I’m seeing an inexplicably large overhead when iterating over a postgres table.

I profiled the code, and also did a smoke test with SQLAlchemy to make sure it wasn’t a slow connection or the underlying driver (psycopg2).

I ran this script against a Postgres table of ~1M records, fetching only a tiny fraction of them:

import time

import peewee
import sqlalchemy
from playhouse import postgres_ext
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.engine.url import URL as AlchemyURL
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker as alchemy_sessionmaker

user = 'XXX'
password = 'XXX'
database = 'XXX'
host = 'XXX'
port = 5432

table = 'person'
limit = 1000

peewee_db = postgres_ext.PostgresqlExtDatabase(
    database=database,
    host=host, port=port,
    user=user, password=password,
    use_speedups=True,
    server_side_cursors=True,
    register_hstore=False,
)

alchemy_engine = sqlalchemy.create_engine(AlchemyURL('postgresql', username=user, password=password,
                                                     database=database, host=host, port=port))
alchemy_session = alchemy_sessionmaker(bind=alchemy_engine)()


class PeeweePerson(peewee.Model):
    class Meta:
        database = peewee_db
        db_table = table

    id = peewee.CharField(primary_key=True, max_length=64)
    data = postgres_ext.BinaryJSONField(index=True, index_type='GIN')


class SQLAlchemyPerson(declarative_base()):
    __tablename__ = table

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    data = sqlalchemy.Column(JSONB)


def run_raw_query():
    ids = list(peewee_db.execute_sql(f"SELECT id from {table} order by id desc limit {limit}"))
    return ids


def run_peewee_query():
    query = PeeweePerson.select(PeeweePerson.id).order_by(PeeweePerson.id.desc()).limit(limit)
    ids = list(query.tuples())
    return ids


def run_sqlalchemy_query():
    query = alchemy_session.query(SQLAlchemyPerson.id).order_by(sqlalchemy.desc(SQLAlchemyPerson.id)).limit(limit)
    ids = list(query)
    return ids


if __name__ == '__main__':
    t0 = time.time()
    raw_result = run_raw_query()
    t1 = time.time()
    print(f'Raw: {t1 - t0}')

    t2 = time.time()
    sqlalchemy_result = run_sqlalchemy_query()
    t3 = time.time()
    print(f'SQLAlchemy: {t3 - t2}')

    t4 = time.time()
    peewee_result = run_peewee_query()
    t5 = time.time()
    print(f'peewee: {t5 - t4}')

    assert raw_result == sqlalchemy_result == peewee_result

Outputs

  • With limit = 1000:
    Raw: 0.02643609046936035
    SQLAlchemy: 0.03697466850280762
    peewee: 1.0509874820709229
  • With limit = 10000:
    Raw: 0.15931344032287598
    SQLAlchemy: 0.07229042053222656
    peewee: 10.82826042175293

Both examples use server-side cursors.

I briefly profiled this, and it looks like 95%+ of the time is spent calling cursor.fetchone: https://github.com/coleifer/peewee/blob/d8e34b0682d87bd56c1a3636445d9c0fccf2b1e2/peewee.py#L2340
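
Since nearly all of the time is inside cursor.fetchone, one useful way to frame the problem is the number of driver calls the row loop makes. Here is a minimal pure-Python sketch (MockCursor is hypothetical and only counts calls; it is not a measurement of peewee itself) of why a per-row fetchone loop pays a fixed per-call overhead once per row, while a batched fetchmany loop amortizes it:

```python
# Sketch: compare driver-call counts for a per-row fetchone() loop vs a
# batched fetchmany() loop. MockCursor is a stand-in for a DB-API cursor.
class MockCursor:
    def __init__(self, rows, itersize=256):
        self._rows = list(rows)
        self._pos = 0
        self.itersize = itersize
        self.calls = 0  # number of fetch* calls made against the "driver"

    def fetchone(self):
        self.calls += 1
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def fetchmany(self, size=None):
        self.calls += 1
        size = size or self.itersize
        batch = self._rows[self._pos:self._pos + size]
        self._pos += len(batch)
        return batch


def iter_fetchone(cur):
    # One driver call per row, plus one final call that returns None.
    while True:
        row = cur.fetchone()
        if row is None:
            return
        yield row


def iter_fetchmany(cur):
    # One driver call per batch of `itersize` rows.
    while True:
        batch = cur.fetchmany()
        if not batch:
            return
        yield from batch


rows = [(i,) for i in range(10_000)]
one = MockCursor(rows)
assert len(list(iter_fetchone(one))) == 10_000
many = MockCursor(rows)
assert len(list(iter_fetchmany(many))) == 10_000
# The per-row loop makes hundreds of times more driver calls.
print(one.calls, many.calls)
```

If each call also carries Python-level bookkeeping (as the fetchone hot spot in the profile suggests), that per-row fixed cost scales linearly with the row count, which would match the roughly 10x jump from limit=1000 to limit=10000.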

I’ll continue profiling this, but was wondering if you knew what was up?

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

2 reactions
coleifer commented, Nov 15, 2017

Leaving link to psycopg2 docs for handy reference: http://initd.org/psycopg/docs/usage.html#server-side-cursors
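
For reference, the pattern those docs describe is a psycopg2 named (server-side) cursor, where itersize controls how many rows each network round trip fetches. A hedged sketch of that pattern (the dsn and table values are placeholders, the function name is made up, and it assumes psycopg2 plus a reachable database; itersize=2000 is psycopg2's documented default for named cursors):

```python
def stream_ids(dsn, table="person", limit=1000, itersize=2000):
    """Sketch: stream ids with a psycopg2 *named* (server-side) cursor.

    Placeholders/assumptions: `dsn` and `table` are yours to fill in;
    itersize=2000 mirrors psycopg2's default rows-per-round-trip.
    """
    import psycopg2  # deferred so the sketch loads without psycopg2 installed

    conn = psycopg2.connect(dsn)
    try:
        # Passing a name makes this a server-side (named) cursor.
        with conn.cursor(name="stream_ids") as cur:
            cur.itersize = itersize  # rows fetched per network round trip
            cur.execute(
                f"SELECT id FROM {table} ORDER BY id DESC LIMIT %s", (limit,)
            )
            for (row_id,) in cur:  # iteration batches fetches internally
                yield row_id
    finally:
        conn.close()
```

Iterating a named cursor this way keeps the result set on the server while still fetching rows in batches, which is the behavior the ORM layer would ideally delegate to.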

0 reactions
ParthGandhi commented, Nov 27, 2017

Haven’t had a chance to try it out yet - got pulled into other work.
