Massive overhead when iterating over 1k+ rows in postgres even with server side cursors
I'm seeing an inexplicably large overhead when iterating over a postgres table. I profiled the code, and also ran a smoke test with SQLAlchemy to make sure it wasn't a slow connection or the underlying driver (psycopg2).
I'm running this against a postgres table of ~1M records but fetching only a tiny fraction of that.
```python
import time

import peewee
import sqlalchemy
from playhouse import postgres_ext
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.engine.url import URL as AlchemyURL
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker as alchemy_sessionmaker

user = 'XXX'
password = 'XXX'
database = 'XXX'
host = 'XXX'
port = 5432

table = 'person'
limit = 1000

peewee_db = postgres_ext.PostgresqlExtDatabase(
    database=database,
    host=host, port=port,
    user=user, password=password,
    use_speedups=True,
    server_side_cursors=True,
    register_hstore=False,
)

alchemy_engine = sqlalchemy.create_engine(
    AlchemyURL('postgresql', username=user, password=password,
               database=database, host=host, port=port))
alchemy_session = alchemy_sessionmaker(bind=alchemy_engine)()


class PeeweePerson(peewee.Model):
    class Meta:
        database = peewee_db
        db_table = table

    id = peewee.CharField(primary_key=True, max_length=64)
    data = postgres_ext.BinaryJSONField(index=True, index_type='GIN')


class SQLAlchemyPerson(declarative_base()):
    __tablename__ = table

    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    data = sqlalchemy.Column(JSONB)


def run_raw_query():
    ids = list(peewee_db.execute_sql(f"SELECT id from {table} order by id desc limit {limit}"))
    return ids


def run_peewee_query():
    query = PeeweePerson.select(PeeweePerson.id).order_by(PeeweePerson.id.desc()).limit(limit)
    ids = list(query.tuples())
    return ids


def run_sqlalchemy_query():
    query = alchemy_session.query(SQLAlchemyPerson.id).order_by(sqlalchemy.desc(SQLAlchemyPerson.id)).limit(limit)
    ids = list(query)
    return ids


if __name__ == '__main__':
    t0 = time.time()
    raw_result = run_raw_query()
    t1 = time.time()
    print(f'Raw: {t1 - t0}')

    t2 = time.time()
    sqlalchemy_result = run_sqlalchemy_query()
    t3 = time.time()
    print(f'SQLAlchemy: {t3 - t2}')

    t4 = time.time()
    peewee_result = run_peewee_query()
    t5 = time.time()
    print(f'peewee: {t5 - t4}')

    assert raw_result == sqlalchemy_result == peewee_result
```
Outputs

- With `limit = 1000`:

```
Raw: 0.02643609046936035
SQLAlchemy: 0.03697466850280762
peewee: 1.0509874820709229
```

- With `limit = 10000`:

```
Raw: 0.15931344032287598
SQLAlchemy: 0.07229042053222656
peewee: 10.82826042175293
```
Both examples use server side cursors.
I briefly profiled this, and it looks like 95%+ of the time is spent calling cursor.fetchone:
https://github.com/coleifer/peewee/blob/d8e34b0682d87bd56c1a3636445d9c0fccf2b1e2/peewee.py#L2340
I'll continue profiling this, but was wondering if you knew what was up?
Issue Analytics
- State:
- Created 6 years ago
- Comments:13 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments

- Leaving link to psycopg2 docs for handy reference: http://initd.org/psycopg/docs/usage.html#server-side-cursors
- Haven't had a chance to try it out yet - got pulled into other work.