PostgresqlDatabase multithreading performance problem
Hello, I have a PostgreSQL database connection:
from peewee import PostgresqlDatabase

database = PostgresqlDatabase(
    config_server['database_name'],
    host=config_server['host'],
    port=config_server['port'],
    user=config_server['user'],
    password=config_server['password'],
)
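For reference, peewee also ships a pooled variant in playhouse.pool, which lets threads check connections out of a shared pool instead of reconnecting on every call. A minimal sketch, reusing the same config_server keys (the pool limits are illustrative, not from the issue):

from playhouse.pool import PooledPostgresqlDatabase

# Pooled alternative: each thread borrows a connection from the pool.
database = PooledPostgresqlDatabase(
    config_server['database_name'],
    host=config_server['host'],
    port=config_server['port'],
    user=config_server['user'],
    password=config_server['password'],
    max_connections=8,    # illustrative upper bound across all threads
    stale_timeout=300,    # recycle connections idle for 5+ minutes
)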
Each of four threads runs the function below with a different date_time parameter; in total, the function must run 100 times:
from datetime import timedelta

import pandas as pd

def get_estimations_v1(date_time, db) -> pd.DataFrame:
    # Note: set_table_name mutates class-level state shared by every
    # thread, so concurrent calls can race on the table name.
    Estimation._meta.set_table_name('est_' + date_time.strftime("%Y%m%d"))
    try:
        db.connect(reuse_if_open=True)
        query = Estimation.select().where(
            Estimation.date_time.between(date_time - timedelta(minutes=1), date_time))
        estimations = pd.DataFrame(list(query.dicts()))
        return estimations
    except Exception as ce:
        logger.error('load of estimations failed for {}, {}', date_time, str(ce))
        return None
    finally:
        db.close()
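The issue doesn't show how the threads are started; a driver along these lines is assumed (the ThreadPoolExecutor usage, start date, and minute spacing are illustrative):

from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta

# Assumed driver (not shown in the issue): 100 calls spread over 4 threads.
date_times = [datetime(2022, 1, 1) + timedelta(minutes=i) for i in range(100)]

with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(lambda dt: get_estimations_v1(dt, database), date_times))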
But the performance is worse than running sequentially. Does anyone have any ideas?
Issue Analytics
- Created: 10 months ago
- Comments: 6 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I made a little script that inserts 100K rows and queries them single-threaded, with 4 threads, and with 4 processes. The single-threaded and 4-thread runs take about the same time, but the multiprocess version runs in about half the time.
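That script isn't attached to the issue; a minimal sketch of this style of comparison, assuming the database and Estimation objects from the issue and illustrative worker/job counts, could look like:

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_query(_):
    # Open (and close) the connection inside the worker so that child
    # processes never inherit a live connection from the parent.
    database.connect(reuse_if_open=True)
    try:
        return len(list(Estimation.select().dicts()))
    finally:
        database.close()

def timed(executor_cls, workers=4, jobs=100):
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        list(pool.map(run_query, range(jobs)))
    return time.perf_counter() - start

if __name__ == '__main__':
    print('1 thread :', timed(ThreadPoolExecutor, workers=1))
    print('4 threads:', timed(ThreadPoolExecutor, workers=4))
    print('4 procs  :', timed(ProcessPoolExecutor, workers=4))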
The psycopg2-only version is faster, but this is somewhat expected since it carries none of the Python-side overhead. It is also marginally faster (about 10%) when run multi-threaded.
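The psycopg2-only script isn't included either; presumably it bypasses peewee along these lines (the table name and query are illustrative, reusing config_server from the issue):

import psycopg2

# Raw-driver version of the same read path, skipping peewee's SQL
# generation and row wrapping entirely.
conn = psycopg2.connect(
    dbname=config_server['database_name'],
    host=config_server['host'],
    port=config_server['port'],
    user=config_server['user'],
    password=config_server['password'],
)
with conn.cursor() as cur:
    cur.execute("SELECT * FROM est_20220101")  # illustrative table name
    rows = cur.fetchall()
conn.close()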
I profiled the code, and it looks to me like the bulk of the time in peewee is spent generating SQL and serializing result rows to model instances, which is to be expected.
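The exact profiling invocation isn't shown; one way to reproduce that kind of breakdown (the profiled query is illustrative) is:

import cProfile
import pstats

# Profile a representative query and show the 10 most expensive calls.
cProfile.run('list(Estimation.select().dicts())', 'peewee.prof')
pstats.Stats('peewee.prof').sort_stats('cumulative').print_stats(10)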
No, the single-threaded psycopg2 version takes almost the same time as the single-threaded peewee version.