Advice: best bulk upsert method that still allows tracking the number of affected rows?

I’ve been relying on the newest implementation of executemany() to perform bulk upserts, but it has one shortcoming: it does not let me easily determine the number of affected rows by parsing the statusmsg.

The number of rows actually upserted can easily be less than the number of rows I attempt to upsert, because I qualify my ON CONFLICT clause with a further WHERE clause specifying that the update should only happen if the existing and excluded tuples are distinct.

INSERT INTO "table_name" AS __destination_row (
    id,
    other_column
) VALUES ($1, $2)
ON CONFLICT (id)
DO UPDATE SET
    id = excluded.id,
    other_column = excluded.other_column
WHERE
    (__destination_row.id IS DISTINCT FROM excluded.id)
 OR
    (__destination_row.other_column IS DISTINCT FROM excluded.other_column)
;

(regular Postgres would allow for a much terser syntax, but this is the only syntax that is accepted by CockroachDB)
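
For a single statement, asyncpg's Connection.execute() returns the status message (the command tag) as a string, for example "INSERT 0 3", and the row count is its last field. A minimal sketch of parsing it follows; the helper name affected_rows is hypothetical and not part of asyncpg:

```python
def affected_rows(status: str) -> int:
    """Return the row count from a command tag such as 'INSERT 0 3' or 'UPDATE 5'."""
    # The row count is the last space-separated field of the command tag.
    last = status.rsplit(" ", 1)[-1]
    if not last.isdigit():
        # Tags such as 'CREATE TABLE' carry no row count.
        raise ValueError(f"no row count in status message: {status!r}")
    return int(last)
```

With the WHERE clause above, rows whose update is skipped should not be counted, so the parsed number can be lower than the number of rows submitted.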

Suppose that, at times, knowing the exact number of rows actually upserted matters more than bulk performance, yet I would prefer not to go to the extreme of upserting one row at a time. What would be the best compromise?

Should I rely on a temporary table and then upsert into the physical tables from that temporary table?

INSERT INTO "table_name" AS __destination_row (
    id,
    other_column
) SELECT
    id,
    other_column
FROM "__temp_table_name"
ON CONFLICT (id)
DO UPDATE SET
    id = excluded.id,
    other_column = excluded.other_column
WHERE
    (__destination_row.id IS DISTINCT FROM excluded.id)
 OR
    (__destination_row.other_column IS DISTINCT FROM excluded.other_column)
;
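
A rough sketch of how the temporary-table route might look from asyncpg. Assumptions: the column types (INT, TEXT) are invented for illustration, the function name is hypothetical, and temporary-table support on the target database (CockroachDB in particular) has not been checked here. The staging table is filled with executemany(), and the single INSERT ... SELECT then yields one status message to parse:

```python
CREATE_TEMP = 'CREATE TEMP TABLE "__temp_table_name" (id INT, other_column TEXT)'
LOAD_TEMP = 'INSERT INTO "__temp_table_name" (id, other_column) VALUES ($1, $2)'
UPSERT_FROM_TEMP = """
INSERT INTO "table_name" AS __destination_row (id, other_column)
SELECT id, other_column FROM "__temp_table_name"
ON CONFLICT (id)
DO UPDATE SET
    id = excluded.id,
    other_column = excluded.other_column
WHERE
    (__destination_row.id IS DISTINCT FROM excluded.id)
 OR
    (__destination_row.other_column IS DISTINCT FROM excluded.other_column)
"""
DROP_TEMP = 'DROP TABLE "__temp_table_name"'


async def upsert_via_temp_table(conn, rows):
    """Stage rows in a temp table, upsert them in one statement, return the row count.

    `conn` is expected to behave like an asyncpg Connection.
    """
    async with conn.transaction():
        await conn.execute(CREATE_TEMP)
        # Bulk-load the staging table; no per-row count is needed here.
        await conn.executemany(LOAD_TEMP, rows)
        # One statement, so one command tag such as 'INSERT 0 2'.
        status = await conn.execute(UPSERT_FROM_TEMP)
        await conn.execute(DROP_TEMP)
    return int(status.rsplit(" ", 1)[-1])
```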

Should I instead use a transaction with several individual upserts of values once again provided by the client?
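
For comparison, a minimal sketch of the transaction-with-individual-upserts option. The function name is hypothetical and `conn` is assumed to behave like an asyncpg Connection; each execute() call returns its own status message, so the counts can be summed:

```python
UPSERT_ONE = """
INSERT INTO "table_name" AS __destination_row (id, other_column)
VALUES ($1, $2)
ON CONFLICT (id)
DO UPDATE SET
    id = excluded.id,
    other_column = excluded.other_column
WHERE
    (__destination_row.id IS DISTINCT FROM excluded.id)
 OR
    (__destination_row.other_column IS DISTINCT FROM excluded.other_column)
"""


async def upsert_one_by_one(conn, rows):
    """Upsert each row separately inside one transaction and sum the row counts."""
    affected = 0
    async with conn.transaction():
        for row in rows:
            # Status looks like 'INSERT 0 1', or 'INSERT 0 0' when the WHERE
            # clause skipped the update.
            status = await conn.execute(UPSERT_ONE, *row)
            affected += int(status.rsplit(" ", 1)[-1])
    return affected
```

This costs one round trip per row, which is the performance trade-off the question is trying to avoid.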

Are there other approaches I should explore?
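
One other approach that might be worth testing is sending all rows as array parameters to a single statement, so that execute() returns a single status message. This is only a sketch under assumptions: the array casts (int[], text[]) are invented for illustration, the function name is hypothetical, and behaviour on CockroachDB has not been verified here:

```python
UPSERT_UNNEST = """
INSERT INTO "table_name" AS __destination_row (id, other_column)
SELECT * FROM unnest($1::int[], $2::text[])
ON CONFLICT (id)
DO UPDATE SET
    id = excluded.id,
    other_column = excluded.other_column
WHERE
    (__destination_row.id IS DISTINCT FROM excluded.id)
 OR
    (__destination_row.other_column IS DISTINCT FROM excluded.other_column)
"""


async def upsert_with_unnest(conn, rows):
    """Upsert all rows in one statement by passing one array per column."""
    # Transpose the row tuples into one list per column.
    ids = [row[0] for row in rows]
    others = [row[1] for row in rows]
    status = await conn.execute(UPSERT_UNNEST, ids, others)
    return int(status.rsplit(" ", 1)[-1])
```

Note that in PostgreSQL a batch containing the same id twice makes ON CONFLICT DO UPDATE fail, so the input would need to be de-duplicated first.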

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions
matthewhegarty commented, Sep 30, 2021

Just wanted to link the other related thread. You mention that you could allow executemany to return the statement result. Is this on the roadmap?

1 reaction
pauldraper commented, May 13, 2021

Does issuing two statements back to back at the same psql prompt somewhat emulate what happens on the wire with the new executemany()?

I think psql parses the SQL commands (at least, tokenizes them) and sends them separately.

Regardless, the simple query protocol (which asyncpg is using to send multiple statements without parsing them) does expose multiple result sets. https://www.postgresql.org/docs/13/protocol-flow.html#id-1.10.5.7.4
