Session, persistence operations and bulk insert
Hi, thank you for sharing this ORM project. I have started playing with it by modifying and extending your introductory demo; hope you like it. I have a few questions:
- Is there any in-memory persistence of objects you create from `infi.clickhouse_orm.models`? I am referring to an ORM session kind of thing here.
- If not, do you think this is a useful feature to add in the future, in terms of atomicity, caching, etc.?
- What's the best method to use with `infi.clickhouse_orm` for massive insertion of data, say a table with 1 million rows and two columns, integer and float? (See the sketch after this list.)
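For context, a minimal sketch of what the ORM insert path looks like; the `Measurement` model, its `Memory` engine, and the database name are assumptions for illustration:

```python
from infi.clickhouse_orm import models, fields, engines
from infi.clickhouse_orm.database import Database

class Measurement(models.Model):
    value_int = fields.Int32Field()
    value_float = fields.Float64Field()
    engine = engines.Memory()  # a real table would likely use MergeTree

db = Database('mydb')  # HTTP interface, http://localhost:8123 by default
db.create_table(Measurement)

# insert() accepts any iterable of model instances and sends them
# to ClickHouse in chunks of batch_size rows per request.
db.insert(
    (Measurement(value_int=i, value_float=i / 3.0) for i in range(1_000_000)),
    batch_size=10000,
)
```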
Regarding point 1: commits, sessions and transactions, as well as rollbacks, in other ORMs are reflections of relational database behaviour. ClickHouse doesn't support transactions, so this can't be implemented. As for persistence in memory, I still don't understand what you mean.
Regarding point 3: ClickHouse has two interfaces, TCP and HTTP. infi.clickhouse_orm uses the HTTP interface. Other Python projects use TCP, for example https://github.com/mymarilyn/clickhouse-driver. clickhouse-client is the native client, which can read CSV directly and insert it into the database in the most efficient way. I don't think transferring a million rows over HTTP is the best idea.
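For illustration, a minimal sketch of a bulk insert over the TCP interface with clickhouse-driver; the host, table and column names are assumptions:

```python
from clickhouse_driver import Client

client = Client('localhost')  # native TCP port 9000 by default

# Passing a list of tuples after "VALUES" triggers a bulk insert;
# the driver ships the rows in ClickHouse's columnar native format.
rows = [(i, i / 3.0) for i in range(1_000_000)]
client.execute(
    'INSERT INTO mydb.measurements (value_int, value_float) VALUES',
    rows,
)
```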
Thank you @emakarov for your input; I agree with everything you wrote. Some additional comments below.
**Sessions.** I'm aware of the concept of sessions in SQLAlchemy; personally I don't like it. I modeled clickhouse_orm after Django's ORM, which does not use sessions. In any case, sessions are mostly relevant for databases that support transactions, unlike ClickHouse.
**TCP vs. HTTP.** To minimize the "time to market", I used the most convenient interface when developing this project: HTTP. Of course it comes with some overhead, but at least for my use cases this is not significant. Furthermore, if I understand correctly, the TCP interface requires using the Native data format. From the ClickHouse docs:
So it didn’t sound like a great idea to base an external library on a native, binary, internal format which might change without notice.
**Overhead.** Loading large amounts of data is best done with the CLI, not via the ORM. Using the ORM adds a lot of overhead, since each line needs to be parsed, converted to Python objects, validated, and then re-encoded as text before being sent to the database. Using any ORM is a tradeoff: you sacrifice performance for convenience (it's always faster to talk to the database directly, but usually less convenient than using Python objects). This tradeoff might make sense when you're generating new data or processing query results, but not when you just want to copy a million rows from a file into a database table.
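To make the contrast concrete, here is a minimal sketch of loading a CSV file by streaming it straight to ClickHouse's HTTP interface, skipping the per-row parsing and validation the ORM would do; the file path, table name and host are assumptions:

```python
import requests

# Stream the file body as-is; ClickHouse parses the CSV server-side,
# so no Python-level object creation or validation happens per row.
with open('data.csv', 'rb') as f:
    response = requests.post(
        'http://localhost:8123/',  # ClickHouse HTTP port, 8123 by default
        params={'query': 'INSERT INTO mydb.measurements FORMAT CSV'},
        data=f,
    )
response.raise_for_status()
```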