to_sql function takes forever to insert in Oracle database
I am using pandas to do some analysis on an Excel file, and once that analysis is complete, I want to insert the resultant dataframe into a database. The dataframe is around 300,000 rows and 27 columns.
I am using the pd.to_sql method to insert the dataframe into the database. With a MySQL database, the insertion takes around 60-90 seconds. However, when I insert the same dataframe with the same function into an Oracle database, the process takes around 2-3 hours to complete.
Relevant code can be found below:

data_frame.to_sql(name='RSA_DATA', con=get_engine(), if_exists='append',
                  index=False, chunksize=config.CHUNK_SIZE)
I tried different chunksize values (from 50 to 3000), but the difference in time was only on the order of 10 minutes.
Any solution to the above problem?
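
For reference, a minimal sketch of that timing comparison, assuming the get_engine() helper and data_frame from the question; each run appends the full dataframe, so it should target a throwaway table:

import time

# Compare wall-clock insert time for a few chunk sizes.
# NOTE: every iteration appends all 300,000 rows again, so run
# this against a scratch table, not production data.
for size in (50, 500, 3000):
    start = time.monotonic()
    data_frame.to_sql(name='RSA_DATA', con=get_engine(), if_exists='append',
                      index=False, chunksize=size)
    print(f'chunksize={size}: {time.monotonic() - start:.0f} s')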
Issue Analytics
- Created 7 years ago
- Comments: 10 (2 by maintainers)
As mentioned by @wuhaochen, I have also run into this problem. For me the issue was that Oracle was creating CLOB columns for all of the string columns of the pandas dataframe. I sped up the code by explicitly setting the dtype parameter of to_sql() and using VARCHAR dtypes for the string columns. I think this should be the default behavior of to_sql, as creating CLOBs is counter-intuitive. to_sql() is still practically broken when working with Oracle without the workaround recommended above.