
to_sql function takes forever to insert in oracle database

See original GitHub issue

I am using pandas to do some analysis on an Excel file, and once that analysis is complete, I want to insert the resulting dataframe into a database. The dataframe is around 300,000 rows and 27 columns. I am using the pd.DataFrame.to_sql method to insert the dataframe into the database. With a MySQL database, insertion takes around 60-90 seconds. However, when I insert the same dataframe using the same function into an Oracle database, the process takes around 2-3 hours to complete.

Relevant code can be found below:

data_frame.to_sql(name='RSA_DATA', con=get_engine(), if_exists='append',
                  index=False, chunksize=config.CHUNK_SIZE)

I tried different chunksize values (from 50 to 3,000), but the difference in time was only on the order of 10 minutes. Any solution to the above problem?

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

2 reactions
BenjaminHabert commented, Oct 14, 2019

As mentioned by @wuhaochen, I have also run into this problem. For me the issue was that Oracle was creating columns of the CLOB data type for all the string columns of the pandas dataframe. I sped up the code by explicitly setting the dtype parameter of to_sql() and using VARCHAR dtypes for string columns.

I think this should be the default behavior of to_sql, as creating CLOB columns is counter-intuitive.
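A minimal sketch of this workaround, assuming a SQLAlchemy engine; an in-memory SQLite engine stands in for the Oracle connection, and the table and column names here are illustrative, not taken from the issue:

```python
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import VARCHAR

# Toy frame standing in for the large result; "name" is an object (string)
# column, which SQLAlchemy would otherwise map to CLOB on Oracle.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Build an explicit dtype mapping: every string column becomes a bounded VARCHAR.
dtype = {col: VARCHAR(255) for col in df.select_dtypes(include="object").columns}

# Demonstrated against SQLite here; for Oracle only the engine URL would change
# (e.g. an oracle+cx_oracle:// connection string).
engine = create_engine("sqlite://")
df.to_sql("RSA_DATA", con=engine, if_exists="append", index=False, dtype=dtype)
```

Passing the mapping through `dtype=` makes to_sql emit bounded VARCHAR columns in the `CREATE TABLE` statement instead of CLOB, which is what restores normal insert speed on Oracle.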

0 reactions
iron0012 commented, Sep 8, 2022

to_sql() is still practically broken when working with Oracle without using the workaround recommended above.


Top Results From Across the Web

Speed up to_sql() when writing Pandas DataFrame to Oracle ...
Pandas + SQLAlchemy per default save all object (string) columns as CLOB in Oracle DB, which makes insertion extremely slow.

Insert happening very slow... — oracle-tech
Hi, Insert happening very slow after sqlldr happening in my program. ... 1) SQLLDR will be called, it will insert around 4 lakhs...

Insert still running for a long time - Oracle Communities
Hi All - We have an insert that is running forever. In this case, it began yesterday at 5 pm and still shows...

Huge Insert! - Ask TOM
This insert has taken 3 days to insert just 4 million records and there are way more ... Seems to me this insert...

11 Tuning PL/SQL Applications for Performance
(With the many performance improvements in Oracle Database 10g, any code ... Badly written subprograms (for example, a slow sort or search function)...
