Reading table with chunksize still pumps the memory
I’m trying to migrate database tables from MySQL to SQL Server:
import pandas as pd
from sqlalchemy import create_engine

# source (MySQL) and target (SQL Server) engines
my_engine = create_engine("mysql+pymysql://root:pass@localhost/gen")
ms_engine = create_engine('mssql+pyodbc://localhost/gen?driver=SQL Server')

for table_name in ['topics', 'fiction', 'compact']:
    # read each table in chunks of 100000 rows and append every chunk to SQL Server
    for table in pd.read_sql_query('SELECT * FROM %s' % table_name,
                                   my_engine,
                                   chunksize=100000):
        table.to_sql(name=table_name, con=ms_engine, if_exists='append')
I thought that using chunksize would keep memory usage bounded, but it just keeps growing. I also tried forcing the garbage collector, but it had no effect.
Maybe my expectations were wrong?
I’m using Python 3.5.1 with pandas 0.17.1 and the latest versions of all other packages. I also tried Python 2.7 with pandas 0.16, with the same results.
I see that server-side cursors are supported in SQLAlchemy now (new in version 1.1.4): http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#server-side-cursors
I have verified that it returns a row immediately (i.e. the client doesn’t read the complete table into memory). This should be useful to allow read_sql to read in chunks and avoid memory problems. Passing the chunk size to fetchmany:

result.fetchmany(chunk)
should do the trick?
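A rough sketch of that approach, assuming SQLAlchemy >= 1.1.4 with the pymysql driver; the connection string and table name are reused from the post above, everything else is illustrative:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://root:pass@localhost/gen")

# stream_results enables a server-side cursor, so the client does not
# buffer the whole result set before returning the first row
conn = engine.connect().execution_options(stream_results=True)
result = conn.execute(text("SELECT * FROM topics"))

chunk = 100000
while True:
    rows = result.fetchmany(chunk)   # pull only `chunk` rows at a time
    if not rows:
        break
    df = pd.DataFrame(rows, columns=result.keys())
    # process or write df here, e.g. df.to_sql(...)
conn.close()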
Several days later, for reference…

Alembic was too complicated for my concentration. I tried the FME and Navicat apps; the latter didn’t manage to migrate all tables through “Data transfer”, and while the former migrated successfully, it didn’t use the nvarchar data type for SQL Server even though the MySQL tables were encoded in UTF-8, so I got records with garbage characters. On top of that, no indexes were preserved. So I used Python (^_^):
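A minimal sketch of what such a migration script could look like, combining the chunked read from the original post with a server-side cursor; the NVARCHAR mapping for text columns is an assumption added to address the encoding issue described above, not necessarily what was actually used:

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import NVARCHAR

my_engine = create_engine("mysql+pymysql://root:pass@localhost/gen")
ms_engine = create_engine('mssql+pyodbc://localhost/gen?driver=SQL Server')

for table_name in ['topics', 'fiction', 'compact']:
    # stream_results keeps the MySQL client from buffering the whole table
    conn = my_engine.connect().execution_options(stream_results=True)
    for chunk in pd.read_sql_query('SELECT * FROM %s' % table_name, conn,
                                   chunksize=100000):
        # map text columns to NVARCHAR so UTF-8 data survives on SQL Server;
        # length None renders as NVARCHAR(max) on the mssql dialect
        text_cols = {c: NVARCHAR(None)
                     for c in chunk.select_dtypes(include=['object']).columns}
        chunk.to_sql(name=table_name, con=ms_engine, if_exists='append',
                     index=False, dtype=text_cols)
    conn.close()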