
Reading table with chunksize still pumps the memory

See original GitHub issue

I’m trying to migrate database tables from MySQL to SQL Server:

import pandas as pd
from sqlalchemy import create_engine

my_engine = create_engine("mysql+pymysql://root:pass@localhost/gen")
ms_engine = create_engine('mssql+pyodbc://localhost/gen?driver=SQL Server')

for table_name in ['topics', 'fiction', 'compact']:
    # Read the source table in chunks of 100,000 rows...
    for table in pd.read_sql_query('SELECT * FROM %s' % table_name,
                                   my_engine,
                                   chunksize=100000):
        # ...and append each chunk to the target table.
        table.to_sql(name=table_name, con=ms_engine, if_exists='append')

I thought that using chunksize would keep memory usage bounded, but it just keeps growing. I also tried the garbage collector, but it had no effect.

Maybe my expectations were wrong?

I’m using Python 3.5.1 with pandas 0.17.1 and all the latest packages, although I also tried Python 2.7 with pandas 0.16, with the same results.

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

27 reactions
alfonsomhc commented, Jun 28, 2017

I see that server-side cursors are supported in SQLAlchemy now (new in version 1.1.4): http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#server-side-cursors

I have verified that

engine = create_engine('mysql+pymysql://user:password@domain/database', server_side_cursors=True)
result = engine.execute(sql_query)
result.fetchone()

returns a row immediately (i.e. the client doesn’t read the complete table into memory). This should be useful for letting read_sql read in chunks and avoid memory problems. Passing the chunk size to fetchmany, i.e. result.fetchmany(chunk), should do the trick?
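For reference, a minimal sketch of how that might fit together (the stream_table helper, connection string, and default chunk size are illustrative, not from the thread), using the SQLAlchemy 1.x engine.execute() style shown above:

import pandas as pd
from sqlalchemy import create_engine

# SQLAlchemy 1.x style, per the snippet above. server_side_cursors=True
# makes the MySQL driver stream rows from the server instead of
# buffering the whole result set client-side.
engine = create_engine('mysql+pymysql://user:password@domain/database',
                       server_side_cursors=True)

def stream_table(sql_query, chunk=100000):
    # Illustrative helper (not from the thread): yield DataFrames of at
    # most `chunk` rows each, so only one chunk is in memory at a time.
    result = engine.execute(sql_query)
    columns = list(result.keys())
    while True:
        rows = result.fetchmany(chunk)
        if not rows:
            break
        yield pd.DataFrame(rows, columns=columns)

Each yielded DataFrame could then be passed to to_sql, as in the original snippet.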

16 reactions
klonuo commented, Feb 14, 2016

Several days later, for reference…

Alembic was too complicated for my concentration. I tried the FME and Navicat apps: while the latter didn’t manage to migrate all tables through its “Data transfer” feature, the former migrated successfully, but although the MySQL tables were encoded in UTF-8, it didn’t use the nvarchar data type on SQL Server, so I got records with garbage characters. On top of that, no indexes were preserved.

So I used Python (^_^):

#!/usr/bin/env python3

import pandas as pd
from sqlalchemy import create_engine

my_engine = create_engine("mysql+pymysql://root:pass@localhost/gen?charset=utf8")
ms_engine = create_engine('mssql+pyodbc://localhost/gen?driver=SQL Server')

chunksize = 10000
for table_name in ['topics', 'fiction', 'compact']:

    # Total row count, used to work out how many pages to fetch.
    row_count = int(pd.read_sql('SELECT COUNT(*) FROM {table_name}'.format(
        table_name=table_name), my_engine).values)

    # Page through the table with MySQL's LIMIT offset, count syntax,
    # so only one chunk is held in memory at a time.
    for i in range(int(row_count / chunksize) + 1):
        query = 'SELECT * FROM {table_name} LIMIT {offset}, {chunksize}'.format(
            table_name=table_name, offset=i * chunksize, chunksize=chunksize)

        pd.read_sql_query(query, con=my_engine).to_sql(
            name=table_name, con=ms_engine, if_exists='append', index=False)
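For what it’s worth, once the server-side cursor support mentioned in the comment above is available (SQLAlchemy 1.1.4+), the two approaches could presumably be combined so that read_sql_query’s chunksize actually streams, with no manual LIMIT paging. A sketch under that assumption, untested, reusing the connection strings from above:

import pandas as pd
from sqlalchemy import create_engine

# Assumption: with server_side_cursors=True, pymysql streams rows from
# the server, so read_sql_query's chunksize keeps memory bounded.
my_engine = create_engine("mysql+pymysql://root:pass@localhost/gen?charset=utf8",
                          server_side_cursors=True)
ms_engine = create_engine('mssql+pyodbc://localhost/gen?driver=SQL Server')

for table_name in ['topics', 'fiction', 'compact']:
    for chunk in pd.read_sql_query('SELECT * FROM {}'.format(table_name),
                                   my_engine, chunksize=10000):
        chunk.to_sql(name=table_name, con=ms_engine,
                     if_exists='append', index=False)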