
"too many SQL variables" Error with pandas 0.23 - enable multivalues insert #19664 issue


# imports
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# create a test dataframe with 20,000 rows and a single column
df = pd.DataFrame(np.arange(0, 20000, 1))
# create an in-memory sqlite3 engine
engine = create_engine('sqlite://')
# open a connection
conn = engine.connect()
# write the dataframe to a sql table
df.to_sql('test', engine, if_exists='replace', chunksize=1000)
# print the results
result = conn.execute("select * from test")
for row in result:
    print(row['index'])
conn.close()


Problem description

In pandas 0.22 I could write a dataframe of reasonable size to SQL without error. In 0.23 I receive "OperationalError: (sqlite3.OperationalError) too many SQL variables" when converting a dataframe with ~20k rows to SQL. After looking around, I suspect the problem lies in a limit imposed by sqlite3: SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 (per the SQLite docs) and can apparently only be raised by recompiling SQLite. I can confirm that the code works for a dataframe of 499 rows, and that it also works for 20k rows when chunksize=499 is passed to df.to_sql. In my real case the limit is 76. These limits clearly depend on the number and size of columns in each row, so a method is required to estimate the maximum chunksize from the data types and number of columns.
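A sketch of the workaround implied above. The safe_chunksize helper is mine, not pandas API; it assumes SQLite's default limit of 999 bound parameters per statement, with one parameter per column per row plus one for the index:

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

SQLITE_MAX_VARIABLE_NUMBER = 999  # SQLite's default compile-time limit


def safe_chunksize(df, limit=SQLITE_MAX_VARIABLE_NUMBER):
    # One bound variable per column per row, plus one for the index column,
    # so rows_per_chunk * (n_cols + 1) must stay at or below the limit.
    return limit // (len(df.columns) + 1)


df = pd.DataFrame(np.random.random((20000, 5)))
engine = create_engine('sqlite://')  # in-memory sqlite3 database
df.to_sql('test', engine, if_exists='replace', chunksize=safe_chunksize(df))

print(len(pd.read_sql('select * from test', engine)))  # 20000
```

With 5 data columns plus the index, safe_chunksize returns 999 // 6 = 166 rows per chunk, keeping every multi-values INSERT under the variable limit.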

Expected Output

runfile('H:/Tests/Pandas_0.23_test.py', wdir='H:/Tests')
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Traceback

runfile('H:/Tests/Pandas_0.23_test.py', wdir='H:/Tests')
Traceback (most recent call last):
  File "<ipython-input-2-7d10a48edaae>", line 1, in <module>
    runfile('H:/Tests/Pandas_0.23_test.py', wdir='H:/Tests')
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "H:/Tests/Pandas_0.23_test.py", line 19, in <module>
    df.to_sql('test', engine, if_exists='fail', chunksize=500)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 2127, in to_sql
    dtype=dtype)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 450, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 1149, in to_sql
    table.insert(chunksize)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 663, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 638, in _execute_insert
    conn.execute(*self.insert_statement(data, conn))
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\sql\elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1200, in _execute_context
    context)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1193, in _execute_context
    context)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (sqlite3.OperationalError) too many SQL variables [SQL: 'INSERT INTO test ("index", "0") VALUES (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?,

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 37 (21 by maintainers)

Top GitHub Comments

3 reactions
tripkane commented, May 17, 2018

@TomAugspurger: After further testing, the initial assumption seems correct: the error stems from the SQLITE_MAX_VARIABLE_NUMBER limit of 999. The maximum allowable chunksize for df.to_sql is given by chunksize = 999 // (cols + 1), where for the test cols was generated by cols = np.random.randint(1, 20) and used to create df = pd.DataFrame(np.random.random((20000, cols))).
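Worth noting for later readers: from pandas 0.24 the multi-values insert became opt-in via the method parameter of to_sql, so the cap in the formula above only matters when method='multi' is requested. A minimal sketch of both paths (the column count of 19 is an arbitrary worst case near the randint(1, 20) upper bound):

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

cols = 19
df = pd.DataFrame(np.random.random((20000, cols)))
engine = create_engine('sqlite://')  # in-memory sqlite3 database

# Default (method=None): one INSERT per row, so the 999-variable
# limit is never approached regardless of chunksize.
df.to_sql('test_single', engine, if_exists='replace', chunksize=1000)

# method='multi' packs many rows into a single INSERT; the chunksize
# must then satisfy chunksize * (cols + 1) <= 999, the +1 covering
# the index column.
df.to_sql('test_multi', engine, if_exists='replace',
          method='multi', chunksize=999 // (cols + 1))
```

With 19 data columns plus the index, each row consumes 20 bound variables, so the multi-values path uses chunks of 999 // 20 = 49 rows.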

2 reactions
gnfrazier commented, Sep 13, 2018

Had the initial issue using 0.23.0; updating to 0.23.4 corrected the problem. Thanks for the fix.


Top Results From Across the Web

pandasql OperationalError: too many SQL variables
This is due to pandas 0.23.0. If you upgrade to pandas 0.23.4, your problem will be fixed. Use pip install --upgrade pandas...
