
"too many SQL variables" Error with pandas 0.23 - enable multivalues insert #19664 issue


# imports
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# create a test dataframe with 20,000 rows and a single column
df = pd.DataFrame(np.arange(0, 20000, 1))
# create an in-memory sqlite3 engine
engine = create_engine('sqlite://')
# open a connection
conn = engine.connect()
# write the dataframe to a sql table
df.to_sql('test', engine, if_exists='replace', chunksize=1000)
# print the results
result = conn.execute("select * from test")
for row in result:
    print(row['index'])
conn.close()


Problem description

In pandas 0.22 I could write a dataframe of reasonable size to SQL without error. In 0.23 I receive "OperationalError: (sqlite3.OperationalError) too many SQL variables" when converting a dataframe with ~20k rows to SQL. After looking around, I suspect the problem lies in a limit imposed by sqlite3: SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 (per the SQLite docs) and can apparently only be raised by recompiling SQLite. I can confirm that the code works for a dataframe of 499 rows, and that it also works for 20k rows when chunksize=499 is passed to df.to_sql. In my real case the limit is 76. These limits clearly depend on the number and size of columns in each row, so a method is required to estimate the maximum chunksize from the data types and number of columns.
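A sketch of the workaround implied above. The safe_chunksize helper is mine, not pandas API; it assumes SQLite's default limit of 999 bound parameters per statement, with one parameter per column per row plus one for the index:

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

SQLITE_MAX_VARIABLE_NUMBER = 999  # SQLite's default compile-time limit


def safe_chunksize(df, limit=SQLITE_MAX_VARIABLE_NUMBER):
    # One bound variable per column per row, plus one for the index column,
    # so rows_per_chunk * (n_cols + 1) must stay at or below the limit.
    return limit // (len(df.columns) + 1)


df = pd.DataFrame(np.random.random((20000, 5)))
engine = create_engine('sqlite://')  # in-memory sqlite3 database
df.to_sql('test', engine, if_exists='replace', chunksize=safe_chunksize(df))

print(len(pd.read_sql('select * from test', engine)))  # 20000
```

With 5 data columns plus the index, safe_chunksize returns 999 // 6 = 166 rows per chunk, keeping every multi-values INSERT under the variable limit.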

Expected Output

runfile('H:/Tests/Pandas_0.23_test.py', wdir='H:/Tests')
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Traceback

runfile('H:/Tests/Pandas_0.23_test.py', wdir='H:/Tests')
Traceback (most recent call last):
  File "<ipython-input-2-7d10a48edaae>", line 1, in <module>
    runfile('H:/Tests/Pandas_0.23_test.py', wdir='H:/Tests')
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "H:/Tests/Pandas_0.23_test.py", line 19, in <module>
    df.to_sql('test', engine, if_exists='fail', chunksize=500)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 2127, in to_sql
    dtype=dtype)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 450, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 1149, in to_sql
    table.insert(chunksize)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 663, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py", line 638, in _execute_insert
    conn.execute(*self.insert_statement(data, conn))
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\sql\elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1200, in _execute_context
    context)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1193, in _execute_context
    context)
  File "C:\Users\kane.hill\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (sqlite3.OperationalError) too many SQL variables [SQL: 'INSERT INTO test ("index", "0") VALUES (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?,

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 37 (21 by maintainers)

Top GitHub Comments

3 reactions
tripkane commented, May 17, 2018

@TomAugspurger: After further testing, the initial assumption seems correct: the error stems from the SQLITE_MAX_VARIABLE_NUMBER limit of 999. The maximum allowable chunksize for df.to_sql is given by chunksize = 999 // (cols + 1), where for the test cols was generated by cols = np.random.randint(1, 20) and used to create df = pd.DataFrame(np.random.random((20000, cols))).
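Worth noting for later readers: from pandas 0.24 the multi-values insert became opt-in via the method parameter of to_sql, so the cap in the formula above only matters when method='multi' is requested. A minimal sketch of both paths (the column count of 19 is an arbitrary worst case near the randint(1, 20) upper bound):

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

cols = 19
df = pd.DataFrame(np.random.random((20000, cols)))
engine = create_engine('sqlite://')  # in-memory sqlite3 database

# Default (method=None): one INSERT per row, so the 999-variable
# limit is never approached regardless of chunksize.
df.to_sql('test_single', engine, if_exists='replace', chunksize=1000)

# method='multi' packs many rows into a single INSERT; the chunksize
# must then satisfy chunksize * (cols + 1) <= 999, the +1 covering
# the index column.
df.to_sql('test_multi', engine, if_exists='replace',
          method='multi', chunksize=999 // (cols + 1))
```

With 19 data columns plus the index, each row consumes 20 bound variables, so the multi-values path uses chunks of 999 // 20 = 49 rows.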

2 reactions
gnfrazier commented, Sep 13, 2018

Had the initial issue using 0.23.0; updating to 0.23.4 corrected the problem. Thanks for the fix.


Top Results From Across the Web

pandasql OperationalError: too many SQL variables
This is due to pandas 0.23.0. If you upgrade to pandas 0.23.4, your problem will be fixed. Use pip install --upgrade pandas...
