OverflowError: signed integer is greater than maximum on large Pandas Queries
Hello, I have been using the PandasCursor on Python 3.7 and Pandas 1.0.3. I upgraded to Python 3.8.2 and PyAthena 1.10.5 today and some of my queries began to fail. The failing queries appear to be the ones on the largest data sets (38M records; 2GB of data in S3).
Here’s the call I’m making:
results = cursor.execute('SELECT col1, col2, col3 FROM table')
All three columns are string types in Glue/Athena.
This is the error that comes back:
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-18-08c66998c34b> in <module>
----> 1 results = cursor.execute('SELECT col1, col2, col3 FROM table')
/usr/local/lib/python3.8/site-packages/pyathena/util.py in _wrapper(*args, **kwargs)
240 def _wrapper(*args, **kwargs):
241 with _lock:
--> 242 return wrapped(*args, **kwargs)
243 return _wrapper
244
/usr/local/lib/python3.8/site-packages/pyathena/pandas_cursor.py in execute(self, operation, parameters, work_group, s3_staging_dir, cache_size)
53 query_execution = self._poll(self._query_id)
54 if query_execution.state == AthenaQueryExecution.STATE_SUCCEEDED:
---> 55 self._result_set = AthenaPandasResultSet(
56 self._connection, self._converter, query_execution, self.arraysize,
57 self._retry_config)
/usr/local/lib/python3.8/site-packages/pyathena/result_set.py in __init__(self, connection, converter, query_execution, arraysize, retry_config)
358 if self.state == AthenaQueryExecution.STATE_SUCCEEDED and \
359 self.output_location.endswith(('.csv', '.txt')):
--> 360 self._df = self._as_pandas()
361 else:
362 import pandas as pd
/usr/local/lib/python3.8/site-packages/pyathena/result_set.py in _as_pandas(self)
449 header = 0
450 names = None
--> 451 df = pd.read_csv(io.BytesIO(response['Body'].read()),
452 sep=sep,
453 header=header,
/usr/local/lib/python3.8/site-packages/botocore/response.py in read(self, amt)
76 """
77 try:
---> 78 chunk = self._raw_stream.read(amt)
79 except URLLib3ReadTimeoutError as e:
80 # TODO: the url will be None as urllib3 isn't setting it yet
/usr/local/lib/python3.8/site-packages/urllib3/response.py in read(self, amt, decode_content, cache_content)
513 if amt is None:
514 # cStringIO doesn't like amt=None
--> 515 data = self._fp.read() if not fp_closed else b""
516 flush_decoder = True
517 else:
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in read(self, amt)
465 else:
466 try:
--> 467 s = self._safe_read(self.length)
468 except IncompleteRead:
469 self._close_conn()
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py in _safe_read(self, amt)
606 IncompleteRead exception can be used to detect the problem.
607 """
--> 608 data = self.fp.read(amt)
609 if len(data) < amt:
610 raise IncompleteRead(data, amt-len(data))
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py in readinto(self, b)
667 while True:
668 try:
--> 669 return self._sock.recv_into(b)
670 except timeout:
671 self._timeout_occurred = True
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py in recv_into(self, buffer, nbytes, flags)
1239 "non-zero flags not allowed in calls to recv_into() on %s" %
1240 self.__class__)
-> 1241 return self.read(nbytes, buffer)
1242 else:
1243 return super().recv_into(buffer, nbytes, flags)
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py in read(self, len, buffer)
1097 try:
1098 if buffer is not None:
-> 1099 return self._sslobj.read(len, buffer)
1100 else:
1101 return self._sslobj.read(len)
OverflowError: signed integer is greater than maximum
This is my first time reporting something like this, so please let me know what other information (and how to gather it if it isn’t obvious) you would need to diagnose the issue. Thank you!
Top GitHub Comments
I have recently encountered this behaviour, and I think that there’s a simple workaround in result_set.py which makes pandas read the response in chunks rather than all in one go.

Python 3.8.2 (64 bit) on Linux, PyAthena==1.10.5, pandas==1.0.3
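For anyone landing here, the chunked-read idea from the comment above can be sketched roughly as follows. This is not the actual PyAthena patch. It is a minimal illustration that assumes the failure comes from a single read of more than 2**31 - 1 bytes over SSL, which is what the traceback suggests. The helper name read_in_chunks and the 64 MiB chunk size are made up for the example.

```python
import io

# Hypothetical chunk size: anything comfortably below 2**31 - 1 bytes should do.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB


def read_in_chunks(stream, chunk_size=CHUNK_SIZE):
    """Drain a file-like object using bounded reads.

    Each call to stream.read() asks for at most chunk_size bytes, so no
    single read request exceeds the signed 32-bit limit. Returns an
    io.BytesIO positioned at the start of the data.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the consumer reads from the beginning
    return buffer
```

In _as_pandas, the idea would be to pass read_in_chunks(response['Body']) to pd.read_csv instead of io.BytesIO(response['Body'].read()). The whole result still ends up in memory, so this only avoids the oversized single read and does not reduce memory use.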
ty!