PyArrow Requirement Exceeds Lambda Unzipped Max

The PandasCursor is no longer usable in AWS Lambda because it now requires PyArrow. The library is quite large, and it exceeds the AWS Lambda unzipped deployment package size limit. This is preventing deploys at the moment.

PyArrow is quite large, and other libraries have run into similar issues: https://github.com/snowflakedb/snowflake-connector-python/issues/213
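To see how close a build is to the limit described above, the unzipped package directory can be measured before deploying. The following is only a rough sketch: the helper names are hypothetical, and the limit constant assumes the documented 250 MB unzipped limit is counted as 250 * 1024 * 1024 bytes.

```python
import os

# Assumption: the documented 250 MB unzipped limit, counted in binary megabytes.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a linked file is not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the files under root fit within the given limit."""
    return directory_size_bytes(root) <= limit
```

Pointing fits_in_lambda at the directory that pip installed the dependencies into (for example one created with pip install --target) gives a quick pass/fail check before uploading.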

I wonder if it could be made an optional dependency for the PandasCursor? The library could fall back to the older logic when PyArrow isn’t present.

Top GitHub Comments

laughingman7743 commented, Jun 29, 2022

The unload option is easy to configure, as described in the README. It can be set through cursor_kwargs when connecting:

from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

# Enable unload for every cursor created from this connection.
cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=PandasCursor,
                 cursor_kwargs={
                     "unload": True
                 }).cursor()

Or it can be passed directly when creating the cursor:

from pyathena import connect
from pyathena.pandas.cursor import PandasCursor

# Enable unload only for this cursor.
cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=PandasCursor).cursor(unload=True)

The query execution itself is very fast when using the unload option, and the retrieval of results is also super fast. Please give it a try.

I will remove PyArrow from the required dependencies so that you can choose PyArrow or FastParquet. Until I release a supported version, please use versions earlier than 2.9.0.
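As a rough illustration of what choosing between PyArrow and FastParquet could look like on the caller's side, the sketch below checks which of the two packages is importable and returns its name. This is not PyAthena's implementation; pick_parquet_engine is a hypothetical helper, and the preference order (pyarrow first) is an assumption.

```python
import importlib.util


def pick_parquet_engine(candidates=("pyarrow", "fastparquet"),
                        find_spec=importlib.util.find_spec):
    """Return the first candidate package that is installed, or None.

    find_spec is injectable so the lookup can be replaced in tests.
    """
    for name in candidates:
        # find_spec returns None when a top-level package is not installed.
        if find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    engine = pick_parquet_engine()
    if engine is None:
        print("Neither pyarrow nor fastparquet is installed.")
    else:
        print("Using Parquet engine:", engine)
```

The returned name matches the values accepted by the engine parameter of pandas.read_parquet, so it could be passed through when reading Parquet results directly.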

aaronclong commented, Jul 20, 2022

I do not believe that introducing such an interface will solve this problem, nor do I intend to. I think the best solution is to make sure that the boto3 and botocore versions specify dependencies higher than the version that supports the resolve_checksum_context.

@laughingman7743, currently there is only one version of botocore that you would have to force pin. So I don’t think your suggested solution will work.