PyArrow Requirement Exceeds Lambda Unzipped Max

The PandasCursor is no longer usable in AWS Lambda because PyArrow is now a required dependency. The library is quite large, and it pushes the package past AWS's unzipped deployment size limit. This is preventing deploys at the moment.
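To see how close a build is to that limit before deploying, the unzipped size of the package directory can be measured locally. The following is a minimal sketch, not part of PyAthena or any AWS tool; the function names are hypothetical, and the 250 MB figure is the unzipped limit AWS documents for zip-based Lambda deployments (function code plus layers).

```python
import os

# Unzipped size limit for zip-based Lambda deployments (250 MB).
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the unpacked package under `root` is within `limit`."""
    return directory_size_bytes(root) <= limit
```

Pointing `directory_size_bytes` at the folder that gets zipped for the function (for example, the target of `pip install -t`) shows how much of the budget PyArrow alone consumes.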

PyArrow is quite large, and other libraries have run into similar issues: https://github.com/snowflakedb/snowflake-connector-python/issues/213

I wonder if it could be made an optional dependency for the PandasCursor? The library could use the older logic as a fallback when PyArrow isn't present.
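The fallback idea could look roughly like the sketch below. This is only an illustration of the optional-import pattern, not PyAthena's actual code: `load_optional` and `select_reader` are hypothetical helpers, and the `"arrow"` and `"legacy"` labels stand in for whatever code paths the library would really use.

```python
import importlib


def load_optional(module_name):
    """Import `module_name` if it is installed; return None otherwise."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def select_reader(arrow_module):
    """Pick a result-reading path based on whether PyArrow was importable.

    The labels are placeholders for illustration only.
    """
    return "arrow" if arrow_module is not None else "legacy"


# PyArrow becomes a soft dependency: its absence no longer breaks the import.
reader = select_reader(load_optional("pyarrow"))
```

With this shape, a Lambda package built without PyArrow would still import cleanly and take the older path.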

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

1 reaction
laughingman7743 commented, Jun 29, 2022

The unload option can be easily configured and used as described in the README as follows.

# Option 1: set unload through the connection's cursor_kwargs.
from pyathena import connect
from pyathena.pandas.cursor import PandasCursor
cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=PandasCursor,
                 cursor_kwargs={
                     "unload": True
                 }).cursor()

# Option 2: pass unload directly to the cursor() call.
from pyathena import connect
from pyathena.pandas.cursor import PandasCursor
cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=PandasCursor).cursor(unload=True)

The query execution itself is very fast when using the unload option, and the retrieval of results is also super fast. Please give it a try.

I will remove PyArrow from the required dependencies so that you can choose PyArrow or FastParquet. Until I release a supported version, please use versions earlier than 2.9.0.
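Once both engines are optional, choosing between them at runtime might be sketched as follows. This is an assumption about how such a choice could be made, not the released PyAthena implementation; `pick_parquet_engine` and `is_installed` are hypothetical names. The engine strings match the values that `pandas.read_parquet` accepts for its `engine` parameter.

```python
import importlib.util

# Engine names in order of preference.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def is_installed(module_name):
    """Return True if `module_name` can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_parquet_engine(preferred=None, installed=is_installed):
    """Return the first usable engine, or `preferred` if it is installed."""
    candidates = PARQUET_ENGINES if preferred is None else (preferred,)
    for name in candidates:
        if name not in PARQUET_ENGINES:
            raise ValueError(f"unknown Parquet engine: {name!r}")
        if installed(name):
            return name
    raise ImportError(
        "no Parquet engine available; install pyarrow or fastparquet"
    )
```

The `installed` parameter is injectable so the selection logic can be exercised without either package present.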

0 reactions
aaronclong commented, Jul 20, 2022

I do not believe that introducing such an interface will solve this problem, nor do I intend to. I think the best solution is to make sure that the boto3 and botocore versions specify dependencies higher than the version that supports the resolve_checksum_context.

@laughingman7743 currently there is only one version of botocore that you will have to force pin. So I don’t think your suggested solution will work.

Top Results From Across the Web

Current version too big for AWS Lambda - make pyarrow and ...
We have an app running a 1.7 version of the connector and the package size was about 80 MB, now with things like...

Why is there a size difference when using the AWS Lambda ...
the deployment package size (unzipped) needs to be <250 MB (https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html).

3 Ways to Overcome AWS Lambda Deployment Size Limit
The zipped size of the entire repo is around 117MB and unzipped size is around 300MB. Directory Structure and respective filesize. as barebone ......

AWS Lambda: comparing Golang and Python | Blog post
For Python no pure-Python parquet implementation exists. A Lambda deployment with pyarrow (0.15.1) and pandas currently exceeds the limits of a ...

Create an AWS Lambda Layer for Python Runtime
So, you are a Python Developer and excited to try AWS Lambda. ... let's say pandas for data manipulation or pyarrow for transforming...
