
overwrite boto3 version for AWS Glue Python Shell jobs


I need to use a newer boto3 package in an AWS Glue Python 3 shell job (Glue Version: 1.0). I added the wheel file boto3-1.13.21-py2.py3-none-any.whl, downloaded from https://pypi.org/project/boto3/1.13.21/#files, under Python Library Path. However, boto3.__version__ still prints 1.9.203, even though the job log says boto3==1.13.21 was successfully installed. For some reason, the Glue Python 3 shell job is not letting me override the boto3 version with the wheel file. Is there any way to override it?
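
One way to confirm which copy of a package the job actually loaded is to print the module's file path next to its version. This is only a minimal sketch: `describe_module` is a hypothetical helper (not part of boto3 or Glue), and `json` stands in for `boto3` so the snippet runs anywhere.

```python
import importlib


def describe_module(name):
    """Import `name` and report its version (if it exposes one) and file location."""
    module = importlib.import_module(name)
    return {
        "name": module.__name__,
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
    }


# "json" is a stand-in; inside the Glue job you would pass "boto3" instead.
info = describe_module("json")
print(info)
```

If the reported file path is not under the directory where the new wheel was installed, the older copy is probably being found first on the import path.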

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 22 (6 by maintainers)

Top GitHub Comments

3 reactions
sarath-mec commented, Nov 27, 2020

Hi,

We got AWS Glue Python Shell working with all dependencies as follows. Glue also depends on awscli, along with boto3.

AWS Glue Python Shell with Internet

Add the awscli and boto3 whl files to the Python library path of the Glue job. This option is slow, as Glue has to download and install the dependencies during job execution.

  1. Download the whl files from the boto3 files and awscli files pages.

  2. Upload the files to the S3 bucket used for your Python library path.

  3. Add the S3 paths of the whl files to the Python library path. Give the full S3 path of each whl file, separated by commas.

  4. Add the following code snippet to load the new files:


# Additional code from the AWS forum thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
import sys

# Put /glue/lib/installation first on the import path
sys.path.insert(0, '/glue/lib/installation')

# Drop the already-imported boto modules so the next import reloads them
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]

import boto3
print('boto3 version')
print(boto3.__version__)
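
The snippet above relies on the fact that Python caches imported modules in sys.modules. A later `import boto3` reuses the cached module unless the cache entries are removed first. The cache-purging step can be sketched as a small reusable helper. Note that `purge_modules` is a hypothetical name, and the demonstration runs against a stand-in dictionary rather than the real sys.modules so that it is safe to run anywhere.

```python
import sys
import types


def purge_modules(fragment, modules=None):
    """Delete every cached module whose name contains `fragment`.

    Operates on sys.modules by default; returns the sorted names removed.
    """
    modules = sys.modules if modules is None else modules
    doomed = [name for name in list(modules) if fragment in name]
    for name in doomed:
        del modules[name]
    return sorted(doomed)


# Stand-in cache that mimics what sys.modules might hold in a job.
cache = {
    "boto3": types.ModuleType("boto3"),
    "botocore.session": types.ModuleType("botocore.session"),
    "json": types.ModuleType("json"),
}
removed = purge_modules("boto", cache)
print(removed)
```

In a real job you would call `purge_modules("boto")` after adjusting sys.path and before `import boto3`. Matching on the substring 'boto' covers boto3 and botocore, but it would also drop any unrelated module whose name happens to contain that text.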

AWS Glue Python Shell without Internet connectivity

Reference: AWS Wrangler Glue dependency build

  1. We followed the steps mentioned above for awscli and boto3 whl files.

  2. Below is the latest requirements.txt, compiled for the newest versions:

colorama==0.4.3
docutils==0.15.2
rsa==4.5.0
s3transfer==0.3.3
PyYAML==5.3.1
botocore==1.19.23
pyasn1==0.4.8
jmespath==0.10.0
urllib3==1.26.2
python_dateutil==2.8.1
six==1.15.0
  3. Download the dependencies to the libs folder:
pip download -r requirements.txt -d libs
  4. Move the original main whl files (boto3 and awscli) into the libs directory as well.
  5. Package everything as a zip file:
cd libs
zip ../boto3-depends.zip *
  6. Upload boto3-depends.zip to S3 and add its path to the Glue job's Referenced files path. Note: it is the Referenced files path, not the Python library path.

  7. Placeholder code to install the latest awscli and boto3 and load them into the AWS Glue Python Shell job. The additional code follows the thread below.

https://forums.aws.amazon.com/thread.jspa?messageID=954344

import os.path
import subprocess
import sys

# borrowed from https://stackoverflow.com/questions/48596627/how-to-import-referenced-files-in-etl-scripts
def get_referenced_filepath(file_name, matchFunc=os.path.isfile):
    for dir_name in sys.path:
        candidate = os.path.join(dir_name, file_name)
        if matchFunc(candidate):
            return candidate
    raise Exception("Can't find file: {}".format(file_name))

# Must match the name of the zip uploaded to the Referenced files path
zip_file = get_referenced_filepath("boto3-depends.zip")

subprocess.run(["unzip", zip_file])

# Can't install with --user, or without "-t .", because of permission issues on the filesystem.
# shell=True is needed so the shell expands *.whl; the command is passed as a single string.
subprocess.run("pip3 install --no-index --find-links=. -t . *.whl", shell=True)

# Additional code from the AWS forum thread https://forums.aws.amazon.com/thread.jspa?messageID=954344
sys.path.insert(0, '/glue/lib/installation')
# Drop the already-imported boto modules so the next import reloads them
keys = [k for k in sys.modules.keys() if 'boto' in k]
for k in keys:
    del sys.modules[k]

import boto3
print('boto3 version')
print(boto3.__version__)
  8. Check that the code works with the latest AWS CLI API.
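
The zip and unzip steps above shell out to the zip and unzip commands. If those tools are not available, the same packaging round trip could be done with Python's standard zipfile module. This is a minimal sketch, not the method used in the original post: `zip_directory` and `unzip_to` are hypothetical helper names, and the .whl files are empty placeholders created only for the demonstration.

```python
import os
import tempfile
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every regular file in src_dir to the top level of zip_path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(src_dir)):
            full = os.path.join(src_dir, name)
            if os.path.isfile(full):
                archive.write(full, arcname=name)
    return zip_path


def unzip_to(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


# Demonstration with placeholder wheel files in a temporary directory.
with tempfile.TemporaryDirectory() as work:
    libs = os.path.join(work, "libs")
    os.mkdir(libs)
    for fake in ("boto3-demo.whl", "botocore-demo.whl"):
        with open(os.path.join(libs, fake), "w") as handle:
            handle.write("placeholder")
    archive_path = zip_directory(libs, os.path.join(work, "boto3-depends.zip"))
    extracted = unzip_to(archive_path, os.path.join(work, "out"))

print(extracted)
```

Keeping the files at the top level of the archive mirrors `zip ../boto3-depends.zip *` run from inside libs, so the later `pip3 install ... *.whl` step would still find the wheels in the extraction directory.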

Thanks, Sarath

0 reactions
jonathan260589 commented, Jan 4, 2022

@gbeaven90 Hi, I don’t know if you are still facing this issue. I was facing exactly the same one, and I managed to solve it by also installing the latest version of the AWS CLI. The one indicated by @sarath-mec (version 1.18) was no longer recent enough.
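
Since the fix depends on which versions end up loaded, it may help to check the version at the top of the job and fail fast if it is too old. This is a minimal sketch under the assumption that versions are purely numeric and dotted, as in the ones quoted in this thread. `version_tuple` and `is_at_least` are hypothetical helpers.

```python
def version_tuple(version):
    """Turn a purely numeric dotted version such as '1.13.21' into (1, 13, 21)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed, required):
    """Return True when `installed` is the same as or newer than `required`."""
    return version_tuple(installed) >= version_tuple(required)


# Versions taken from the question: the preinstalled one and the desired one.
print(is_at_least("1.9.203", "1.13.21"))
print(is_at_least("1.13.21", "1.13.21"))

# Comparing the raw strings gives the wrong answer, because "9" sorts after "1".
print("1.9.203" >= "1.13.21")
```

In a job, the first argument would be `boto3.__version__`. Versions with non-numeric parts (for example release candidates) would need a proper version parser instead.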
