[Feature Request] Direct SageMaker support?
What is the problem this feature will solve?
Many individuals and companies use SageMaker for model training and deployment, but they are often not experts in taking repositories like this one and wrapping them with SageMaker. Instead, they tend to default to examples that are already integrated with SageMaker. However, in the object detection space, those examples are often far less capable than MMDetection.
What is the feature you are proposing to solve the problem?
Creating a `tools/train_sagemaker.py` script and an example for training.
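For example, such a script could then be launched from a notebook with the SageMaker Python SDK roughly like the sketch below. This is only an illustration of the intended workflow; the role ARN, S3 URIs, instance type, framework versions, and the `config-file` hyperparameter name are placeholders, not something MMDetection provides today.

```python
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()

estimator = PyTorch(
    entry_point="train_sagemaker.py",        # the proposed tools/train_sagemaker.py
    source_dir=".",                          # repo root containing tools/ and configs/
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    framework_version="1.12",                # example version, adjust to your image
    py_version="py38",
    instance_count=2,                        # multi-node training
    instance_type="ml.p3.16xlarge",
    hyperparameters={"config-file": "configs/retinanet/retinanet_r50_fpn_1x_coco.py"},
    checkpoint_s3_uri="s3://my-bucket/mmdet-checkpoints/",  # synced with /opt/ml/checkpoints
    sagemaker_session=session,
)

# The "training" channel is mounted at /opt/ml/input/data/training in the container.
estimator.fit({"training": "s3://my-bucket/coco/"})
```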
What alternatives have you considered?
Right now I have a `train_sagemaker.py` script that launches training by executing `subprocess.Popen` with a command that uses `torchrun` to launch `tools/train.py`. For example:
```python
# NOTE: `world`, `config_file`, and `args` are defined earlier in the script
# (parsed from the SageMaker environment / command-line arguments).
import os
import subprocess

# Train script config
launch_config = ["torchrun",
                 "--nnodes", str(world['number_of_machines']), "--node_rank", str(world['machine_rank']),
                 "--nproc_per_node", str(world['number_of_processes']), "--master_addr", world['master_addr'],
                 "--master_port", world['master_port']]
train_config = [os.path.join(os.environ["MMDETECTION"], "tools/train.py"),
                config_file,
                "--launcher", "pytorch",
                "--work-dir", '/opt/ml/checkpoints']
if not args.validate:
    train_config.append("--no-validate")

# Concat PyTorch distributed launch config and MMDetection config
joint_cmd = " ".join(str(x) for x in launch_config + train_config)
print("Following command will be executed: \n", joint_cmd)

# Stream training output back to the SageMaker logs as it is produced
process = subprocess.Popen(joint_cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE, shell=True)
while True:
    output = process.stdout.readline()
    if process.poll() is not None:
        break
    if output:
        print(output.decode("utf-8").strip())

rc = process.poll()
if rc != 0:
    raise subprocess.CalledProcessError(returncode=rc, cmd=joint_cmd)
```
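The snippet above assumes `world`, `config_file`, and `args` are built earlier in the script. As a rough sketch of one way to derive them inside the SageMaker container (the function and variable names are mine; `SM_HOSTS`, `SM_CURRENT_HOST`, and `SM_NUM_GPUS` are standard SageMaker training environment variables):

```python
import argparse
import json
import os

def build_world():
    hosts = json.loads(os.environ["SM_HOSTS"])           # e.g. ["algo-1", "algo-2"]
    current_host = os.environ["SM_CURRENT_HOST"]
    return {
        "number_of_machines": len(hosts),
        "machine_rank": hosts.index(current_host),
        "number_of_processes": int(os.environ["SM_NUM_GPUS"]),  # GPUs per node
        "master_addr": hosts[0],                          # first host acts as master
        "master_port": "29500",                           # any free port
    }

parser = argparse.ArgumentParser()
parser.add_argument("--config-file", type=str, required=True)
parser.add_argument("--validate", action="store_true")
args, _ = parser.parse_known_args()

world = build_world()
config_file = args.config_file
```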
But maybe there’s a better way to accomplish this and integrate it more directly?
Top GitHub Comments
No problem, thanks for working on this, let me know if you need any help!
Hi @austinmw, thanks for sharing. The code is definitely helpful to us. We will review it in depth and may put together a design in the coming month. It might take several weeks, as we do not currently have access to AWS services and already have plans for this month and the next.