Issues with mlflow sagemaker deploy

System information

j

Code to reproduce issue

bb

Describe the problem

I want to deploy an MLflow Spark app to SageMaker. Is this possible? I successfully pushed an image to ECR with

mlflow sagemaker build-and-push-container

and then when I attempt to deploy this image with

‘mlflow sagemaker deploy…’

it fails with a timeout while trying to create the endpoint. When I look at the logs in CloudWatch, multiple errors appear, for example:

py4j.protocol.Py4JJavaError: An error occurred while calling o75.load.

or

2022/07/15 13:41:44 [error] 453#453: *21 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.32.0.2, server: , request: "GET /ping HTTP/1.1", upstream: "http://127.0.0.1:8000/ping", host: "model.aws.local:8080"

or

gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>

or

CondaValueError: prefix already exists: /miniconda/envs/custom_env

I use libraries like:

from pyspark.sql import SparkSession

import pyspark.sql.types as T 
import pyspark.sql.functions as F
from pyparsing import col
from pyspark.sql.window import Window
from pyspark.ml.feature import Bucketizer
from pyspark.ml.evaluation import RegressionEvaluator
from pandas import DataFrame

import mlflow
import mlflow.spark

I use Java 18.0.333, Spark 3.1.2, and Hadoop 3.2, plus all other necessary pip installations. Any help would be appreciated.

Other info / logs

No response. Attached: log-events-viewer-result (1).csv

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

1 reaction
hollytb commented, Jul 18, 2022

@harupy when I check whether the model serves locally, it fails with this error:

2022/07/18 11:04:27 INFO mlflow.sagemaker: executing: docker run -v C:\Users\....\mlruns\5\0a92547...\artifacts\model:/opt/ml/model/ -p 5000:8080 -e MLFLOW_DEPLOYMENT_FLAVOR_NAME=python_function -e SERVING_ENVIRONMENT=SageMaker --rm mlflow-pyfunc serve
docker: Error response from daemon: driver failed programming external connectivity on endpoint epic_kapitsa (8d05038a9156c5aba360d24c3756d249171b95bfd0f4cd9ff8c2168d6b4d3f7a): Bind for 0.0.0.0:5000 failed: port is already allocated.

Would this be causing the deploy stage to fail?

Never mind, it’s attempting to run now. I had to stop the mlflow ui that was running in the background. I will update you with the output.
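
The "port is already allocated" error can be caught before the container starts. A minimal sketch, assuming the host port 5000 that appears in the `docker run` command above; `port_is_free` is a hypothetical helper for illustration, not part of MLflow:

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    # 5000 is the host port from the docker run command in the log (assumed default)
    if port_is_free(5000):
        print("port 5000 is free")
    else:
        print("port 5000 is in use; stop the other process (for example mlflow ui) first")
```

This only detects a process that is actively listening on the port, which matches the case described here where the MLflow UI was holding port 5000.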

1 reaction
harupy commented, Jul 18, 2022

@hollytb You can also try running this command to check if model serving works locally:

# Run the script you attached to log a model first, then run
mlflow sagemaker run-local -m runs:/<run_id>/model
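
Once the local container is running, its health route can be polled from Python to see whether the worker actually boots. A rough sketch, assuming the container is published on host port 5000 (as in the `docker run` log in this thread) and answers on the `/ping` route seen in the CloudWatch log; `wait_for_ping` is a hypothetical helper, not an MLflow function:

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_ping(url: str = "http://127.0.0.1:5000/ping",
                  attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll url until it answers HTTP 200, or give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Server not up yet, or the worker crashed; retry after a pause
            pass
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("healthy" if wait_for_ping() else "no healthy response from /ping")
```

If this never returns True locally, the same failure is likely what makes the SageMaker endpoint creation time out, so the container logs from the local run are the place to look first.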