Issues with mlflow sagemaker deploy
System information
Code to reproduce issue
Describe the problem
I want to deploy an MLflow Spark app to SageMaker. Is this possible? I can successfully push an image to ECR with
mlflow sagemaker build-and-push-container
but when I then attempt to deploy this image with
mlflow sagemaker deploy …
it fails with a timeout while creating the endpoint (the full command sequence I run is sketched below, after the errors). When I look at the logs in CloudWatch, multiple errors appear, for example:
py4j.protocol.Py4JJavaError: An error occurred while calling o75.load.
or
2022/07/15 13:41:44 [error] 453#453: *21 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.32.0.2, server: , request: "GET /ping HTTP/1.1", upstream: "http://127.0.0.1:8000/ping", host: "model.aws.local:8080"
or
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
or
CondaValueError: prefix already exists: /miniconda/envs/custom_env
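For reference, the sequence of commands I run is roughly the following; the app name, model URI, execution role ARN, image URL, and region are placeholders, and the exact deploy flags may differ slightly between MLflow versions:
mlflow sagemaker build-and-push-container
mlflow sagemaker deploy -a <app-name> -m <model-uri> -e <execution-role-arn> -i <ecr-image-url> --region-name <region>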
I use libraries like:
from pyspark.sql import SparkSession
import pyspark.sql.types as T
import pyspark.sql.functions as F
from pyspark.sql.functions import col
from pyspark.sql.window import Window
from pyspark.ml.feature import Bucketizer
from pyspark.ml.evaluation import RegressionEvaluator
from pandas import DataFrame
import mlflow
import mlflow.spark
I use Java 18.0.333, Spark 3.1.2, and Hadoop 3.2, plus all other necessary pip installations. Any help would be appreciated.
Other info / logs
Attached: log-events-viewer-result (1).csv
@harupy When I check whether the model serves locally, it fails with the error:
2022/07/18 11:04:27 INFO mlflow.sagemaker: executing: docker run -v C:\Users\....\mlruns\5\0a92547...\artifacts\model:/opt/ml/model/ -p 5000:8080 -e MLFLOW_DEPLOYMENT_FLAVOR_NAME=python_function -e SERVING_ENVIRONMENT=SageMaker --rm mlflow-pyfunc serve docker: Error response from daemon: driver failed programming external connectivity on endpoint epic_kapitsa (8d05038a9156c5aba360d24c3756d249171b95bfd0f4cd9ff8c2168d6b4d3f7a): Bind for 0.0.0.0:5000 failed: port is already allocated.
Would this be causing the deploy stage to fail?
Never mind, it's attempting to run now; I had to stop the MLflow UI that was running in the background. Will update you with the output.
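Pointing the local run at a free host port would also have avoided the collision with the MLflow UI; roughly, assuming the mlflow sagemaker run-local command that produced the docker run above, with a placeholder model URI:
mlflow sagemaker run-local -m <model-uri> -p 5001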
@hollytb You can also try running this command to check if model serving works locally:
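For example, serving the pyfunc flavor directly, with a placeholder model URI (the exact flags depend on your MLflow version):
mlflow models serve -m runs:/<run-id>/model -p 5001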