azureml-core 1.44.0 fails to deploy model to webservice

  • azureml-core
  • 1.44.0
  • conda virtualenv (compute instance)
  • Azure Machine Learning
  • ML

Describe the bug

The model fails to deploy when I run the deployment code in an Azure notebook using a conda virtualenv with azureml-core 1.44.0.

It works fine with the older version (1.43.0), and with the default Python 3.8 - AzureML kernel, which currently uses 1.42.0.
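
Because the failure tracks the azureml-core version (1.44.0 fails; 1.42.0 and 1.43.0 work, and a later comment reports 1.45.0 working), it can help to check which version the notebook kernel actually has before deploying. This is a minimal, standard-library sketch; the "reported broken" set comes only from this thread, not from any official compatibility list, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this thread: 1.44.0 fails; 1.42.0, 1.43.0 and 1.45.0 work.
# This is not an official compatibility list.
REPORTED_BROKEN = {(1, 44, 0)}


def release_tuple(v: str) -> tuple:
    """Return the leading numeric release part, e.g. '1.45.0.post1' -> (1, 45, 0)."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def is_reported_broken(v: str) -> bool:
    """True if the release part of `v` matches a version reported broken here."""
    return release_tuple(v) in REPORTED_BROKEN


def check_installed(package: str = "azureml-core") -> str:
    """Describe the installed version of `package` against the versions reported here."""
    try:
        v = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if is_reported_broken(v):
        return f"{package} {v}: reported to fail ACI deployment in this issue"
    return f"{package} {v}: not among the versions reported broken here"


if __name__ == "__main__":
    print(check_installed())
```

Comparing only the numeric release tuple means a post-release such as 1.45.0.post1 (the version that shows up in the working logs later in the thread) is treated as 1.45.0.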

The output:

Running
2022-08-25 07:03:20+00:00 Creating Container Registry if not exists.
2022-08-25 07:03:20+00:00 Registering the environment.
2022-08-25 07:03:21+00:00 Use the existing image.
2022-08-25 07:03:22+00:00 Generating deployment configuration.
2022-08-25 07:03:23+00:00 Submitting deployment to compute.
2022-08-25 07:03:30+00:00 Checking the status of deployment heart-disease-classification-env..
2022-08-25 07:05:44+00:00 Checking the status of inference endpoint heart-disease-classification-env.
Failed
Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 3a980ad2-890e-4e8a-91d6-c119bd0528a4
More information can be found using '.get_logs()'
Error:
{
  "code": "AciDeploymentFailed",
  "statusCode": 400,
  "message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
	1. Please check the logs for your container instance: heart-disease-classification-env. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
	2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
	3. You can also try to run image 237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",
  "details": [
    {
      "code": "CrashLoopBackOff",
      "message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.
	1. Please check the logs for your container instance: heart-disease-classification-env. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
	2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
	3. You can also try to run image 237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."
    },
    {
      "code": "AciDeploymentFailed",
      "message": "Your container application crashed. Please follow the steps to debug:
	1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.
	2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.
	3. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
	4. View the diagnostic events to check status of container, it may help you to debug the issue.
"RestartCount": 3
"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}
"PreviousState": {"state":"Terminated","startTime":"2022-08-25T07:07:34.619Z","exitCode":111,"finishTime":"2022-08-25T07:07:48.858Z","detailStatus":"Error"}
"Events":
{"count":1,"firstTimestamp":"2022-08-25T07:03:36Z","lastTimestamp":"2022-08-25T07:03:36Z","name":"Pulling","message":"pulling image "237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a@sha256:7650a3f19eb4803881637a920dc3e9bf9837c0e9c492b7d22be840d0ba8cb1cf"","type":"Normal"}
{"count":1,"firstTimestamp":"2022-08-25T07:05:15Z","lastTimestamp":"2022-08-25T07:05:15Z","name":"Pulled","message":"Successfully pulled image "237a7cc8f2c84e1287a6cc08d5e54f9f.azurecr.io/azureml/azureml_09e362cde9760a4b66987389c8bbc20a@sha256:7650a3f19eb4803881637a920dc3e9bf9837c0e9c492b7d22be840d0ba8cb1cf"","type":"Normal"}
{"count":4,"firstTimestamp":"2022-08-25T07:05:37Z","lastTimestamp":"2022-08-25T07:07:34Z","name":"Started","message":"Started container","type":"Normal"}
{"count":4,"firstTimestamp":"2022-08-25T07:05:54Z","lastTimestamp":"2022-08-25T07:07:48Z","name":"Killing","message":"Killing container with id 54971cd5cf0e6de46f30bd592bea94752d4ad857fb32f6d85e33b3a8bd4e4c92.","type":"Normal"}
"
    }
  ]
}
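
The error text itself recommends running print(service.get_logs()). One way to make that automatic is to wrap the final wait_for_deployment call from the deployment notebook so that a failed deployment prints the container logs right away. This is a sketch under stated assumptions: it uses only wait_for_deployment(), get_logs() and the service's state attribute, it catches Exception broadly rather than assuming a specific SDK exception class, and it assumes a successful deployment ends in the state "Healthy". The helper name is hypothetical.

```python
def wait_or_dump_logs(service, show_output: bool = True) -> bool:
    """Wait for a deployment; if it fails, print the container logs and return False.

    `service` is the object returned by Model.deploy(...). Only wait_for_deployment(),
    get_logs() and the `state` attribute are used.
    """
    try:
        service.wait_for_deployment(show_output=show_output)
    except Exception as exc:  # broad on purpose: the exact SDK exception type is not assumed
        print(f"Deployment failed: {exc}")
    else:
        # Assumption: a successful deployment leaves the service in state "Healthy".
        if getattr(service, "state", "Healthy") == "Healthy":
            return True
        print(f"Deployment finished in state {service.state!r}")
    # get_logs() is what the AciDeploymentFailed message itself recommends.
    print(service.get_logs())
    return False
```

In the deployment notebook, ok = wait_or_dump_logs(service) would stand in for the service.wait_for_deployment(show_output=True) line, so the init() traceback from the container is printed alongside the AciDeploymentFailed message.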

To Reproduce

Steps to reproduce the behavior:

  1. I use the standard heart-disease dataset, train the model, and export it to model/hd_otr.pkl.
  2. In the assets folder I store the outlierremover.py script that I use to remove outliers:
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class OutlierRemover(BaseEstimator, TransformerMixin):
    def __init__(self, factor=1.5):
        self.factor = factor
        
    def outlier_detector(self, X, y=None):
        X = pd.Series(X).copy()
        q1 = X.quantile(0.25)
        q3 = X.quantile(0.75)
        iqr = q3 - q1
        self.lower_bound.append(q1 - (self.factor * iqr))
        self.upper_bound.append(q3 + (self.factor * iqr))

    def fit(self,X,y=None):
        self.lower_bound = []
        self.upper_bound = []
        X.apply(self.outlier_detector)
        return self
    
    def transform(self, X, y=None):
        X = pd.DataFrame(X).copy()
        for i in range(X.shape[1]):
            x = X.iloc[:, i].copy()
            x[(x < self.lower_bound[i])] = self.lower_bound[i]
            x[(x > self.upper_bound[i])] = self.upper_bound[i]
            X.iloc[:, i] = x
        return X
    
outlier_remover = OutlierRemover()

and the score.py file:

import joblib
from azureml.core.model import Model
import json
import pandas as pd
import numpy as np
from outlierremover import OutlierRemover

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType

def init():
    global model
    # Example when the model is a file
    model_path = Model.get_model_path('hd_otr') # logistic
    print('Model Path is  ', model_path)
    model = joblib.load(model_path)
    
data_sample = PandasParameterType(pd.DataFrame({'age': pd.Series([71], dtype='int64'),
                                                'sex': pd.Series(['0'], dtype='object'),
                                                'cp': pd.Series(['0'], dtype='object'),
                                                'trestbps': pd.Series([112], dtype='int64'),
                                                'chol': pd.Series([203], dtype='int64'),
                                                'fbs': pd.Series(['0'], dtype='object'),
                                                'restecg': pd.Series(['1'], dtype='object'),
                                                'thalach': pd.Series([185], dtype='int64'),
                                                'exang': pd.Series(['0'], dtype='object'),
                                                'oldpeak': pd.Series([0.1], dtype='float64'),
                                                'slope': pd.Series(['2'], dtype='object'),
                                                'ca': pd.Series(['0'], dtype='object'),
                                                'thal': pd.Series(['2'], dtype='object')}))

input_sample = StandardPythonParameterType({'data': data_sample})
result_sample = NumpyParameterType(np.array([0]))
output_sample = StandardPythonParameterType({'Results': result_sample})

@input_schema('Inputs', input_sample)
@output_schema(output_sample)
def run(Inputs):
    try:
        data = Inputs['data']
        #result = model.predict_proba(data)
        result = np.round(model.predict_proba(data)[0][0], 2)
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error
  3. The code in the deployment.ipynb notebook is as follows:
from azureml.core import Workspace
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core.model import Model
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

model = Model.register(workspace = ws,
              model_path ='model/hd_otr.pkl',
              model_name = 'hd_otr',
              tags = {'version': '1'},
              description = 'Heart disease classification with outliers detection',
              )

# to install required packages
env = Environment('env')
cd = CondaDependencies.create(pip_packages=['pandas', 'azureml-defaults', 'joblib', 'inference-schema', 'imbalanced-learn'], conda_packages = ['scikit-learn'])
env.python.conda_dependencies = cd

# register environment to re-use later
env.register(workspace = ws)

myenv = Environment.get(workspace=ws, name='env')

myenv.save_to_directory('./environ', overwrite=True)

aciconfig = AciWebservice.deploy_configuration(
            cpu_cores=1,
            memory_gb=1,
            tags={'data':'heart disease classifier'},
            description='Classification of heart diseases'
            )

inference_config = InferenceConfig(entry_script='score.py', environment=myenv, source_directory='./assets')

service = Model.deploy(workspace=ws,
                name='heart-disease-classification-env',
                models=[model],
                inference_config=inference_config,
                deployment_config=aciconfig, 
                overwrite=True)

service.wait_for_deployment(show_output=True)
url = service.scoring_uri
print(url)

…which gives the error shown above with 1.44.0, but works fine with older versions.
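
The ACI error names the scoring file's init() as a likely cause and suggests debugging locally. A minimal, standard-library sketch of that idea imports an entry script by file path, calls init() once and then run() with a payload, so any exception surfaces directly in the notebook instead of as a CrashLoopBackOff. It does not emulate the AzureML inference server (no HTTP layer, no schema handling). Putting the script's folder on sys.path is an assumption meant to let sibling imports such as from outlierremover import OutlierRemover resolve; load_entry_script and smoke_test are hypothetical names.

```python
import importlib.util
import sys
from pathlib import Path


def load_entry_script(path):
    """Import an entry script such as assets/score.py as a module, given its file path."""
    path = Path(path).resolve()
    # Assumption: sibling modules (e.g. outlierremover.py) live in the script's own
    # folder, so that folder is put at the front of sys.path.
    folder = str(path.parent)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs module-level code, e.g. the schema samples
    return module


def smoke_test(path, payload):
    """Call init() once and then run(payload); exceptions propagate unchanged."""
    module = load_entry_script(path)
    module.init()  # the ACI error names init() as a likely cause of the crash
    return module.run(payload)
```

For the score.py above, this would be called as smoke_test("assets/score.py", payload) with a payload shaped like its input_sample; its init() still needs azureml-core installed and the registered model reachable through Model.get_model_path.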

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 14 (3 by maintainers)

Top GitHub Comments

1 reaction
Antyteza commented, Sep 16, 2022

After updating to azureml-core 1.45.0, it works fine. The logs are below:

/bin/bash: /azureml-envs/azureml_e1e061720cbfb37570430eb013110205/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_e1e061720cbfb37570430eb013110205/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_e1e061720cbfb37570430eb013110205/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_e1e061720cbfb37570430eb013110205/lib/libtinfo.so.6: no version information available (required by /bin/bash)
2022-09-16T15:12:08,338625698+00:00 - iot-server/run 
2022-09-16T15:12:08,330644171+00:00 - rsyslog/run 
2022-09-16T15:12:08,351312541+00:00 - gunicorn/run 
2022-09-16T15:12:08,352860046+00:00 | gunicorn/run | 
bash: /azureml-envs/azureml_e1e061720cbfb37570430eb013110205/lib/libtinfo.so.6: no version information available (required by bash)
2022-09-16T15:12:08,358483765+00:00 | gunicorn/run | ###############################################
2022-09-16T15:12:08,365156888+00:00 - nginx/run 
2022-09-16T15:12:08,366943294+00:00 | gunicorn/run | AzureML Container Runtime Information
2022-09-16T15:12:08,375958525+00:00 | gunicorn/run | ###############################################
2022-09-16T15:12:08,382303046+00:00 | gunicorn/run | 
2022-09-16T15:12:08,391929879+00:00 | gunicorn/run | 
2022-09-16T15:12:08,417261065+00:00 | gunicorn/run | AzureML image information: openmpi4.1.0-ubuntu20.04, Materializaton Build:20220729.v6
2022-09-16T15:12:08,419892174+00:00 | gunicorn/run | 
2022-09-16T15:12:08,425075692+00:00 | gunicorn/run | 
2022-09-16T15:12:08,427181899+00:00 | gunicorn/run | PATH environment variable: /azureml-envs/azureml_e1e061720cbfb37570430eb013110205/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2022-09-16T15:12:08,435374927+00:00 | gunicorn/run | PYTHONPATH environment variable: 
2022-09-16T15:12:08,440198143+00:00 | gunicorn/run | 
2022-09-16T15:12:08,443152553+00:00 | gunicorn/run | Pip Dependencies (before dynamic installation)

EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
/bin/bash: /azureml-envs/azureml_e1e061720cbfb37570430eb013110205/lib/libtinfo.so.6: no version information available (required by /bin/bash)
2022-09-16T15:12:08,695025809+00:00 - iot-server/finish 1 0
2022-09-16T15:12:08,702663935+00:00 - Exit code 1 is normal. Not restarting iot-server.
adal==1.2.7
argcomplete==2.0.0
attrs==22.1.0
azure-common==1.1.28
azure-core==1.25.1
azure-graphrbac==0.61.1
azure-identity==1.7.0
azure-mgmt-authorization==2.0.0
azure-mgmt-containerregistry==10.0.0
azure-mgmt-core==1.3.2
azure-mgmt-keyvault==10.1.0
azure-mgmt-resource==21.1.0
azure-mgmt-storage==20.0.0
azureml-core==1.45.0.post1
azureml-dataprep==4.2.2
azureml-dataprep-native==38.0.0
azureml-dataprep-rslex==2.8.1
azureml-dataset-runtime==1.45.0
azureml-defaults==1.45.0
azureml-inference-server-http==0.7.6
backports.tempfile==1.0
backports.weakref==1.0.post1
bcrypt==4.0.0
cachetools==5.2.0
certifi @ file:///opt/conda/conda-bld/certifi_1655968806487/work/certifi
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
cloudpickle==2.2.0
configparser==3.7.4
contextlib2==21.6.0
cryptography==37.0.4
distro==1.7.0
docker==5.0.3
dotnetcore2==3.1.23
Flask==2.1.3
Flask-Cors==3.0.10
fusepy==3.0.1
google-api-core==2.10.1
google-auth==2.11.0
googleapis-common-protos==1.56.4
gunicorn==20.1.0
humanfriendly==10.0
idna==3.4
imbalanced-learn==0.9.1
importlib-metadata==4.12.0
importlib-resources==5.9.0
inference-schema==1.4.2.1
isodate==0.6.1
itsdangerous==2.1.2
jeepney==0.8.0
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json-logging-py==0.2
jsonpickle==2.2.0
jsonschema==4.16.0
knack==0.9.0
MarkupSafe==2.1.1
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186064646/work
mkl-service==2.4.0
msal==1.18.0
msal-extensions==0.3.1
msrest==0.7.1
msrestazure==0.6.4
ndg-httpsclient==0.5.1
numpy==1.22.4
oauthlib==3.2.1
opencensus==0.11.0
opencensus-context==0.1.3
opencensus-ext-azure==1.1.7
packaging==21.3
pandas==1.4.4
paramiko==2.11.0
pathspec==0.10.1
pkginfo==1.8.3
pkgutil_resolve_name==1.3.10
portalocker==2.5.1
protobuf==4.21.6
psutil==5.9.2
pyarrow==6.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
Pygments==2.13.0
PyJWT==2.4.0
PyNaCl==1.5.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.18.1
PySocks==1.7.1
python-dateutil==2.8.2
pytz==2022.2.1
PyYAML==6.0
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
scikit-learn @ file:///tmp/abs_d76175bc-917a-47d4-9994-b56265948a6328vmoe2o/croots/recipe/scikit-learn_1658419412415/work
scipy @ file:///tmp/build/80754af9/scipy_1641555001653/work
SecretStorage==3.3.3
six @ file:///tmp/build/80754af9/six_1644875935023/work
tabulate==0.8.10
threadpoolctl @ file:///Users/ktietz/demo/mc3/conda-bld/threadpoolctl_1629802263681/work
typing_extensions==4.3.0
urllib3==1.26.12
websocket-client==1.4.1
Werkzeug==2.2.2
wrapt==1.12.1
zipp==3.8.1

2022-09-16T15:12:10,190193792+00:00 | gunicorn/run | 
2022-09-16T15:12:10,192577700+00:00 | gunicorn/run | ###############################################
2022-09-16T15:12:10,196517413+00:00 | gunicorn/run | AzureML Inference Server
2022-09-16T15:12:10,198003918+00:00 | gunicorn/run | ###############################################
2022-09-16T15:12:10,200434027+00:00 | gunicorn/run | 
2022-09-16T15:12:12,552373322+00:00 | gunicorn/run | Starting AzureML Inference Server HTTP.

Azure ML Inferencing HTTP server v0.7.6


Server Settings
---------------
Entry Script Name: /var/azureml-app/assets/score.py
Model Directory: /var/azureml-app/azureml-models/hd_otr/33
Worker Count: 1
Worker Timeout (seconds): 300
Server Port: 31311
Application Insights Enabled: false
Application Insights Key: None
Inferencing HTTP server version: azmlinfsrv/0.7.6
CORS for the specified origins: None


Server Routes
---------------
Liveness Probe: GET   127.0.0.1:31311/
Score:          POST  127.0.0.1:31311/score

Starting gunicorn 20.1.0
Listening at: http://0.0.0.0:31311/ (78)
Using worker: sync
Booting worker with pid: 142
Initializing logger
2022-09-16 15:12:14,079 | root | INFO | Starting up app insights client
logging socket was found. logging is available.
logging socket was found. logging is available.
2022-09-16 15:12:14,081 | root | INFO | Starting up app insight hooks
2022-09-16 15:12:17,818 | root | INFO | Found user script at /var/azureml-app/assets/score.py
2022-09-16 15:12:17,819 | root | INFO | run() is decorated with @input_schema. Server will invoke it with the following arguments: Inputs.
2022-09-16 15:12:17,820 | root | INFO | Invoking user's init function
00000000-0000-0000-0000-000000000000,Model Path is
00000000-0000-0000-0000-000000000000,/var/azureml-app/azureml-models/hd_otr/33/hd_otr.pkl
2022-09-16 15:12:17,989 | root | INFO | Users's init has completed successfully
2022-09-16 15:12:17,994 | root | INFO | Swaggers are prepared for the following versions: [2, 3].
2022-09-16 15:12:17,994 | root | INFO | Scoring timeout is found from os.environ: 60000 ms
2022-09-16 15:12:17,996 | root | INFO | AML_FLASK_ONE_COMPATIBILITY is set. Patched Flask to ensure compatibility with Flask 1.
2022-09-16 15:16:55,710 | root | INFO | 200
127.0.0.1 - - [16/Sep/2022:15:16:55 +0000] "GET /swagger.json HTTP/1.0" 200 3440 "-" "Go-http-client/1.1"
2022-09-16 15:16:58,745 | root | INFO | 200
127.0.0.1 - - [16/Sep/2022:15:16:58 +0000] "GET /swagger.json HTTP/1.0" 200 3440 "-" "Go-http-client/1.1"

Thank you!

0 reactions
ysmu commented, Sep 14, 2022

We’ve published azureml-inference-server-http 0.7.6 to address this issue. Please let us know if you still experience it with the latest version. Thanks!
