
[BUG] A problem with type checking for string objects (MLflow-deployed model in SageMaker)


Thank you for submitting an issue. Please refer to our issue policy for additional information about bug reports. For help with debugging your code, please refer to Stack Overflow.

Please fill in this bug report template to ensure a timely and thorough response.

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux, 4.14.252-131.483.amzn1.x86_64 (Amazon Linux)
  • MLflow installed from (source or binary): pip
  • MLflow version (run mlflow --version): mlflow==1.22.0
  • Python version: python=3.6.10
  • npm version, if running the dev UI: -
  • Exact command to reproduce: -

Describe the problem

I deployed a Hugging Face Transformers model to SageMaker using MLflow’s sagemaker.deploy().

The model had been tested after training, using the same test example that later triggered the bug described below.

When logging the model I used infer_signature(np.array(test_example), loaded_model.predict(test_example)) to infer input and output signatures.
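
For reference, the logging step looked roughly like this (simplified; EchoModel is a hypothetical stand-in for my actual Hugging Face wrapper):

import numpy as np
import mlflow
from mlflow.models.signature import infer_signature

# Hypothetical stand-in for the real Hugging Face wrapper.
class EchoModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return np.array(model_input)

test_example = [['This is the subject', 'This is the body']]
model = EchoModel()

# np.array(test_example) has a Unicode dtype ('<U19' here), so the inferred
# input schema is string-typed.
signature = infer_signature(np.array(test_example), model.predict(None, test_example))

with mlflow.start_run():
    mlflow.pyfunc.log_model("model", python_model=model, signature=signature)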

The model deploys successfully, but when I try to query it I get a ModelError (full traceback below).

To query the model, I am using precisely the same test_example that I used for infer_signature():

test_example = [['This is the subject', 'This is the body']]

The only difference is that when querying the deployed model I am not wrapping the test example in np.array(), as numpy arrays are not JSON-serializable.
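
(A quick illustration of that constraint:)

import json
import numpy as np

# numpy arrays cannot be passed to json.dumps() directly:
json.dumps(np.array([['This is the subject', 'This is the body']]))
# TypeError: Object of type ndarray is not JSON serializable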

To query the model I tried two different approaches:

import json
import boto3
import pandas as pd

SAGEMAKER_REGION = 'us-west-2'
MODEL_NAME = '...'

client = boto3.client("sagemaker-runtime", region_name=SAGEMAKER_REGION)

test_example = [['This is the subject', 'This is the body']]

# Approach 1: plain JSON list of lists
client.invoke_endpoint(
    EndpointName=MODEL_NAME,
    Body=json.dumps(test_example),
    ContentType="application/json",
)

# Approach 2: pandas-split JSON
client.invoke_endpoint(
    EndpointName=MODEL_NAME,
    Body=pd.DataFrame(test_example).to_json(orient="split"),
    ContentType="application/json; format=pandas-split",
)

but they result in the same error.
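
For reference, the two request bodies serialize as follows (output shown in comments):

import json
import pandas as pd

test_example = [['This is the subject', 'This is the body']]

print(json.dumps(test_example))
# [["This is the subject", "This is the body"]]

print(pd.DataFrame(test_example).to_json(orient="split"))
# {"columns":[0,1],"index":[0],"data":[["This is the subject","This is the body"]]}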

To rule out a problem in the model itself or in other components, I built a simple workaround.

I encoded the strings into numbers (using ord()) before sending the request and decoded them back into strings (using chr()) inside the model wrapper. This resolved the issue.

In summary, the same code worked for integer data but not for string data.
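
A sketch of that workaround (function names are illustrative, not from the actual code):

# Client side: turn each string into a list of Unicode code points so the
# payload is purely numeric.
def encode_strings(rows):
    return [[[ord(ch) for ch in text] for text in row] for row in rows]

# Inside the model wrapper: rebuild the original strings from the code points.
def decode_strings(rows):
    return [[''.join(chr(int(c)) for c in codes) for codes in row] for row in rows]

test_example = [['This is the subject', 'This is the body']]
assert decode_strings(encode_strings(test_example)) == test_example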

Code to reproduce issue

See the request snippets and the workaround described above.

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input-89-d09862a5f494> in <module>
      2                 EndpointName=MODEL_NAME,
      3                 Body=test_example,
----> 4                 ContentType="application/json; format=pandas-split",
      5             )

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    393                     "%s() only accepts keyword arguments." % py_operation_name)
    394             # The "self" in this scope is referring to the BaseClient.
--> 395             return self._make_api_call(operation_name, kwargs)
    396 
    397         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    723             error_code = parsed_response.get("Error", {}).get("Code")
    724             error_class = self.exceptions.from_code(error_code)
--> 725             raise error_class(parsed_response, operation_name)
    726         else:
    727             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{"error_code": "BAD_REQUEST", "message": "dtype of input object does not match expected dtype <U0"}". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/bec-sagemaker-model-test-app in account 543052680787 for more information.
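
The expected dtype <U0 in the message is numpy's zero-length Unicode dtype. A quick check of what the mismatch likely looks like (assuming MLflow's scoring server compares numpy dtypes directly, which is an inference from the error text):

import numpy as np
import pandas as pd

# '<U0' is numpy's zero-length Unicode dtype:
print(np.dtype('U'))  # <U0

# Strings parsed from a JSON request land in pandas as object dtype, not '<U0':
parsed = pd.DataFrame([['This is the subject', 'This is the body']])
print(parsed[0].dtype)  # object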

Environment info:

{'channels': ['defaults', 'conda-forge', 'pytorch'],
 'dependencies': ['python=3.6.10',
  'pip==21.3.1',
  'pytorch=1.10.2',
  'cudatoolkit=10.2',
  {'pip': ['mlflow==1.22.0',
    'transformers==4.17.0',
    'datasets==1.18.4',
    'cloudpickle==1.3.0']}],
 'name': 'bert_bec_test_env'}

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 13 (2 by maintainers)

Top GitHub Comments

dbczumar commented, Mar 30, 2022 (1 reaction)

@arjundc-db Can you try reproducing this using mlflow server?
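
(A local reproduction along those lines might look like this; the run URI and port are placeholders:)

# After `mlflow models serve -m runs:/<run_id>/model -p 5001`, post the same
# payload to the local scoring server:
import pandas as pd
import requests

payload = pd.DataFrame([['This is the subject', 'This is the body']]).to_json(orient="split")
response = requests.post(
    "http://localhost:5001/invocations",
    data=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(response.status_code, response.text)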

AlxndrMlk commented, Apr 17, 2022 (0 reactions)

@tomasatdatabricks,

I tried the following format specified in the TF Serving API docs:

test_example = {
    "instances": [['Text 1', 'Text 2']]
} 

I used the following request:

client.invoke_endpoint(
    EndpointName=MODEL_NAME,
    Body=json.dumps(test_example),
    ContentType="application/json",
)

and got the following error:

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input-18-17212a80d35f> in <module>
      8                     EndpointName=MODEL_NAME,
      9                     Body=json.dumps(test_example),
---> 10                     ContentType="application/json",
     11                 )
     12 

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    393                     "%s() only accepts keyword arguments." % py_operation_name)
    394             # The "self" in this scope is referring to the BaseClient.
--> 395             return self._make_api_call(operation_name, kwargs)
    396 
    397         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    723             error_code = parsed_response.get("Error", {}).get("Code")
    724             error_class = self.exceptions.from_code(error_code)
--> 725             raise error_class(parsed_response, operation_name)
    726         else:
    727             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{"error_code": "BAD_REQUEST", "message": "Encountered an unexpected error while evaluating the model. Verify that the serialized input Dataframe is compatible with the model for inference.", "stack_trace": "Traceback (most recent call last):\n  File \"/miniconda/envs/custom_env/lib/python3.6/site-packages/mlflow/pyfunc/scoring_server/__init__.py\", line 303, in transformation\n    raw_predictions = model.predict(data)\n  File \"/miniconda/envs/custom_env/lib/python3.6/site-packages/mlflow/pyfunc/__init__.py\", line 608, in predict\n    return self._model_impl.predict(data)\n  File \"/miniconda/envs/custom_env/lib/python3.6/site-packages/mlflow/pyfunc/model.py\", line 300, in predict\n    return self.python_model.predict(self.context, model_input)\n  File \"<ipython-input-10-8d869a2f4386>\", line 60, in predict\n  File \"/miniconda/envs/custom_env/lib/python3.6/site-packages/transformers/tokenization_utils_base.py\", line 2438, in __call__\n    \"text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) \"\nValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).\n"}". See https://eu-central-1.console.aws.amazon.com/...