[BUG] A problem with type checking for string objects (MLflow-deployed model in SageMaker)
Thank you for submitting an issue. Please refer to our issue policy for additional information about bug reports. For help with debugging your code, please refer to Stack Overflow.
Please fill in this bug report template to ensure a timely and thorough response.
Willingness to contribute
The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
- No. I cannot contribute a bug fix at this time.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ‘Linux’, ‘4.14.252-131.483.amzn1.x86_64’
- MLflow installed from (source or binary): pip
- MLflow version (run mlflow --version): mlflow==1.22.0
- Python version: python=3.6.10
- npm version, if running the dev UI: -
- Exact command to reproduce: -
Describe the problem
I deployed a Hugging Face Transformers model in SageMaker using MLflow’s sagemaker.deploy().
The model had been tested after training, using the same test example as in the code that triggers the bug described here.
When logging the model, I used infer_signature(np.array(test_example), loaded_model.predict(test_example)) to infer the input and output signatures.
The model deploys successfully, but when I query it I get a ModelError (full traceback below).
To query the model, I use exactly the same test_example that I passed to infer_signature():
test_example = [['This is the subject', 'This is the body']]
The only difference is that, when querying the deployed model, I do not wrap the test example in np.array(), because a NumPy array is not JSON-serializable.
I tried two different approaches to query the model:
import json

import boto3
import pandas as pd

SAGEMAKER_REGION = 'us-west-2'
MODEL_NAME = '...'

client = boto3.client("sagemaker-runtime", region_name=SAGEMAKER_REGION)
test_example = [['This is the subject', 'This is the body']]

# Approach 1
client.invoke_endpoint(
    EndpointName=MODEL_NAME,
    Body=json.dumps(test_example),
    ContentType="application/json",
)

# Approach 2
client.invoke_endpoint(
    EndpointName=MODEL_NAME,
    Body=pd.DataFrame(test_example).to_json(orient="split"),
    ContentType="application/json; format=pandas-split",
)
Both approaches result in the same error.
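For reference, here is a minimal sketch of what the two request bodies look like on the wire, built with the standard library only. The hand-built "split" dictionary is an assumption: it mirrors what pd.DataFrame(test_example).to_json(orient="split") should produce for a frame with default integer labels (columns 0 and 1, index 0). It is not taken from the original report.

```python
import json

test_example = [["This is the subject", "This is the body"]]

# Approach 1: the nested list serialized as plain JSON.
body_list = json.dumps(test_example)

# Approach 2: the pandas "split" layout, written out by hand.
# Assumption: default integer column labels (0, 1) and a single row index (0).
body_split = json.dumps({"columns": [0, 1], "index": [0], "data": test_example})

# Decoding either body yields ordinary Python str values; no NumPy dtype
# travels with the request, so the server has to rebuild it from the schema.
decoded_list = json.loads(body_list)
decoded_split = json.loads(body_split)
```

In both cases the payload carries only JSON strings, so any dtype check has to happen on the server side against the logged signature.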
To rule out a problem in the model itself or in other components, I built a simple workaround.
I encoded the strings as numbers (using ord()) and decoded them back into strings (using chr()) inside the model wrapper. This resolved the issue.
In summary, the same code works for integer data but not for string data.
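The workaround could look roughly like the sketch below. The report does not include the actual wrapper code, so the helper names encode_strings and decode_strings, and the zero padding used to make every string the same length, are hypothetical choices for illustration only.

```python
from typing import List

PAD = 0  # hypothetical padding value; assumes "\x00" never occurs in the text


def encode_strings(rows: List[List[str]]) -> List[List[List[int]]]:
    """Turn each string into a fixed-length list of Unicode code points."""
    width = max((len(s) for row in rows for s in row), default=0)
    return [
        [[ord(ch) for ch in s] + [PAD] * (width - len(s)) for s in row]
        for row in rows
    ]


def decode_strings(encoded: List[List[List[int]]]) -> List[List[str]]:
    """Inverse of encode_strings: drop the padding and rebuild the strings."""
    return [
        ["".join(chr(code) for code in codes if code != PAD) for codes in row]
        for row in encoded
    ]


test_example = [["This is the subject", "This is the body"]]
encoded = encode_strings(test_example)   # integers only, safe to send as JSON
restored = decode_strings(encoded)       # what the model wrapper would recover
```

With this scheme the client sends only integers, and the wrapper calls the decoding step before passing the text to the model, so the signature never has to describe a string dtype.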
Code to reproduce issue
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
<ipython-input-89-d09862a5f494> in <module>
2 EndpointName=MODEL_NAME,
3 Body=test_example,
----> 4 ContentType="application/json; format=pandas-split",
5 )
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
393 "%s() only accepts keyword arguments." % py_operation_name)
394 # The "self" in this scope is referring to the BaseClient.
--> 395 return self._make_api_call(operation_name, kwargs)
396
397 _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/amazonei_tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
723 error_code = parsed_response.get("Error", {}).get("Code")
724 error_class = self.exceptions.from_code(error_code)
--> 725 raise error_class(parsed_response, operation_name)
726 else:
727 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{"error_code": "BAD_REQUEST", "message": "dtype of input object does not match expected dtype <U0"}". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/bec-sagemaker-model-test-app in account 543052680787 for more information.
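The message above embeds the JSON error returned by the model server inside the botocore exception text. A small sketch, using only the standard library, of pulling that JSON out for inspection follows; extract_server_error is a hypothetical helper, and the message is copied from the traceback with the CloudWatch URL shortened.

```python
import json

# Message text copied from the traceback (trailing CloudWatch link shortened).
message = (
    "An error occurred (ModelError) when calling the InvokeEndpoint operation: "
    "Received client error (400) from primary with message "
    '"{"error_code": "BAD_REQUEST", "message": "dtype of input object does not '
    'match expected dtype <U0"}". See ...'
)


def extract_server_error(text: str) -> dict:
    """Parse the JSON object found between the first '{' and the last '}'."""
    start = text.index("{")
    end = text.rindex("}") + 1
    return json.loads(text[start:end])


server_error = extract_server_error(message)
print(server_error["error_code"], "-", server_error["message"])
```

The BAD_REQUEST code and the dtype wording suggest that the request was rejected by input validation in the serving container rather than by boto3 or by the model code itself.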
Environment info:
{'channels': ['defaults', 'conda-forge', 'pytorch'],
'dependencies': ['python=3.6.10',
'pip==21.3.1',
'pytorch=1.10.2',
'cudatoolkit=10.2',
{'pip': ['mlflow==1.22.0',
'transformers==4.17.0',
'datasets==1.18.4',
'cloudpickle==1.3.0']}],
'name': 'bert_bec_test_env'}
What component(s), interfaces, languages, and integrations does this bug affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Issue Analytics
- State:
- Created 2 years ago
- Comments:13 (2 by maintainers)

@arjundc-db Can you try reproducing this using mlflow server?
@tomasatdatabricks, I tried the following format specified in the TF Serving API doc:
I used the following request:
and got the following error: