[BUG] Requiring the length of values for each input to be the same seems too strict
Thank you for submitting an issue. Please refer to our issue policy for additional information about bug reports. For help with debugging your code, please refer to Stack Overflow.
Please fill in this bug report template to ensure a timely and thorough response.
Willingness to contribute
The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
- No. I cannot contribute a bug fix at this time.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.4
- MLflow installed from (source or binary): binary
- MLflow version (run mlflow --version): 1.19.1
- Python version: Python 3.6.10
- npm version, if running the dev UI: N/A
- Exact command to reproduce:
Describe the problem
I think this sanity check is too strict: https://github.com/mlflow/mlflow/blob/5784e7e833385e59cd194fd63e4ae5e456abd779/mlflow/utils/proto_json_utils.py#L347
According to the official documentation of the TensorFlow Serving RESTful API (https://www.tensorflow.org/tfx/serving/api_rest#request_format_2), the value for the inputs key can be either a single input tensor or a map of input names to tensors (listed in their natural nested form). Each input can have an arbitrary shape and need not share the same 0-th dimension.
For example, our inputs include two tensors: one with shape (-1, 5) (a batch of N product sequences, each of length 5) and one with shape (-1) (this is a multi-class classification problem, but instead of computing probabilities for all classes we randomly pick K classes and compute probabilities only for those K classes for every sequence in the batch). K and N are not necessarily the same in each call.
This works well with TensorFlow Serving but fails with MLflow serving due to the restriction above.
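To make the failure mode concrete, here is a minimal sketch of the kind of equal-length check that rejects such a request (my own illustration in Python, not the actual code in proto_json_utils.py):

import numpy as np

def strict_columnar_parse(inputs):
    # Illustration of the restriction: every named input must have the
    # same number of values along the 0-th dimension.
    lengths = {len(values) for values in inputs.values()}
    if len(lengths) > 1:
        raise ValueError(
            "The length of values for each input/column name are not the same"
        )
    return {name: np.array(values) for name, values in inputs.items()}

With TF Serving's columnar inputs format, each named tensor stands on its own, so a cross-input length check is stricter than the spec requires.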
Code to reproduce issue
curl -H 'Content-Type: application/json' 'localhost:5000/invocations' -d '{
"inputs":{
"input_seq":[[101,275,323,444,512],[289,303,156,223,357]],
"input_candidates":[100,101,102,104,107,119,124]
}
}'
{"error_code": "MALFORMED_REQUEST", "message": "Failed to parse data as TF serving input. The length of values for each input/column name are not the same", "stack_trace": "Traceback (most recent call last):\n File \"/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mlflow/pyfunc/scoring_server/__init__.py\", line 90, in infer_and_parse_json_input\n return parse_tf_serving_input(decoded_input, schema=schema)\n File \"/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/mlflow/utils/proto_json_utils.py\", line 244, in parse_tf_serving_input\n \"Failed to parse data as TF serving input. The length of values for\"\nmlflow.exceptions.MlflowException: Failed to parse data as TF serving input. The length of values for each input/column name are not the same\n"}
Other info / logs
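For comparison, converting each named input independently yields exactly the shapes described above; a quick numpy check on the reproduction payload (my own illustration):

import numpy as np

payload = {
    "inputs": {
        "input_seq": [[101, 275, 323, 444, 512], [289, 303, 156, 223, 357]],
        "input_candidates": [100, 101, 102, 104, 107, 119, 124],
    }
}

# Each named tensor keeps its own shape; the 0-th dimensions differ (2 vs 7),
# which the TF Serving columnar "inputs" format permits.
tensors = {name: np.array(values) for name, values in payload["inputs"].items()}
print(tensors["input_seq"].shape)         # (2, 5)
print(tensors["input_candidates"].shape)  # (7,)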
What component(s), interfaces, languages, and integrations does this bug affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Issue Analytics
- Created 2 years ago
- Comments: 7 (3 by maintainers)

@dbczumar I think we can just remove the check cc @arjundc-db who may have more context. I think we should do it to match the TF Serving behavior. Thanks for bringing this up @jingnanxue!
@jingnanxue Thank you for raising this. I agree. @tomasatdatabricks can you weigh in here? How difficult would this be to fix?
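For what it's worth, a hedged sketch of what removing the check could look like for the columnar case (a hypothetical helper, not a patch against the actual parse_tf_serving_input): keep the equal-length behavior only where the row-oriented instances format implies it, and convert each named tensor independently for the columnar inputs format, matching the TF Serving spec linked above.

import numpy as np

def parse_request_sketch(body):
    # Hypothetical sketch, not MLflow's implementation.
    if "instances" in body:
        instances = body["instances"]
        if instances and isinstance(instances[0], dict):
            # Row format: every instance names every input, so all inputs
            # naturally share the same 0-th dimension.
            names = instances[0].keys()
            return {n: np.array([inst[n] for inst in instances]) for n in names}
        return {"instances": np.array(instances)}
    if "inputs" in body:
        inputs = body["inputs"]
        if isinstance(inputs, dict):
            # Columnar format: each named tensor is converted on its own,
            # so no cross-input length check is applied.
            return {n: np.array(v) for n, v in inputs.items()}
        return {"inputs": np.array(inputs)}
    raise ValueError("Request must contain 'instances' or 'inputs'")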