[FR] Validate request JSON with JSON schema or similar
Thank you for submitting a feature request. Before proceeding, please review MLflow's Issue Policy for feature requests and the MLflow Contributing Guide.
Please fill in this feature request template to ensure a timely and thorough response.
Willingness to contribute
The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?
- Yes. I can contribute this feature independently.
- Yes. I would be willing to contribute this feature with guidance from the MLflow community.
- No. I cannot contribute this feature at this time.
Proposal Summary
API calls should return HTTP 400 when parameters don't match the expected data types (for example), instead of failing with a 500. Creating a JSON schema for the MLflow REST API (using jsonschema, for example) to check requests against would fix these issues. This would result in far friendlier UX, easier debugging, more predictable responses, and a generally more RESTful API.
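As a rough sketch of the proposal, here is what schema validation could look like with the jsonschema library. The schema below is illustrative only (it is not MLflow's actual request format), and `check_request` is a hypothetical helper name:

```python
# Hypothetical sketch: validate a log-batch-style request body with the
# `jsonschema` library before doing any real work. Schema shape is made up
# for illustration; it is not MLflow's actual wire format.
from jsonschema import validate, ValidationError

LOG_BATCH_SCHEMA = {
    "type": "object",
    "properties": {
        "run_id": {"type": "string"},
        "metrics": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "key": {"type": "string"},
                    "value": {"type": "number"},
                    "timestamp": {"type": "integer"},
                },
                "required": ["key", "value", "timestamp"],
            },
        },
    },
    "required": ["run_id"],
}

def check_request(body):
    """Return None if the body is valid, or an error message for a 400 response."""
    try:
        validate(instance=body, schema=LOG_BATCH_SCHEMA)
        return None
    except ValidationError as e:
        return f"Invalid request: {e.message}"

# A string timestamp is caught up front, with a message naming the problem,
# instead of surfacing later as a 500.
bad = {"run_id": "abc",
       "metrics": [{"key": "loss", "value": 0.5, "timestamp": "2021-12-29"}]}
print(check_request(bad))
```

The key point is that the error string from `ValidationError` already names the offending value and the expected type, which is exactly the information the 500 responses currently hide.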
Motivation
I keep getting 500 errors for things like supplying a parameter of the wrong data type to an API call. See this issue for an example. This has also happened with calls for logging parameters (both individually and in batches) and all kinds of other functions.
Right now, this means that an end user of a running MLflow service gets an error message like this back when something goes wrong:
Response [https://<<host>>/api/2.0/mlflow/runs/log-batch]
Date: 2021-12-29 20:15
Status: 500
Content-Type: text/html; charset=utf-8
Size: 290 B
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
This error was caused by providing a timestamp value to log-batch that was a character string rather than a numeric timestamp.
Obviously, this error is unhelpful: there's no indication of what went wrong or how to fix it. More importantly, a 500 implies that the client did nothing wrong and that there was a legitimate issue on the server side. For bad parameters, that is clearly not the case: the client should see an error message naming the incorrect parameter and the expected type, not a cryptic 500 saying it was "unable to complete your request".
The value proposition here should be relatively obvious, so I won't write much beyond saying that validating requests against a JSON schema would let users of the MLflow REST API (in other words, every MLflow user) more easily and reliably use MLflow, develop wrappers for the MLflow API, debug their code when things go wrong, and so on.
What component(s), interfaces, languages, and integrations does this feature affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interfaces
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Languages
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Details
I haven't written any JSON schema in Python, but in R I know it's easy to set up a function to validate requests, and then use that function to validate the JSON body of any incoming request before doing any actual work. If a request fails the JSON validation checks, you can easily return an HTTP 400 with a message like "JSON validation failed with << some error >>".
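The validate-then-400 pattern described above translates directly to Python. Here is a minimal, framework-agnostic sketch; `handle_metric`, `validate_body`, and the type map are all hypothetical names, and a real implementation would plug into the server framework's request handling:

```python
# Hypothetical validate-then-400 sketch. The field names and expected types
# are illustrative, not MLflow's actual API contract.

EXPECTED_TYPES = {"key": str, "value": float, "timestamp": int}

def validate_body(body, expected):
    """Return a list of human-readable type errors (empty if the body is valid)."""
    errors = []
    for field, typ in expected.items():
        if field not in body:
            errors.append(f"missing required field '{field}'")
        elif not isinstance(body[field], typ):
            errors.append(
                f"field '{field}' expected {typ.__name__}, "
                f"got {type(body[field]).__name__}"
            )
    return errors

def handle_metric(body):
    """Validate first; only do the real work if the body passes."""
    errors = validate_body(body, EXPECTED_TYPES)
    if errors:
        return 400, "JSON validation failed: " + "; ".join(errors)
    return 200, "OK"

# A string timestamp now yields a 400 with a message naming the bad field,
# rather than a 500 from deep inside the handler.
status, msg = handle_metric({"key": "loss", "value": 0.5, "timestamp": "oops"})
```

Because validation runs before any business logic, the handler never sees malformed input, and every error message identifies the specific field and the expected type.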
Let me know if I can help with this improvement! I think it'd be a major step forward for everyone using MLflow and for the project in general.
Issue Analytics
- Created 2 years ago
- Comments: 6 (3 by maintainers)
@mrkaye97 That sounds great!
Hi @mrkaye97, yes, you can use the result of ParseDict for validation. To ensure that validation is applied across the various handlers, I'd recommend adding it to _get_request_message(). You can create a mapping from each message type to its associated validation function; when _get_request_message() is called, it should resolve the output of ParseDict to the appropriate validation function and invoke it. If ParseDict fails, we should return a 400 (not sure if we're doing this already). Thank you for taking this on!
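The dispatch the maintainer describes could be sketched as follows. Everything here is a stand-in: `LogBatch` mimics a generated protobuf message class, and `get_request_message` mimics the role of MLflow's internal `_get_request_message()`, not its actual signature:

```python
# Illustrative sketch of per-message-type validation dispatch. LogBatch and
# the validator are stand-ins, not MLflow internals or generated protobufs.

class LogBatch:
    """Stand-in for a parsed request message (e.g. the output of ParseDict)."""
    def __init__(self, timestamp):
        self.timestamp = timestamp

def _validate_log_batch(msg):
    if not isinstance(msg.timestamp, int):
        raise ValueError(f"timestamp must be an integer, got {msg.timestamp!r}")

# One validator per message type, resolved by the type of the parsed message.
_VALIDATORS = {LogBatch: _validate_log_batch}

def get_request_message(msg):
    """Mimics the proposed flow: parse, look up the validator, invoke it,
    and surface validation failures as a 400 instead of a 500."""
    validator = _VALIDATORS.get(type(msg))
    if validator is not None:
        try:
            validator(msg)
        except ValueError as e:
            return 400, str(e)
    return 200, msg
```

Message types without an entry in the mapping pass through unchanged, so validators can be added incrementally, one handler at a time.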