[FR] Validate request JSON with JSON schema or similar

Willingness to contribute

The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the MLflow community.
  • No. I cannot contribute this feature at this time.

Proposal Summary

API calls should return HTTP 400 when the request is invalid (for example, when parameters don’t match the expected data types) instead of failing with a 500. Creating a JSON schema for the MLflow REST API (using jsonschema, for example) and checking requests against it would fix these issues. This would result in far friendlier UX, easier debugging, more predictable responses, and a generally more RESTful API.

Motivation

I keep getting 500 errors for things like supplying a parameter of the wrong data type to an API call. See this issue for an example. This has also happened with calls that log parameters (both individually and in batches) and with all kinds of other functions.

Right now, this means that an end user of a running MLflow service gets an error message like this back when something goes wrong:

Response [https://<<host>>/api/2.0/mlflow/runs/log-batch]
  Date: 2021-12-29 20:15
  Status: 500
  Content-Type: text/html; charset=utf-8
  Size: 290 B
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>

This error was caused by providing a timestamp value to log-batch that was a character string as opposed to a numeric timestamp.

Obviously, this error is unhelpful. There’s no indication of what went wrong or how to fix the issue. More importantly, a 500 suggests that the client did nothing wrong and that there was a legitimate issue on the server side. For bad parameters, this is obviously not the case: the client should see an error message stating which parameter had the wrong type and what type was expected, not a cryptic 500 saying the server was “unable to complete your request”.

The value prop here should be relatively obvious, so I won’t write too much beyond saying that validating requests against a JSON schema would let users of the MLflow REST API (in other words, every MLflow user) more easily and reliably use MLflow, develop wrappers for the MLflow API, debug their code when things go wrong, and so on.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Languages

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Details

I haven’t written any JSON schema for Python, but in R I know it’s easy to set up a function that validates requests and then use that function to check the JSON body of every incoming request before doing any actual work. If a request fails the JSON validation checks, you can easily return an HTTP 400 with a message such as “JSON validation failed with << some error >>”.
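As a rough illustration of the validate-before-work idea described above, here is a minimal Python sketch that uses only the standard library. The field spec, the function names `validate_metric` and `handle_request`, and the response shape are all assumptions made for this example; they are not MLflow's actual schema or handler code, and a library such as jsonschema could take the place of the hand-written checks.

```python
import json
from numbers import Number

# Hypothetical field spec for one metric entry in a log-batch style request.
# Maps field name -> (expected type, human-readable label). Not MLflow's schema.
METRIC_FIELDS = {
    "key": (str, "a string"),
    "value": (Number, "a number"),
    "timestamp": (Number, "a number"),
}


def validate_metric(metric):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(metric, dict):
        return [f"must be an object, got {type(metric).__name__}"]
    problems = []
    for field, (expected, label) in METRIC_FIELDS.items():
        if field not in metric:
            problems.append(f"'{field}' is required")
            continue
        value = metric[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"'{field}' must be {label}, got {type(value).__name__}")
    return problems


def handle_request(raw_body):
    """Return (status_code, response_body) for a raw JSON request body."""
    prefix = "JSON validation failed with: "
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return 400, {"error": prefix + exc.msg}
    metrics = payload.get("metrics", []) if isinstance(payload, dict) else None
    if not isinstance(metrics, list):
        return 400, {"error": prefix + "'metrics' must be a list inside a JSON object"}
    problems = []
    for i, metric in enumerate(metrics):
        problems += [f"metrics[{i}]: {p}" for p in validate_metric(metric)]
    if problems:
        return 400, {"error": prefix + "; ".join(problems)}
    # Validation passed; the real work (logging the batch) would happen here.
    return 200, {}
```

With this sketch, a request whose timestamp is a character string is rejected with a 400 that names the offending field and the expected type, rather than surfacing as a 500.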

Let me know if I can help with this improvement! I think it’d be a major step forward for everyone using MLflow and for the project in general.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
dbczumar commented, Jan 31, 2022

@mrkaye97 That sounds great!

0 reactions
dbczumar commented, Feb 8, 2022

Hi @mrkaye97, yes, you can use the result of ParseDict for validation. To ensure that validation is applied across the various handlers, I’d recommend adding it to _get_request_message(). You can create a mapping from each message type to its associated validation function; when _get_request_message() is called, it should resolve the output of ParseDict to the appropriate validation function and invoke it. If ParseDict fails, we should return a 400 (not sure if we’re doing this already). Thank you for taking this on!
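The dispatch pattern suggested in this comment could look roughly like the sketch below. Everything in it is a simplified stand-in: the dataclasses replace the protobuf message classes, `parse_dict` imitates only the fail-on-bad-input behaviour of ParseDict, `get_request_message` is not the real `_get_request_message()` signature, and `BadRequest` is a hypothetical exception that a handler would turn into an HTTP 400.

```python
from dataclasses import dataclass, fields


class BadRequest(Exception):
    """Hypothetical error that a request handler would map to HTTP 400."""


# Stand-ins for protobuf request messages (assumed shapes, for illustration).
@dataclass
class LogMetric:
    key: str = ""
    timestamp: int = 0


@dataclass
class CreateExperiment:
    name: str = ""


def parse_dict(request_json, message_class):
    """Stand-in for ParseDict: reject unknown or mistyped fields."""
    known = {f.name: f.type for f in fields(message_class)}
    for name, value in request_json.items():
        if name not in known:
            raise BadRequest(f"{message_class.__name__}: unknown field '{name}'")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, known[name]):
            raise BadRequest(
                f"{message_class.__name__}: '{name}' must be "
                f"{known[name].__name__}, got {type(value).__name__}"
            )
    return message_class(**request_json)


def validate_log_metric(msg):
    if not msg.key:
        raise BadRequest("LogMetric: 'key' must be a non-empty string")
    if msg.timestamp < 0:
        raise BadRequest("LogMetric: 'timestamp' must be non-negative")


def validate_create_experiment(msg):
    if not msg.name:
        raise BadRequest("CreateExperiment: 'name' must be a non-empty string")


# Mapping from each message type to its validation function.
VALIDATORS = {
    LogMetric: validate_log_metric,
    CreateExperiment: validate_create_experiment,
}


def get_request_message(request_json, message_class):
    """Parse the request, then run the validator registered for its type."""
    message = parse_dict(request_json, message_class)
    validator = VALIDATORS.get(type(message))
    if validator is not None:
        validator(message)
    return message
```

Keeping the lookup in one shared function means each handler gets validation without calling it explicitly, which is the point of placing it in the common request-parsing path.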
