
Pre-PR discussion: JSON Schema additions (to conform to spec)


Feature Request | Pre-Pull Request discussion

First, thanks for Pydantic. “Just types”. It’s awesome. 🎉 🌮 🍰

Summary

I want to update the schema generation in JSON to implement the JSON Schema spec, which is in the process of becoming an IETF RFC.

But first, I want to discuss what would be the most appropriate way of doing it to keep it aligned with the project’s direction.

Sorry for the long post. I want to be as explicit as possible and demonstrate the motivation clearly.

Current state

I know about the great work done in https://github.com/samuelcolvin/pydantic/pull/190, which implements the current schema generation in a JSON format. It was first discussed in https://github.com/samuelcolvin/pydantic/issues/129.

The current implementation generates a schema in JSON format, but it is not the same as the one described in the JSON Schema specification.

The changes/additions required are relatively small, as most of the work is already done, though they are spread across the project. And the benefits for the ecosystem around Pydantic could be quite big.
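
To make the gap concrete, here is a minimal sketch of how the current output is produced (the model is hypothetical; only the schema_json() method mentioned later in this issue is assumed):

from pydantic import BaseModel


class FooBar(BaseModel):
    count: int
    size: float = None


# Prints the current, pre-spec schema format: note "type": "int"
# and per-field "required" booleans, which JSON Schema does not use.
print(FooBar.schema_json())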

Motivation

First, some motivation for following the “JSON Schema” spec instead of the current schema generation.

There is already considerable support and tooling, a whole “ecosystem”, for “JSON Schema” (as defined in the spec).

As a simple example that is easy to test, Visual Studio Code has good support for JSON Schema, including auto-completion and type checking.

  • Take the schema from the docs:
{
  "type": "object",
  "title": "Main",
  "description": "This is the description of the main model",
  "properties": {
    "foo_bar": {
      "type": "object",
      "title": "FooBar",
      "properties": {
        "count": {
          "type": "int",
          "title": "Count",
          "required": true
        },
        "size": {
          "type": "float",
          "title": "Size",
          "required": false
        }
      },
      "required": true
    },
    "Gender": {
      "type": "int",
      "title": "Gender",
      "required": false,
      "choices": [
        [1, "Male"],
        [2, "Female"],
        [3, "Other"],
        [4, "I'd rather not say"]
      ]
    },
    "snap": {
      "type": "int",
      "title": "The Snap",
      "required": false,
      "default": 42,
      "description": "this is the value of snap"
    }
  }
}
  • Save it to a file schema.json
  • Open it in VS Code.
  • Add a single extra field (only for the purposes of this test) declaring the $schema of this JSON document, to define it as a “JSON Schema”: "$schema": "http://json-schema.org/draft-07/schema" (VS Code will auto-complete that field too). It tells VS Code that this file is itself a JSON Schema (the JSON Schema spec itself is declared using JSON Schema).
  • Then hover over the highlighted error sections; the editor shows some of the things that would need to be changed.

(screenshot 1: VS Code highlighting the schema errors)

  • Then, for example, delete one of the type values that has float and hit Ctrl + Space to trigger auto-completion: VS Code shows the types defined by the spec.

(screenshot 2: auto-completion showing the spec’s type values)

  • Now, suppose the generated schema used the JSON Schema spec format; note how small and simple the changes are:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "title": "Main",
  "description": "This is the description of the main model",
  "required": [
    "foo_bar"
  ],
  "properties": {
    "foo_bar": {
      "type": "object",
      "title": "FooBar",
      "properties": {
        "count": {
          "type": "integer",
          "title": "Count"
        },
        "size": {
          "type": "number",
          "title": "Size"
        }
      },
      "required": [
        "count"
      ]
    },
    "Gender": {
      "type": "string",
      "title": "Gender",
      "enum": [
        "Male",
        "Female",
        "Other",
        "I'd rather not say"
      ]
    },
    "snap": {
      "type": "integer",
      "title": "The Snap",
      "default": 42,
      "description": "this is the value of snap"
    }
  }
}
  • Save that file to ./schema2.json
  • Now create a new JSON file. For this example, to get VS Code completion, add a field "$schema": "./schema2.json", declaring that this JSON file uses the schema defined in the previous file:
{
  "$schema": "./schema2.json"
}
  • Now start creating a new JSON field/key; VS Code will provide automatic completion based on the schema.

(screenshot 3: automatic completion of fields based on the schema)

  • It has type error detection.

(screenshot 4: type error detection)

  • And it has completion even for enums.

(screenshot 5: completion of enum values)

VS Code is just one example of the tooling and ecosystem support that exists for JSON Schema (as in the spec). There are many other examples, but this one seemed quite simple and explicit.
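
Beyond editors, a spec-conformant schema can be consumed by any off-the-shelf validator. As a sketch (using the third-party jsonschema package, which is not part of Pydantic), the corrected schema2.json above could be used to validate data programmatically:

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

with open("schema2.json") as f:
    schema = json.load(f)

instance = {
    "foo_bar": {"count": 4, "size": 2.9},
    "Gender": "Female",
    "snap": 42,
}

try:
    # Raises ValidationError if the instance violates the schema,
    # e.g. a missing "foo_bar" or a non-integer "count".
    validate(instance=instance, schema=schema)
    print("valid")
except ValidationError as e:
    print("invalid:", e.message)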


Additionally, OpenAPI (previously known as Swagger), which is now part of the Linux Foundation, is based on JSON Schema to some extent.

My ultimate plan is to use Pydantic to generate OpenAPI schemas from Python web APIs. Those schemas can then be used directly with systems like Swagger UI (https://petstore.swagger.io) to create interactive, explorable documentation for APIs, and to generate OpenAPI clients in different programming languages, etc.

But all that would come afterwards, as my personal experiments. What I want to add to Pydantic now, which would be the first step for my future plans, is support for JSON Schema (as in the spec).

What I propose

I see different ways to achieve what I want:

A. Modify the current schema related code

This would probably be the most straightforward way. It would minimize code duplication and confusion, as there are already schema_json methods.

It would be a breaking change for anyone that already requires the currently generated schema. So, I guess a version bump would be required.

B. Add additional JSON Schema (spec) code

This wouldn’t disrupt implementations based on the current functionality and wouldn’t require a version bump.

But it would add quite a lot of code to maintain: instead of touching the many methods involved at several points, it would duplicate all that functionality with small changes.

C. Add some minimal changes to expose data and create an additional module

I could implement it all as a set of functions, without touching the methods from the original classes. But for example, I need to access BaseModel.__fields__, which is a private property. The one I see available is BaseModel.fields, but that’s an instance @property method, not a class method. And all the schema generation is done at the class level, not the instance level.

So, I could add small changes to be able to access the data and process it with additional isolated functions in a separate module, part of the same package.

D. Add additional functions that access private properties

Implement it as functions that use BaseModel.__fields__ and related properties, even though most of them are private. It wouldn’t be very clean, but it wouldn’t touch the current actual code.
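
To illustrate what options C and D might look like, here is a rough sketch of a standalone function (the simple_json_schema name is made up, and the type_, required and default attributes on the field objects are assumptions about Pydantic’s internals, hence the defensive getattr calls):

# Maps Python types to JSON Schema type names (deliberately incomplete).
TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}


def simple_json_schema(model_cls):
    """Build a draft-07-style schema dict from a BaseModel subclass,
    reading the class-level __fields__ mapping directly (option D)."""
    properties = {}
    required = []
    for name, field in model_cls.__fields__.items():
        prop = {"title": name.title()}
        json_type = TYPE_MAP.get(getattr(field, "type_", None))
        if json_type:
            prop["type"] = json_type
        if getattr(field, "default", None) is not None:
            prop["default"] = field.default
        properties[name] = prop
        if getattr(field, "required", False):
            required.append(name)
    schema = {
        "$schema": "http://json-schema.org/draft-07/schema",
        "type": "object",
        "title": model_cls.__name__,
        "properties": properties,
    }
    if required:
        schema["required"] = required
    return schema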

E. Expose minimal data and create separate package

Add just the minimal changes needed to access the data, and create a new package, not part of Pydantic, that uses that data to generate JSON Schema.

F. Separate package as a dirty hack

Don’t change anything inside Pydantic; just create an external package that accesses private properties in an ugly way…


I already started with some local tests along some of the possible paths above, but I can see that I’ll end up touching a lot of code whichever way I implement it, and it might not be the preferred one. So I decided it was better to first ask what seems most viable for the project and start the discussion here.

What do you think? How should I proceed?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 9
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

2 reactions
EdgarSun commented, Apr 13, 2020

thank you @tiangolo

2 reactions
samuelcolvin commented, Nov 15, 2018

While generating one schema, I want to also define sub-schemas as top-level schemas with references.

Not sure what you mean by this. Perhaps easiest to submit a PR and we can discuss from there.
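
As an aside, “sub-schemas as top-level schemas with references” presumably refers to JSON Schema’s definitions and $ref keywords, where a nested model is declared once at the top level and referenced wherever it is used. A hand-written illustration (expressed as a Python dict, not actual Pydantic output):

# A schema with a sub-model declared once under "definitions" and
# referenced via "$ref", instead of being inlined at every use site.
schema_with_refs = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "title": "Main",
    "type": "object",
    "properties": {
        "foo_bar": {"$ref": "#/definitions/FooBar"},
    },
    "definitions": {
        "FooBar": {
            "title": "FooBar",
            "type": "object",
            "properties": {"count": {"type": "integer"}},
            "required": ["count"],
        },
    },
}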

I had already started creating a set of functions to do the work…

Yes, moving as much schema logic to a separate module as possible would be good

Is there any reason why you wanted to have Field._schema as a “protected member”…

Because I assume that in general users wouldn’t want to access _schema directly. Much of Python (e.g. _asdict on named tuples) uses _... as more of a warning than a hard barrier to external access. I don’t mind changing it though.

Schema(None, minLength=2, maxLength=10)

I’m happy with that as long as constr(min_length=2, max_length=10) continues to work too.
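
For reference, constr is Pydantic’s existing constrained-string helper; a minimal usage sketch (the User model is hypothetical):

from pydantic import BaseModel, constr


class User(BaseModel):
    # A string type constrained to 2-10 characters, enforced on validation.
    name: constr(min_length=2, max_length=10)


User(name="ok")  # valid
User(name="x")   # fails validation: too short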
