Pre-PR discussion: JSON Schema additions (to conform to spec)
Feature Request | Pre-Pull Request discussion
First, thanks for Pydantic. “Just types”. It’s awesome. 🎉 🌮 🍰
Summary
I want to update Pydantic’s JSON schema generation to implement the JSON Schema spec, which is in the process of becoming an IETF RFC.
But first, I want to discuss what would be the most appropriate way of doing it to keep it aligned with the project’s direction.
Sorry for the long post. I want to be as explicit as possible and demonstrate the motivation clearly.
Current state
I’m aware of the great work done in https://github.com/samuelcolvin/pydantic/pull/190, which implements the current schema generation in a JSON format. It was first discussed in https://github.com/samuelcolvin/pydantic/issues/129.
The current implementation generates a schema in JSON format, but it is not the same as the one described in the JSON Schema specification.
The changes/additions required are relatively small, as most of the work is already done, though they are spread across the project. The benefits could be quite big for the ecosystem around Pydantic.
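For illustration, here is a minimal sketch of how a schema is obtained today. The model definitions are invented here (loosely matching the docs example quoted below), and I’m assuming `schema_json` can be called on the class, since the schema generation happens at the class level:

```python
from pydantic import BaseModel

class FooBar(BaseModel):
    count: int
    size: float = None

class Main(BaseModel):
    """This is the description of the main model"""
    foo_bar: FooBar
    snap: int = 42

# Prints the current (non-spec) schema, similar to the JSON quoted below.
print(Main.schema_json())
```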
Motivation
First, some motivation for generating schemas that follow the “JSON Schema” spec instead of the current format.
There is already considerable support and tooling (an “ecosystem”) for “JSON Schema” (as defined in the spec).
As a simple, easy-to-test example, Visual Studio Code has good support for JSON Schema, including auto-completion and type checking.
- Take the schema from the docs:
{
  "type": "object",
  "title": "Main",
  "description": "This is the description of the main model",
  "properties": {
    "foo_bar": {
      "type": "object",
      "title": "FooBar",
      "properties": {
        "count": {
          "type": "int",
          "title": "Count",
          "required": true
        },
        "size": {
          "type": "float",
          "title": "Size",
          "required": false
        }
      },
      "required": true
    },
    "Gender": {
      "type": "int",
      "title": "Gender",
      "required": false,
      "choices": [
        [1, "Male"],
        [2, "Female"],
        [3, "Other"],
        [4, "I'd rather not say"]
      ]
    },
    "snap": {
      "type": "int",
      "title": "The Snap",
      "required": false,
      "default": 42,
      "description": "this is the value of snap"
    }
  }
}
- Save it to a file `schema.json`.
- Open it in VS Code.
- Add a single extra field (only for the purposes of this test) declaring the `$schema` of this JSON document, to define it as a “JSON Schema”: `"$schema": "http://json-schema.org/draft-07/schema"` (VS Code will auto-complete that `$schema` declaration too). That field, added there only for this test, tells VS Code that this file is itself a JSON Schema (because the JSON Schema spec itself is declared using JSON Schema).
- Then, just hover over the “error highlighted” sections; the editor shows some of the things that would need to be changed.
- Then, for example, editing one of the `type` values that has `float`: removing it and hitting `Ctrl + Space` to trigger the auto-complete shows the types defined by the spec.
- Now, let’s say the generated schema used the JSON Schema spec format; note how small and simple the changes are:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "title": "Main",
  "description": "This is the description of the main model",
  "required": [
    "foo_bar"
  ],
  "properties": {
    "foo_bar": {
      "type": "object",
      "title": "FooBar",
      "properties": {
        "count": {
          "type": "integer",
          "title": "Count"
        },
        "size": {
          "type": "number",
          "title": "Size"
        }
      },
      "required": [
        "count"
      ]
    },
    "Gender": {
      "type": "string",
      "title": "Gender",
      "enum": [
        "Male",
        "Female",
        "Other",
        "I'd rather not say"
      ]
    },
    "snap": {
      "type": "integer",
      "title": "The Snap",
      "default": 42,
      "description": "this is the value of snap"
    }
  }
}
- Save that file to `./schema2.json`.
- Now create a new JSON file. For this example, to get VS Code completion, create a field `"$schema": "./schema2.json"`. It defines this JSON file as using the schema defined above in the previous file:

{
  "$schema": "./schema2.json"
}
- Now start creating a new JSON field / key; VS Code will provide automatic completion based on the schema.
- It has type error detection.
- And it has completion even for enums.
VS Code is just an example of the current support for JSON Schema (as in the spec) given by tools and the ecosystem. There are many other examples, but this one seemed quite simple and explicit.
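Editor support aside, a spec-compliant schema can also be consumed programmatically. As a hedged sketch, the third-party `jsonschema` package (not related to Pydantic) could validate a document against the schema saved above:

```python
import json

import jsonschema  # third-party JSON Schema validator

with open("schema2.json") as f:
    schema = json.load(f)

document = {"foo_bar": {"count": 3, "size": 2.5}, "snap": 42}

# Raises jsonschema.ValidationError if the document does not conform.
jsonschema.validate(instance=document, schema=schema)
```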
Additionally, OpenAPI (previously known as Swagger), which is now part of the Linux Foundation, is based on JSON Schema to some extent.
My ultimate plan is to use Pydantic to generate OpenAPI schemas from Python web APIs. Those schemas can then be used directly with systems like Swagger UI: https://petstore.swagger.io to create interactive and explorable documentation for APIs, to generate OpenAPI clients in different programming languages, etc.
But all that would come afterwards, as my personal experiments. What I want to add to Pydantic now, which would be the first step for my future plans, is support for JSON Schema (as in the spec).
What I propose
I see different ways to achieve what I want:
A. Modify the current schema-related code
This would probably be the most straightforward way. It would minimize code duplication and confusion, as there are already `schema_json` methods.
It would be a breaking change for anyone who already relies on the currently generated schema, so I guess a version bump would be required.
B. Add additional JSON Schema (spec) code
This wouldn’t disrupt implementations based on the current functionality and wouldn’t require a version bump.
But it would add quite a lot of code to maintain: it would have to touch many of the methods at several points, so instead of touching them, it would duplicate all that functionality with small changes.
C. Add some minimal changes to expose data and create an additional module
I could implement it all as a set of functions, without touching the methods of the original classes. But, for example, I need to access `BaseModel.__fields__`, which is a private property. The one I see available is `BaseModel.fields`, but that’s an instance `@property` method, not a class method, and all the schema generation is done at the class level, not the instance level.
So, I could add small changes to expose the data and process it with additional isolated functions in a separate module, part of the same package.
D. Add additional functions that access private properties
Implement it as functions that use `BaseModel.__fields__` and related properties, even though most of them are private. It wouldn’t be very clean, but it wouldn’t touch the current code.
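To make options C and D more concrete, here is a minimal, hypothetical sketch of such an isolated function. The function name is invented, the field attributes (`type_`, `required`, `default`) are assumptions about Pydantic’s internals, and the type mapping is deliberately incomplete:

```python
from typing import Type

from pydantic import BaseModel

# Assumed mapping from Python types to JSON Schema type names.
TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}

def model_json_schema(model: Type[BaseModel]) -> dict:
    """Hypothetical: build a spec-compliant schema for a model class."""
    properties = {}
    required = []
    # `__fields__` is the private, class-level field data mentioned above.
    for name, field in model.__fields__.items():
        prop = {
            "title": name.title(),
            "type": TYPE_MAP.get(field.type_, "object"),  # assumed attribute
        }
        if field.default is not None:
            prop["default"] = field.default
        properties[name] = prop
        if field.required:  # assumed attribute
            required.append(name)
    schema = {"type": "object", "title": model.__name__, "properties": properties}
    if required:
        schema["required"] = required
    return schema
```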
E. Expose minimal data and create separate package
Add just the minimal changes needed to access the data, and create a new package, not part of Pydantic, that uses that data to generate JSON Schema.
F. Separate package as a dirty hack
Don’t change anything inside Pydantic; just create an external package that accesses private properties in an ugly way…
I already started with some local tests and tried a few of the possible paths above, but I can see that I’ll end up touching a lot of code whichever way I implement it, and it might not be the preferred one. So I decided it was better to first ask what seems most viable for the project and start the discussion here.
What do you think? How should I proceed?
Issue Analytics
- Created 5 years ago
- Reactions: 9
- Comments: 8 (7 by maintainers)
thank you @tiangolo
Not sure what you mean by this. Perhaps easiest to submit a PR and we can discuss from there.
Yes, moving as much schema logic to a separate module as possible would be good
Because I assume that in general users wouldn’t want to access `_schema` directly. Much of Python (e.g. `_asdict` on named tuples) uses `_...` as more of a warning than a hard barrier to external access. I don’t mind changing it though.

I’m happy with that as long as `constr(min_length=2, max_length=10)` continues to work too.
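As a concrete illustration of that last point (a hedged sketch; the model and field names are invented here), a constrained string like that would be expected to map onto the spec’s `minLength`/`maxLength` keywords:

```python
from pydantic import BaseModel, constr

class Comment(BaseModel):
    # In a spec-compliant schema this field would map to something like:
    #   {"type": "string", "minLength": 2, "maxLength": 10}
    nickname: constr(min_length=2, max_length=10)
```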