expressive schemas
One of the trickiest parts about using Slate in more complex situations, especially once nested blocks are involved, is ensuring that your document is in a “normalized” state with respect to your schema.
Since the schema is completely customizable, Slate doesn’t help you by default. You have to add logic to ensure that states you don’t want are normalized away.
A few examples of the kind of things you might want to enforce…
Quotes

- `quote` blocks always contain block nodes.
- `quote` blocks always contain at least one `paragraph`.

Lists

- `list` blocks contain only `list_item` blocks.
- `list_item` blocks contain only `paragraph` blocks.

Images

- `image` blocks are always wrapped in a `figure` block.
- `image` blocks always have a `data.src` attribute.
- `figure` blocks contain one `image` and an optional `figure_caption` block.
- `figure_caption` blocks only ever appear inside `figure` blocks.

Code

- `code` blocks contain only `code_line` blocks.
- `code` blocks always have a `data.language` attribute.
- `code_line` blocks contain only text nodes.
- Text nodes in `code_line` blocks have no marks.

Links

- `link` inlines always have a `data.href` attribute.

Comments

- `comment` marks always have a `data.author` attribute.
Right now you can do this via schema rules, by defining a set of `match`, `validate`, and `normalize` functions that will correct any invalid states, like so:
```js
{
  match(obj) {
    return obj.kind == 'block' && obj.type == 'quote'
  },
  validate(quote) {
    const invalidChildren = quote.nodes.filter(n => n.kind != 'block')
    if (!invalidChildren.size) return
    return invalidChildren
  },
  normalize(change, quote, invalidChildren) {
    invalidChildren.forEach((node) => {
      change.removeNodeByKey(node.key)
    })
  },
}
```
But while this gives you the maximum flexibility, it’s hard to read and write. It’s hard to tell if you’re leaving certain cases unhandled.
There’s prior art from Prosemirror, which has schemas that are more expressive.
I’d like to see if we can find something that Slate core could provide that could make managing schemas easier for people. Ideally it would solve the 90% cases well, and leave the underlying function-based approach for the other 10%.
There are a few hard challenges…
One issue is just the complexity of allowing for all of the “common” schema-based restrictions. You should be able to… restrict the `kind` of child nodes, restrict the number and types of child nodes, restrict marks, restrict data properties, etc.
That said, although it will be more to learn, I do think that 90% of cases can be handled by not too many keywords/syntaxes. So although it’ll take research to figure them out, this should be solvable.
Another problem is that the `normalize` function of a schema validation often needs to be context-specific in order to be the least destructive possible. For example, in the case of…
- `figure_caption` blocks only ever appear inside `figure` blocks.
…if you see a rogue `figure_caption` at the top level of the document, you could remove it. But if instead you converted it to a `paragraph`, you’d avoid accidentally deleting content. Like so:
```js
{
  match: (object) => {
    return object.kind == 'block' && object.type != 'figure'
  },
  validate: (block) => {
    const invalids = block.nodes.filter(n => n.type == 'figure_caption')
    return invalids.size ? invalids : null
  },
  normalize: (change, block, invalids) => {
    invalids.forEach(n => change.setNodeByKey(n.key, 'paragraph'))
  },
},
```
But how do you get the expressive schema definitions to understand these kinds of complexities and perform the least destructive normalizations? I’m not exactly sure here. One idea could be exposing the `reason` that a schema rule failed, and maybe the normalizations are always left up to the user. Or maybe the user can choose to opt in to more fine-grained normalizing logic if the default doesn’t suit them.
I’d love to hear people’s ideas about what a nice, expressive schema API might look like!
Top GitHub Comments
Thinking this through a bit… I can see at least three different approaches to defining the actual schema rules. (Open to more if you think of some!)
1. Using a Regex-like Syntax
This is the approach Prosemirror takes, which ends up looking something like:
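(The sketch below is illustrative: the `nodes` strings and overall shape are a guess at how such a syntax could look for Slate, not an existing API.)

```js
// Hypothetical sketch: regex-like content expressions in the style of
// ProseMirror's schema, applied to the node types from the examples above.
const schema = {
  blocks: {
    quote: { nodes: 'block+' },                 // one or more blocks of any type
    list: { nodes: 'list_item+' },              // only list_item children
    list_item: { nodes: 'paragraph+' },
    figure: { nodes: 'image figure_caption?' }, // an image, then an optional caption
    code: { nodes: 'code_line+' },
  },
}
```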
This has the benefit of things being super terse.
But there are a few downsides. One, we have to end up parsing this new, potentially-complex syntax, which seems annoying. Two, in the case of having type names defined as constants, you’d end up having to do lots of string interpolation to insert them into these regex-like strings.
2. Using a Proptypes-like Syntax
This approach would be something more like `prop-types` in React:
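(A sketch of that direction. None of these helpers exist; `anyOf`, `exactlyOf`, `kindOf`, and `typeOf` are invented names for illustration.)

```js
// Hypothetical sketch: a prop-types-style API built from imported
// validator combinators. All of these helpers are invented for illustration.
import { anyOf, exactlyOf, kindOf, typeOf } from 'slate-schema'

const schema = {
  blocks: {
    quote: { nodes: anyOf(kindOf('block')) },
    list: { nodes: anyOf(typeOf('list_item')) },
    figure: { nodes: exactlyOf(typeOf('image'), typeOf('figure_caption')) },
  },
}
```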
This has the benefit of making edge cases like “defining exact children order” a bit more understandable, since you can easily read `exactlyOf`, `anyOf`, etc.

But this is kind of not nice in that it optimizes for these edge cases at the expense of having to add these functions even for the 90% of cases that are almost always just restricting child types, kinds, and node data. And plugins would then have to import this collection of schema functions to be able to define things, which is more of a pain.
3. Using a Plain Object Syntax
Something that’s more plain and simple maybe:
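(Illustrative only: the property names `nodes`, `kinds`, `types`, and `data` are a guess at the shape, not a settled API.)

```js
// Hypothetical sketch: plain objects describing allowed kinds, types,
// and data, simple enough to be pure JSON in the non-complex cases.
const schema = {
  blocks: {
    quote: {
      nodes: [{ kinds: ['block'] }],
    },
    list: {
      nodes: [{ types: ['list_item'] }],
    },
    code: {
      nodes: [{ types: ['code_line'] }],
      data: { language: v => typeof v == 'string' },
    },
  },
}
```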
This has the benefit of being able to be pure JSON in non-complex cases. (See the `normalize` property coming up soon…) And if we do end up making this consumable, editable, etc., then having it in pure JS objects is going to be simplest for consumers.

This seems like it might be the best option, and maybe it’s what you all came up with in your discussions too, since it seems similar to that snippet above.
I think we could assume some defaults for blocks that would make it easier:
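(A guess at what such defaults could look like, using the same hypothetical shape as above.)

```js
// Hypothetical implicit defaults for block definitions: allow any kind
// of child node and require no data, unless a definition says otherwise.
const blockDefaults = {
  nodes: [{ kinds: ['block', 'inline', 'text'] }],
  data: {},
}
```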
And for inlines:
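(Again an illustrative guess: inlines would default to inline and text children only.)

```js
// Hypothetical implicit defaults for inline definitions.
const inlineDefaults = {
  nodes: [{ kinds: ['inline', 'text'] }],
  data: {},
}
```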
And then the entire thing would go inside a nested object that splits up the different object kinds for simplicity and so you don’t have to redefine that each time:
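(A sketch of that hypothetical top-level shape.)

```js
// Hypothetical top-level shape, keyed by object kind so each definition
// doesn't have to repeat whether it's a block, inline, or mark.
const schema = {
  blocks: {
    quote: { nodes: [{ kinds: ['block'] }] },
    code: { data: { language: v => typeof v == 'string' } },
  },
  inlines: {
    link: { data: { href: v => typeof v == 'string' } },
  },
}
```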
There could potentially even be a `document` top-level property, since it’s somewhat common to have blocks that shouldn’t even be found at the top level. And we could later, somehow, maybe even allow marks to be validated and fixed too:
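(A rough sketch of both ideas together; the `document` and `marks` entries below are hypothetical.)

```js
// Hypothetical sketch: a `document` entry restricting which blocks may
// appear at the top level, plus mark definitions with data checks.
const schema = {
  document: {
    nodes: [{ types: ['paragraph', 'quote', 'list', 'figure', 'code'] }],
  },
  marks: {
    comment: { data: { author: v => typeof v == 'string' } },
  },
}
```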
One unknown I have is how should the “merging” happen?
Can we get away with just merging inside `blocks`/`inlines`/`marks` and not having to merge individual node definitions?

That won’t really work for `document` though… so it seems like we’d need to special-case that? Or just require that `document`-level schema be added by something that knows the “full picture”, and not have it be added to by lower-level plugins. But that kind of defeats the purpose? Maybe not.

We might also add a `normalize` property to definitions, which can delegate to the user to define more custom normalization logic. For example:
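(Sketch only: the `reason` strings and argument shape are invented, and `change.setNodeByKey` is the same call used in the earlier rule example.)

```js
// Hypothetical sketch: an opt-in `normalize` function that receives the
// change, a reason describing which rule failed, and the nodes involved.
const schema = {
  document: {
    nodes: [{ types: ['paragraph', 'quote', 'list', 'figure', 'code'] }],
    normalize: (change, reason, { child }) => {
      // a rogue figure_caption at the top level becomes a paragraph
      // instead of just being removed (a likely destructive default)
      if (reason == 'child_type_invalid' && child.type == 'figure_caption') {
        change.setNodeByKey(child.key, 'paragraph')
      }
    },
  },
}
```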
With some sort of sane set of reasons and arguments that are passed. And then by default it would do the best it could, but would often have to resort to removing the invalid nodes/marks.
@SamyPesse @Zhouzi @Soreine what do you think? Have you already solved some of these things in your discussions or code?
Also, would you be open to this being called `slate-schema`, but being managed as part of the core monorepo? Just thinking that it would be nice to maintain it closely alongside the other core pieces, it would allow us to get it more ingrained into the examples, and it would make adoption/standardization better, I think.

@Soreine If I’m reading it right then yes there is, but why would that be the normalization solution for that validation reason? If quotes can’t contain non-blocks, how would wrapping in a quote help fix the issue?