
expressive schemas

See original GitHub issue

One of the trickiest parts about using Slate in more complex situations, especially once nested blocks are involved, is ensuring that your document is in a “normalized” state with respect to your schema.

Since the schema is completely customizable, Slate doesn’t help you by default. You have to add logic to ensure that states you don’t want are normalized away.

A few examples of the kind of things you might want to enforce…

Quotes

  • quote blocks always contain block nodes.
  • quote blocks always contain at least one paragraph.

Lists

  • list blocks contain only list_item blocks.
  • list_item blocks contain only paragraph blocks.

Images

  • image blocks are always wrapped in a figure block.
  • image blocks always have a data.src attribute.
  • figure blocks contain one image and an optional figure_caption block.
  • figure_caption blocks only ever appear inside figure blocks.

Code

  • code blocks contain only code_line blocks.
  • code blocks always have a data.language attribute.
  • code_line blocks contain only text nodes.
  • Text nodes in code_line blocks have no marks.

Links

  • link inlines always have a data.href attribute.

Comments

  • comment marks always have a data.author attribute.

Right now you can do this via schema rules, by defining a set of match, validate, normalize functions that will correct any invalid states, like so:

{
  // Only apply this rule to quote blocks.
  match(obj) {
    return obj.kind == 'block' && obj.type == 'quote'
  },
  // Return the invalid children, or nothing if the quote is already valid.
  validate(quote) {
    const invalidChildren = quote.nodes.filter(n => n.kind != 'block')
    if (!invalidChildren.size) return
    return invalidChildren
  },
  // Fix the document by removing any non-block children.
  normalize(change, quote, invalidChildren) {
    invalidChildren.forEach((node) => {
      change.removeNodeByKey(node.key)
    })
  },
}
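
For comparison, a data-attribute constraint (like the link data.href rule above) takes the same shape. This is just a sketch in the same style, not code from the issue, and it picks removal as the normalization:

{
  // Only apply this rule to link inlines.
  match(obj) {
    return obj.kind == 'inline' && obj.type == 'link'
  },
  // A link is invalid if its data map has no href entry.
  validate(link) {
    return link.data.has('href') ? null : true
  },
  // Simplest fix: remove the invalid link. A gentler rule could unwrap it
  // and keep its text instead.
  normalize(change, link) {
    change.removeNodeByKey(link.key)
  },
}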

But while this gives you the maximum flexibility, it’s hard to read and write. It’s hard to tell if you’re leaving certain cases unhandled.

There’s prior art from ProseMirror, which has more expressive schema definitions.

I’d like to see if we can find something that Slate core could provide that could make managing schemas easier for people. Ideally it would solve the 90% cases well, and leave the underlying function-based approach for the other 10%.


There are a few hard challenges…

One issue is simply the complexity of allowing for all of the “common” schema-based restrictions. You should be able to… restrict the kind of child nodes, restrict the number and types of child nodes, restrict marks, restrict data properties, etc.

That said, although it will be more to learn, I do think that 90% of cases can be handled by a fairly small set of keywords/syntaxes. So although it’ll take research to figure them out, this should be solvable.

Another problem is that the normalize step of a schema validation often needs to be context-specific in order to be as non-destructive as possible. For example, in the case of…

figure_caption blocks only ever appear inside figure blocks.

…if you see a rogue figure_caption at the top-level of the document, you could remove it. But if instead you converted it to a paragraph, you’d avoid accidentally deleting content. Like so:

{
  // Apply to every block except figure itself.
  match: (object) => {
    return object.kind == 'block' && object.type != 'figure'
  },
  // Any figure_caption child outside of a figure is invalid.
  validate: (block) => {
    const invalids = block.nodes.filter(n => n.type == 'figure_caption')
    return invalids.size ? invalids : null
  },
  // Convert rogue captions to paragraphs instead of deleting them.
  normalize: (change, block, invalids) => {
    invalids.forEach(n => change.setNodeByKey(n.key, 'paragraph'))
  }
},

But how do you get the expressive schema definitions to understand these kinds of complexities and perform the least destructive normalizations? I’m not exactly sure here. One idea could be exposing the reason that a schema rule failed, and maybe leaving the normalizations entirely up to the user. Or maybe the user can choose to opt in to more fine-grained normalizing logic if the default doesn’t suit them.


I’d love to hear people’s ideas about what a nice, expressive schema API might look like!

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

2 reactions
ianstormtaylor commented, Oct 19, 2017

Thinking this through a bit… I can see at least three different approaches to defining the actual schema rules (open to more if you think of some!):

1. Using a Regex-like Syntax

This is the approach ProseMirror takes, which ends up looking something like:

{
  paragraph: {},
  figure_caption: {},
  quote: {
    kinds: 'block+',
  },
  figure: {
    nodes: '(image|video),figure_caption?',
  },
  heading: {
    data: {
      level: isHeadingLevel,
    },
  },
  image: {
    isVoid: true,
    data: {
      src: isUrl,
    },
  },
  video: {
    isVoid: true,
    data: {
      src: isUrl,
    },
  },
  code: {
    nodes: 'code_line+',
    data: {
      language: isCodeLanguage,
    },
  },
  code_line: {
    kinds: 'text',
    marks: '',
  },
}

This has the benefit of things being super terse.

But there are a few downsides. One, we have to parse this new, potentially complex syntax, which seems annoying. Two, if type names are defined as constants, you end up doing lots of string interpolation to insert them into these regex-like strings.
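
To make that second point concrete, here’s a hypothetical sketch of what the interpolation ends up looking like (the constant names are made up):

const IMAGE = 'image'
const VIDEO = 'video'
const FIGURE_CAPTION = 'figure_caption'

const rules = {
  figure: {
    // Every constant has to be spliced into the pattern string by hand.
    nodes: `(${IMAGE}|${VIDEO}),${FIGURE_CAPTION}?`,
  },
}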

2. Using a Proptypes-like Syntax

This approach would be something more like prop-types in React:

{
  paragraph: {},
  figure_caption: {},
  quote: {
    nodes: Schema.kindsOf(['block']),
  },
  figure: {
    nodes: Schema.exactly([
      Schema.typesOf(['image', 'embed']), 
      Schema.oneOrNoneOf({ type: 'figure_caption' }, 1),
    ])
  },
  heading: {
    data: {
      level: isHeadingLevel,
    },
  },
  image: {
    isVoid: true,
    data: {
      src: isUrl,
    },
  },
  code: {
    nodes: Schema.typesOf(['code_line']),
    data: {
      language: isCodeLanguage,
    },
  },
  code_line: {
    marks: [],
    nodes: Schema.kindsOf(['text']),
  },
}

This has the benefit of the edge cases like “defining exact children order” being a bit more understandable, since you can easily read exactlyOf, anyOf, etc.

But this optimizes for those edge cases at the expense of requiring these helper functions even for the 90% of cases that are almost always just restricting child types, kinds, and node data. And plugins would then have to import this collection of schema functions to be able to define things, which is more of a pain.
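
Sketched out, every plugin that declares rules ends up with something like the following (the import path is hypothetical, borrowing the slate-schema name floated later in this thread):

// A plugin's schema contribution under the prop-types-like approach.
import { Schema } from 'slate-schema'

export const codeSchema = {
  code: {
    nodes: Schema.typesOf(['code_line']),
  },
}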

3. Using A Plain Object Syntax

Something that’s more plain and simple maybe:

{
  paragraph: {},
  figure_caption: {},
  quote: {
    nodes: [
      { kind: 'block' },
    ]
  },
  figure: {
    nodes: [
      { type: ['image', 'embed'], min: 1, max: 1 }, 
      { type: 'figure_caption', min: 0 },
    ]
  },
  heading: {
    data: {
      level: isHeadingLevel,
    },
  },
  image: {
    isVoid: true,
    data: {
      src: isUrl,
    },
  },
  code: {
    nodes: [
      'code_line'
    ],
    data: {
      language: isCodeLanguage,
    },
  },
  code_line: {
    marks: [],
    nodes: [
      { kind: 'text' }
    ]
  },
}

This has the benefit of being able to even be pure JSON in non-complex cases. (See normalize property coming up soon…) And if we do end up making this consumable, editable, etc. then having it in pure JS objects is going to be simplest for consumers.

This seems like it might be the best, and maybe is what you all came up with in your discussions too since it seems similar to that snippet above.


I think we could assume some defaults for blocks that would make it easier:

{
  marks: undefined,
  data: undefined,
  isVoid: false,
  nodes: [
    { kind: ['text', 'inline'], min: 1, max: Infinity }
  ]
}

And for inlines:

{
  marks: undefined,
  data: undefined,
  isVoid: false,
  nodes: [
    { kind: ['text'], min: 1, max: 1 }
  ]
}
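
Applying those defaults could be as simple as a shallow merge per definition. A rough sketch, with illustrative names only:

const BLOCK_DEFAULTS = {
  marks: undefined,
  data: undefined,
  isVoid: false,
  nodes: [
    { kind: ['text', 'inline'], min: 1, max: Infinity }
  ],
}

// User-supplied keys win; anything unspecified falls back to the defaults.
function applyDefaults(definition) {
  return { ...BLOCK_DEFAULTS, ...definition }
}

applyDefaults({ nodes: ['code_line'] })
// => { marks: undefined, data: undefined, isVoid: false, nodes: ['code_line'] }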

And then the entire thing would go inside a nested object that splits up the different object kinds for simplicity and so you don’t have to redefine that each time:

{
  blocks: { ... },
  inlines: { ... },
  marks: { ... },
}

There could potentially even be a document top-level property since it’s somewhat common to have blocks that shouldn’t even be found at the top-level:

{
  document: {
    nodes: [
      'paragraph',
      'quote',
      ... // but not 'figure_caption' or 'code_line'
    ]
  }
}

And we could later, somehow, maybe even allow marks to be validated and fixed too:

{
  marks: {
    bold: {
      data: { ... },
      marks: [ ... ], // marks that it is allowed to coexist with
    }
  }
}

One unknown I have is how the “merging” should happen.

Can we get away with just merging inside blocks/inlines/marks and not having to merge individual node definitions?

That won’t really work for document though… so it seems like we’d need to special case that? Or just require that document-level schema be added by something that knows the “full picture” and not have it be added to by lower-level plugins. But that kind of defeats the purpose? Maybe not.
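
For what it’s worth, here’s a rough sketch of merging only at the blocks/inlines/marks level, which also shows why document is the awkward case (all names are illustrative):

function mergeSchemas(...schemas) {
  return schemas.reduce((merged, schema) => ({
    blocks: { ...merged.blocks, ...schema.blocks },
    inlines: { ...merged.inlines, ...schema.inlines },
    marks: { ...merged.marks, ...schema.marks },
    // The document rule can't be merged key-by-key the same way: a later
    // plugin simply clobbers an earlier plugin's allowed-node list.
    document: schema.document || merged.document,
  }), { blocks: {}, inlines: {}, marks: {} })
}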


We might also add a normalize property to definitions, which can delegate to the user to define more custom normalization logic. For example:

quote: {
  nodes: [
    { kind: 'block' },
  ],
  normalize: (change, reason, node, child) => {
    if (reason == 'child_kind_invalid') {
      change.wrapBlockByKey(child.key, 'quote')
    }
  }
}

With some sort of sane set of reasons and arguments that are passed.

And then by default it would do the best it could, but often it would have to resort to removing the invalid nodes/marks.
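
As a sketch of what that default could look like (the reason names other than child_kind_invalid are hypothetical):

// Fallback used when a definition has no custom `normalize` of its own.
function defaultNormalize(change, reason, node, child) {
  switch (reason) {
    case 'child_kind_invalid':
    case 'child_type_invalid': // hypothetical reason name
      change.removeNodeByKey(child.key)
      break
    default:
      // When nothing smarter is possible, remove the offending node itself.
      change.removeNodeByKey(node.key)
  }
}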


@SamyPesse @Zhouzi @Soreine what do you think? Have you already solved some of these things in your discussions or code?

Also, would you be open to this being called slate-schema, but being managed as part of the core monorepo? Just thinking that it would be nice to maintain it closely alongside the other core pieces; that would allow us to get it more ingrained into the examples, and would make adoption/standardization better, I think.

0 reactions
ianstormtaylor commented, Oct 23, 2017

@Soreine If I’m reading it right then yes there is, but why would that be the normalization solution to that validation reason? If quotes can’t contain non-blocks, how would wrapping in a quote help fix the issue?
