Optimise representation of deeply-nested StreamField blocks in migrations
Because the actual database backing of a StreamField is just a simple LONGTEXT column populated by JSON-serialized block contents, the database does not need to know about changes made to the Block structure within the StreamField.
However, because Wagtail currently does provide that information to the migrations, complex StreamField instances generate gargantuan migrations whenever even the tiniest change is made to a single Block used by the field. In one example, a single model’s migration code is nearly 500,000 characters.
And another half megabyte of text will be repeated in each subsequent migration if even a single field in a single Block has a single attribute changed. This is highly undesirable, and as far as I can tell, completely pointless. The auto-generated migration files don’t need to care about the internal structure of a StreamField; only manually written migrations that migrate the data from an old format to a new one need to care, and those don’t need to be half a megabyte of code.
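To make the scaling concrete, here is a small, self-contained simulation in plain Python (no Wagtail or Django). The `(name, kind, body)` tuples and the helpers `char`, `struct`, `build_definition` and `migration_text` are hypothetical stand-ins for what a field's deconstruction writes out, not Wagtail's real format. The sketch shows two things: a sub-block reused in many places is serialized once per use, and changing one attribute of one child block produces a new migration text as large as the old one.

```python
# Hypothetical, simplified stand-ins for block definitions: (name, kind, body),
# where body is a list of child definitions or a dict of options.
def char(name, max_length=255):
    return (name, "CharBlock", {"max_length": max_length})


def struct(name, children):
    return (name, "StructBlock", children)


def build_definition(url_max_length):
    # One small "link" struct, reused inside every callout block.
    link = struct("link", [char("url", url_max_length), char("text")])
    callouts = [struct(f"callout_{i}", [char("title"), link]) for i in range(20)]
    # The same callouts are offered again inside five section blocks.
    return [struct(f"section_{j}", callouts) for j in range(5)]


def migration_text(definition):
    # Rough textual stand-in for an auto-generated AlterField operation,
    # with the full definition expanded inline.
    return f"migrations.AlterField(field=StreamField({definition!r}))"


before = migration_text(build_definition(url_max_length=255))
after = migration_text(build_definition(url_max_length=500))

# The link definition is written out once per use: 5 sections x 20 callouts.
print(before.count("'link'"))  # → 100
# Changing one attribute of one child block rewrites the whole text.
print(len(before) == len(after), before != after)  # → True True
```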
I’m not yet entirely sure how best to remedy this, but I think it will have something to do with StreamField.deconstruct():
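def deconstruct(self):
    name, path, _, kwargs = super(StreamField, self).deconstruct()
    # The child block definitions become the field's positional argument,
    # so they are written verbatim into every migration that touches the field.
    block_types = self.stream_block.child_blocks.items()
    args = [block_types]
    return name, path, args, kwargs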
I don’t think there’s any good reason for it to care about its child blocks. That said, while I have a lot of experience dealing with (and hacking around) StreamFields, I’m not an expert. Maybe I’m missing something?
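The change suggested here can be sketched without Django installed by mimicking the (name, path, args, kwargs) tuple that a model field's deconstruct() returns. `FakeField`, `FullStreamField` and `SlimStreamField` are hypothetical stand-ins, not Wagtail classes; a real override would subclass Wagtail's StreamField, and, assuming its constructor still requires a block argument, would emit an empty list in its place, as the slim version does below.

```python
class FakeField:
    """Hypothetical stand-in for a Django model field's deconstruct() contract."""

    def __init__(self, name="body", **kwargs):
        self.name = name
        self.kwargs = kwargs

    def deconstruct(self):
        # Django fields return (name, import path, positional args, keyword args).
        path = f"{type(self).__module__}.{type(self).__qualname__}"
        return self.name, path, [], dict(self.kwargs)


class FullStreamField(FakeField):
    """Mimics the current behaviour: child block definitions go into args."""

    def __init__(self, block_types, **kwargs):
        super().__init__(**kwargs)
        self.child_blocks = dict(block_types)

    def deconstruct(self):
        name, path, _, kwargs = super().deconstruct()
        return name, path, [list(self.child_blocks.items())], kwargs


class SlimStreamField(FullStreamField):
    """Mimics the proposal: keep the block structure out of migrations.

    The constructor still requires a block_types argument, so an empty
    list is emitted in its place.
    """

    def deconstruct(self):
        name, path, _, kwargs = super().deconstruct()
        return name, path, [[]], kwargs


blocks = [("heading", "CharBlock()"), ("paragraph", "RichTextBlock()")]
full = FullStreamField(blocks, blank=True)
slim = SlimStreamField(blocks, blank=True)

print(full.deconstruct()[2])  # → [[('heading', 'CharBlock()'), ('paragraph', 'RichTextBlock()')]]
print(slim.deconstruct()[2])  # → [[]]
```

Because Django's migration autodetector decides whether to write an AlterField by comparing deconstructed fields, the slim version would stop block edits from producing migrations at all. The cost, raised later in this thread, is that historical models would carry no block definitions for use in data migrations.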
This was discussed in the core team meeting on 25/06/20.
In general, it was agreed that including the StreamField block definitions in migrations was the correct default behaviour. This is because at any point in a model’s migration history, you should be able to access both the structured content/data AND the full StreamBlock definition, in case both are needed in a data migration. As unlikely as this may seem, until there are more established solutions for migrating StreamField content, we feel it’s important to preserve this as an option.
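As a rough illustration of why a block definition can matter inside a data migration: backfilling a newly added child of a StructBlock requires that child's default, which lives in the definition rather than in the stored JSON. The plain-Python sketch below works on raw stream data shaped like StreamField's JSON (a list of {"type": ..., "value": ...} items). `FROZEN_DEFINITION` and `backfill_struct_defaults` are hypothetical; the dict stands in for what a historical block definition would supply.

```python
import json

# Hypothetical snapshot of a StructBlock's child defaults, as a frozen
# block definition in the migration history could supply them.
FROZEN_DEFINITION = {
    "callout": {"title": "", "style": "info"},
}


def backfill_struct_defaults(raw_json, definition):
    """Fill in struct children missing from stored data using frozen defaults."""
    stream = json.loads(raw_json)
    for item in stream:
        defaults = definition.get(item["type"])
        if defaults is None or not isinstance(item["value"], dict):
            continue  # not a struct block this definition knows about
        for child_name, default in defaults.items():
            item["value"].setdefault(child_name, default)
    return json.dumps(stream)


old = json.dumps([
    {"type": "callout", "value": {"title": "Heads up"}},
    {"type": "paragraph", "value": "<p>Body text</p>"},
])
new = json.loads(backfill_struct_defaults(old, FROZEN_DEFINITION))
print(new[0]["value"])  # → {'title': 'Heads up', 'style': 'info'}
```

In a real project, a function like this would be called from a Django RunPython operation over each page's stored stream data.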
However, we also recognise the issues that many (even core team members themselves) have reported in relation to this, and would like to offer a way for developers to opt out of this behaviour (including streamfield definitions in migrations) completely on a per-project basis, provided they are happy to accept the consequences (as outlined above).
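A per-project opt-out could be as small as a settings flag consulted while deconstructing the field. In this sketch `WAGTAIL_STREAMFIELD_OMIT_BLOCKS` is an invented setting name, `deconstruct_args` is a hypothetical helper, and a `SimpleNamespace` stands in for django.conf.settings so the code runs on its own.

```python
from types import SimpleNamespace

# Stand-in for django.conf.settings; the flag name is hypothetical.
settings = SimpleNamespace(WAGTAIL_STREAMFIELD_OMIT_BLOCKS=False)


def deconstruct_args(block_types, project_settings):
    """Return the positional args a StreamField-like field would emit.

    Full block definitions are kept unless the project opts out, in which
    case historical models will have no block definitions to work with.
    """
    if getattr(project_settings, "WAGTAIL_STREAMFIELD_OMIT_BLOCKS", False):
        return [[]]
    return [list(block_types)]


blocks = [("heading", "CharBlock()")]
default_args = deconstruct_args(blocks, settings)
opted_out_args = deconstruct_args(
    blocks, SimpleNamespace(WAGTAIL_STREAMFIELD_OMIT_BLOCKS=True)
)
print(default_args, opted_out_args)  # → [[('heading', 'CharBlock()')]] [[]]
```

Reading the flag with getattr and a False default keeps the full definitions for any project that never sets it, matching the default agreed in the meeting.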
EDIT: If there are other ways to optimise the representation (as outlined above), then they are still well worth exploring, as simpler migrations by default would be an obvious win.
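One possible optimisation along these lines, sketched under simplified assumptions: deduplicate identical sub-definitions by storing each distinct block definition once in a lookup table and referring to it by integer id everywhere else. The `(name, kind, body)` tuples and the helpers `compress`, `expand`, `compress_all` and `build_sections` are hypothetical and do not reflect Wagtail's actual migration format.

```python
def compress(definition, lookup, index):
    """Replace a (name, kind, body) definition with an integer id into `lookup`.

    Children are compressed first, so identical sub-definitions map to the
    same key and are stored only once, however many places use them.
    """
    name, kind, body = definition
    if isinstance(body, list):
        key = (name, kind, "children",
               tuple(compress(child, lookup, index) for child in body))
    else:
        key = (name, kind, "options", tuple(sorted(body.items())))
    if key not in index:
        index[key] = len(lookup)
        lookup.append(key)
    return index[key]


def expand(node_id, lookup):
    """Rebuild the full nested definition from its id."""
    name, kind, tag, body = lookup[node_id]
    if tag == "children":
        return (name, kind, [expand(child, lookup) for child in body])
    return (name, kind, dict(body))


def compress_all(definitions):
    lookup, index = [], {}
    roots = [compress(d, lookup, index) for d in definitions]
    return roots, lookup


def build_sections(url_max_length):
    # Hypothetical nested shape: a shared "link" struct inside 20 callouts,
    # offered again in 5 sections.
    link = ("link", "StructBlock", [
        ("url", "CharBlock", {"max_length": url_max_length}),
        ("text", "CharBlock", {"max_length": 255}),
    ])
    callouts = [
        (f"callout_{i}", "StructBlock",
         [("title", "CharBlock", {"max_length": 255}), link])
        for i in range(20)
    ]
    return [(f"section_{j}", "StructBlock", callouts) for j in range(5)]


sections = build_sections(255)
roots, lookup = compress_all(sections)
print(len(lookup))  # → 29 (versus 505 nodes when fully expanded)
assert [expand(r, lookup) for r in roots] == sections  # lossless round trip
```

Because ids are assigned in traversal order, changing the url child's max_length in this sketch alters exactly one lookup entry; every other entry, and every reference to it, stays the same.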
@gasman following up on this:
This problem (large migrations due to highly configurable StreamFields) is something that we’ve run into as well, and is a major pain point around maintenance of our large Wagtail site.
I’m curious if you have any guidance on how or where to draw the line between having many Page models that differ slightly versus having fewer Pages but making them more configurable. Consider a use case where you have a few core page types that share certain common design elements (headers, footers, sidebars), but their main content may differ in numerous subtle ways. Now compare these few pages:
Each of these pages has the same basic structure, but the main content area is very different (different images, callouts, links, expandables). In our case we accomplish this through at least one highly configurable StreamField, which can contain various blocks (some of which are StructBlocks that hold, e.g., lists of other blocks).
This usage results in migration issues similar to those reported by @coredumperror. We’ve discussed various other ways to set this up, but it’s not clear how best to do this with a “many page types” approach. You’d need something like PageWithImageAndFormAndLinks and then PageWithImageAndLinksButNoForm, and the combinations would quickly get out of hand. You’d also have a more difficult editor experience, where a page creator would have to choose from a large list of page types that differ in subtle ways. The inability to convert a page from one type to another also makes page types less attractive than configurable StreamFields.

Any insight or suggestions on how best to set up pages like this in “the Wagtail way”?
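The combinatorial problem is easy to quantify: with k independent optional features, one page type per combination means 2^k models. The snippet below enumerates those combinations and builds names in the style quoted above; `page_type_names` and the feature list are illustrative only.

```python
from itertools import product


def page_type_names(features):
    """Name one hypothetical page model per on/off combination of features."""
    names = []
    for flags in product([True, False], repeat=len(features)):
        included = [f for f, on in zip(features, flags) if on]
        excluded = [f for f, on in zip(features, flags) if not on]
        name = "PageWith" + ("And".join(included) if included else "Nothing")
        if excluded:
            name += "ButNo" + "Or".join(excluded)
        names.append(name)
    return names


names = page_type_names(["Image", "Form", "Links"])
print(len(names))  # → 8, i.e. 2 ** 3
print(names[:2])  # → ['PageWithImageAndFormAndLinks', 'PageWithImageAndFormButNoLinks']
```

Adding just two more optional elements (say, callouts and expandables, both mentioned above) quadruples the count to 32.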