Bug: concurrent building breaks plugins that rely on before-build / before-build-all

There is an inconsistency between editing a contents file directly and saving the same file through the admin UI, and also between the initial build / build-all and a subsequent file change in the admin UI. This inconsistency breaks plugins that depend on before-build or before-build-all.

Saving a file in the admin UI triggers a build with update_source_info_first enabled:

File "lektor/devserver.py", line 51, in run
    self.build(update_source_info_first=True)

During the source info update, the artifact for the saved page (and its sub-artifacts) is built and saved without before-build or before-build-all being emitted.

Here are the steps to reproduce. The time.sleep calls simulate a lengthy update task:

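import time

from lektor.pluginapi import Plugin


class SlowHooksPlugin(Plugin):  # class name is illustrative
    name = 'slow-hooks'

    def on_before_build_all(self, builder, **extra):
        time.sleep(0.5)  # simulate a lengthy update task
        print('start')

    def on_before_build(self, builder, build_state, source, prog, **extra):
        time.sleep(0.2)
        print('8', source)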

And in lektor.builder.Builder.build(), after emit('before-build') insert print(9, source).

The log for the initial build is in the correct order:

Started build
start
8 <Page model='root' path='/'>
9 <Page model='root' path='/'>
U index.html
9 <Directory '/'>
...

However, if you update an existing page:

Started build
9 <Page model='project-entry' path='/projects/barss'>
U projects/barss/index.html
9 <File '/static/style.css'>
9 <File '/static/icons.svg'>
start
8 <Page model='root' path='/'>
9 <Page model='root' path='/'>
9 <Directory '/'>
...

As you can see, the source info update builds the edited page without emitting before-build.

I have a plugin that injects a record variable and does some text replacement on the record. However, because of this, the before-build-all callback is not executed before the edited page is built. Or, more precisely, the record is sporadically not updated properly because of race conditions. The time.sleep just makes the problem obvious.
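The ordering problem in the two logs can be reproduced with a toy model. The sketch below is not Lektor's code: `ToyBuilder`, `update_source_info`, and the `changed` parameter are hypothetical stand-ins that only mimic the reported behaviour, where the source info update writes the edited page before any hook fires.

```python
class ToyBuilder:
    """Toy stand-in for a builder; not Lektor's real Builder class."""

    def __init__(self):
        self.log = []
        self.fresh = set()  # sources whose artifact is up to date
        self.callbacks = {"before-build-all": [], "before-build": []}

    def emit(self, event, **extra):
        for callback in self.callbacks[event]:
            callback(**extra)

    def build(self, source):
        # Normal path: the hook fires, then the artifact is written if stale.
        self.emit("before-build", source=source)
        if source not in self.fresh:
            self.log.append(f"write {source}")
            self.fresh.add(source)

    def update_source_info(self, source):
        # Path described in the issue: the artifact is written
        # without any hook being emitted first.
        self.log.append(f"write {source}")
        self.fresh.add(source)

    def build_all(self, sources, changed=None):
        if changed is not None:  # models update_source_info_first=True
            self.update_source_info(changed)
        self.emit("before-build-all")
        for source in sources:
            self.build(source)


def run(changed=None):
    builder = ToyBuilder()
    builder.callbacks["before-build-all"].append(
        lambda **extra: builder.log.append("hook before-build-all"))
    builder.callbacks["before-build"].append(
        lambda source, **extra: builder.log.append(f"hook before-build {source}"))
    builder.build_all(["/", "/projects/barss"], changed=changed)
    return builder.log


initial = run()
after_edit = run(changed="/projects/barss")
```

In `after_edit`, the write for `/projects/barss` comes before every hook entry and is never repeated, so a plugin that prepares data in its hooks would have no effect on that page.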

Issue Analytics

Created a year ago
Comments: 8 (8 by maintainers)

Top GitHub Comments

dairiki commented, Mar 31, 2022

I still don’t completely understand what it is you’re trying to do, but you might take a look at the approach used in lektor-index-pages, which is the solution I came up with for generating keyword and date indexes for a blog.

Roughly, the indexing (i.e. processing of the children) works by creating virtual source objects to contain all the computed grouping state. When those virtual sources are instantiated, the children are iterated over, classified, and sorted. That state is stored in the virtual source instance. (Which is cached for the lifetime of the pad, so only has to be computed once per build cycle.)
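The compute-once, cache-per-pad idea can be sketched in plain Python. This is not the lektor-index-pages implementation: `ToyPad`, `KeywordIndex`, and the `keywords` field are hypothetical names used only to show the grouping being computed on first access and reused afterwards.

```python
from collections import defaultdict


class KeywordIndex:
    """Hypothetical virtual source that groups pages by keyword."""

    instances = 0  # counts how often the grouping is computed

    def __init__(self, pages):
        KeywordIndex.instances += 1
        groups = defaultdict(list)
        for page in pages:
            for keyword in page["keywords"]:
                groups[keyword].append(page["title"])
        # Classify and sort once; the result lives on the instance.
        self.subindexes = {
            key: sorted(titles) for key, titles in sorted(groups.items())
        }


class ToyPad:
    """Toy stand-in for a pad: caches virtual sources by path."""

    def __init__(self, pages):
        self.pages = pages
        self._cache = {}

    def get_index(self, path):
        if path not in self._cache:
            self._cache[path] = KeywordIndex(self.pages)
        return self._cache[path]


pages = [
    {"title": "Second post", "keywords": ["lektor", "python"]},
    {"title": "First post", "keywords": ["lektor"]},
]
pad = ToyPad(pages)
index = pad.get_index("/blog@index-pages/keyword-idx")
same_index = pad.get_index("/blog@index-pages/keyword-idx")
```

Both lookups return the same object, so the children are iterated only once for the lifetime of the toy pad.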

The way this addresses your concerns is:

> Process all (recursive) children of a node. In my case clustering them by arbitrarily complex grouping functions. Since the result needed is the grouping itself, I cannot process a child node separately from the parent.

The grouping is computed (essentially) on-demand, by fetching the appropriate virtual source from the Lektor db. E.g. to get a list of all blog keywords along with a count of the number of articles tagged with each one (untested code):

<ul>
{% for subidx in site.get("/blog@index-pages/keyword-idx").subindexes %}
  <li>{{ subidx.children.count() }} articles are tagged with <tt>{{ subidx.key }}</tt></li>
{% endfor %}
</ul>

> I want to replace text inside an individual node. This must happen before the build, or otherwise it will not be written to file.

A Record should, I think, be thought of as a database record. It’s a view of what’s in the corresponding contents.lr file. As I said above, mutating a Record doesn’t feel right (unless you’re talking about modifying what’s in the contents.lr as well — but that probably will not be easy to do correctly in the middle of a build cycle).

I think a more appropriate place to integrate more global data (group data) is in the page template(s). If you need to perform some operations which are cumbersome to do in jinja, you may create custom jinja filters or global functions to help.
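As a rough illustration of the custom-filter route, a plugin can register a filter on the Jinja environment in its `on_setup_env` hook. The sketch below is an assumption-laden example, not code from this thread: the plugin name, the `replace_terms` filter, and the `REPLACEMENTS` table are all hypothetical. The import guard only exists so the snippet also runs where Lektor is not installed.

```python
try:
    from lektor.pluginapi import Plugin
except ImportError:  # allows the sketch to run without Lektor installed
    Plugin = object

# Hypothetical replacement table; a real plugin might read it from its config.
REPLACEMENTS = {"LKT": "Lektor"}


def replace_terms(text, replacements=REPLACEMENTS):
    """Return text with every key of `replacements` swapped for its value."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


class ReplaceTermsPlugin(Plugin):
    name = "replace-terms"
    description = "Adds a replace_terms Jinja filter."

    def on_setup_env(self, **extra):
        # Register the filter on the project's Jinja environment.
        self.env.jinja_env.filters["replace_terms"] = replace_terms
```

A template could then apply it to a string field, for example `{{ this.title|replace_terms }}`. The replacement happens at render time, so the record itself is never mutated and the result does not depend on build hooks running first.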

> A parent may need to display the grouping content of its children. Therefore, the content should be built after the children were built.

One can always access the data (fields) of the children. As long as the rest of the building is done by the jinja templates (perhaps using jinja macros or custom filters/functions) build order is not important.

E.g. here’s how to generate a list of all pages which reference each keyword.

<h2>Keywords</h2>
{% for subidx in site.get("/blog@index-pages/keyword-idx").subindexes %}
  <h3>Keyword: {{ subidx.key }}</h3>
  <ul>
  {% for page in subidx.children %}
    <li>{{ page.title }}</li>
  {% endfor %}
  </ul>
{% endfor %}

> If the admin UI saves a file, it is first built individually (the currently edited file), and immediately afterwards a build_all is initiated. This is something that could be avoided.

Yes, with minor corrections.

Those two builds may be initiated in parallel, rather than in any particular sequence.

The individual source build is triggered not by the file save, but by the subsequent HTTP request for the primary artifact of the edited source. The build_all is not triggered directly by the file save, but is triggered when the file-system monitor notices that any of the project files has been updated.

The reason behind this (I think) is that:

Editing one file potentially requires the whole site to be rebuilt, hence the need for the build_all. Also, files may be edited by means other than the admin web UI, hence the filesystem monitor.
For large sites, the build_all can take a long time to complete, thus the individual build for each requested artifact to ensure that the artifacts served are current.

When we say “build a source object” here, we mean “check all the dependencies (the recorded source files) for the primary artifact of the source object; if the artifact is out-of-date with respect to any of those dependencies, then (re)generate the artifact.” Unless the artifact is stale (or missing), this is (in theory) a relatively quick process.
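The check-then-regenerate idea can be sketched with file modification times. This is a deliberately simplified model, not how Lektor tracks dependencies: `is_stale` and `build_if_stale` are hypothetical helpers, and comparing mtimes alone is an assumption made for brevity.

```python
import os
import tempfile


def is_stale(artifact, dependencies):
    """True if the artifact is missing or older than any dependency."""
    if not os.path.exists(artifact):
        return True
    built_at = os.path.getmtime(artifact)
    return any(os.path.getmtime(dep) > built_at for dep in dependencies)


def build_if_stale(artifact, dependencies, render):
    """Regenerate the artifact only when needed; report whether it ran."""
    if not is_stale(artifact, dependencies):
        return False
    with open(artifact, "w", encoding="utf-8") as out:
        out.write(render())
    return True


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "contents.lr")
    artifact = os.path.join(tmp, "index.html")
    with open(source, "w", encoding="utf-8") as f:
        f.write("title: Hello\n")
    os.utime(source, (1000, 1000))  # pretend the source is very old

    first = build_if_stale(artifact, [source], lambda: "<h1>Hello</h1>")
    second = build_if_stale(artifact, [source], lambda: "<h1>Hello</h1>")

    # Simulate an edit by moving the source's mtime past the artifact's.
    edited_at = os.path.getmtime(artifact) + 10
    os.utime(source, (edited_at, edited_at))
    third = build_if_stale(artifact, [source], lambda: "<h1>Hello</h1>")
```

The first call writes the missing artifact, the second is a cheap no-op, and the third regenerates the artifact because its dependency changed.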

Because of the dependency checking, when a page is edited, only one of those two build threads (the build_all and the source-specific build) — whichever one gets to it first (likely the source-specific build) — should[^1] actually regenerate the output artifact.

[^1]: I suspect there are edge cases when both threads will regenerate the artifact.
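The "whichever thread gets there first" behaviour, and why an unguarded check could let both threads regenerate, can be illustrated with a toy artifact. Nothing here is taken from Lektor: `ToyArtifact` is hypothetical, and the lock is just one way to make the check and the rebuild a single step in this model.

```python
import threading


class ToyArtifact:
    """Toy artifact that counts how often it is regenerated."""

    def __init__(self):
        self.stale = True
        self.regenerations = 0
        self._lock = threading.Lock()

    def build(self):
        # Holding the lock across the check and the rebuild means the
        # second thread sees stale == False and skips the work.
        with self._lock:
            if self.stale:
                self.regenerations += 1
                self.stale = False


artifact = ToyArtifact()
threads = [
    threading.Thread(target=artifact.build, name=name)
    for name in ("build_all", "source-specific build")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Without the lock, both threads could pass the `stale` check before either clears the flag, which is the kind of edge case the footnote alludes to.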

relikd commented, Mar 31, 2022

god damn it. Even with my mixed build processing it is not consistent 😕 getting out of options…