In DITA-OT project files, allow subsets of deliverables to be published

Description

Currently, DITA-OT project file publishing is limited to publishing one deliverable or all deliverables.

To publish subsets of deliverables, the <deliverable> definitions must be split into multiple files, with subsets built using a hierarchy of <include> directives:

files

This has the following drawbacks:

- Many files can be required.
  - We have 100+ contexts, shared by dozens of products, organized into several help collections.
  - Our deliverable definitions are resistant to grouping in fewer files, due to being shared where products overlap.
- It is difficult to know what a particular file publishes without tracing through multiple files.
- It is difficult to explore the evolution of project file content over time in a revision control system.
- Maintenance operations (search-and-replace, adjusting collections) are more complex.
- <context> and <publication> information must be replicated in every deliverable file to allow individual publishing (inelegant, but technically harmless).

Possible Solution

It would be useful to be able to natively define collections of deliverables (and collections of collections), perhaps like this:

collections

where a “collection” represents its subset of referenced deliverables:

dita --project project.xml --deliverable product3

When a collection is published, its deliverables should be published in the order resulting from expanding the references. This provides support for order-dependent deliverables (such as HTML online help that cleans its output directory first, followed by PDF deliverables written to that same output directory).

If the reference expansion includes a deliverable multiple times, the deliverable should only be published once (the first occurrence seems reasonable).

Detection for circular collection reference loops would be needed.
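
As a rough illustration of the expansion rules described above (reference order preserved, first occurrence wins, circular references rejected), here is a minimal Python sketch. It is not DITA-OT code. The function name `expand_deliverables`, the `CircularReferenceError` class, and the dictionary that maps a collection ID to its list of references are all assumptions made for this example, and any ID that is not a known collection is simply treated as a deliverable.

```python
from typing import Dict, List


class CircularReferenceError(ValueError):
    """Raised when a collection directly or indirectly references itself."""


def expand_deliverables(name: str, collections: Dict[str, List[str]]) -> List[str]:
    """Expand a collection (or deliverable) ID into an ordered deliverable list.

    `collections` is a hypothetical mapping: collection ID -> referenced IDs.
    Any ID that is not a key in `collections` is assumed to be a deliverable.
    """
    ordered: List[str] = []  # deliverables in the order they would be published
    seen = set()             # deliverables already scheduled (first occurrence wins)
    active: List[str] = []   # collections currently being expanded

    def visit(ref: str) -> None:
        if ref not in collections:
            # Plain deliverable: schedule it only the first time it appears.
            if ref not in seen:
                seen.add(ref)
                ordered.append(ref)
            return
        if ref in active:
            # The collection is already on the expansion path: a reference loop.
            cycle = " -> ".join(active[active.index(ref):] + [ref])
            raise CircularReferenceError(f"circular collection reference: {cycle}")
        active.append(ref)
        for child in collections[ref]:
            visit(child)
        active.pop()

    visit(name)
    return ordered


# Hypothetical collections; "shared_pdf" is referenced by two products.
example = {
    "product2": ["p2_html", "shared_pdf"],
    "product3": ["p3_html", "shared_pdf"],
    "all": ["product2", "product3"],
}
print(expand_deliverables("all", example))
# ['p2_html', 'shared_pdf', 'p3_html']
```

Because the traversal is depth-first and keeps the order of the references, an HTML deliverable listed before its PDFs stays ahead of them in the result.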

Potential Alternatives

I considered having a deliverable declare its dependencies via a new <depends> element, but this made it difficult to specify whether to publish a deliverable alone or with its dependencies, plus there was no intuitive way to describe order dependency.

Additional Context

A testcase is included:

ditaot_project_file_collections.zip

The project.xml file in the testcase uses the format proposed above, although other implementations are possible.

Issue Analytics

Created a year ago
Comments: 14 (14 by maintainers)

Top GitHub Comments

xephon2 commented, Apr 13, 2022

@chrispy-snps, not much to add. This is exactly how it should work.

chrispy-snps commented, Jun 3, 2022

@jelovirt - I am normally not a fan of hyphens in element names, but I have come to prefer <deliverable-set> because:

- It makes more sense with dita --deliverable.
- It keeps terminology focused on the “big three” - contexts, publications, deliverables - without introducing a fourth term.
- Its purpose and relationship to the “big three” are more intuitively clear without referring back to documentation.

So regardless of the ordering connotation that comes with <deliverable-set>, I think it’s more intuitive for users.

An ordering guarantee for deliverable generation is a nice-to-have, but not a requirement.

For us, the documentation for a product family is (1) one Oxygen WebHelp deliverable that contains all the books as submaps, plus (2) individual book PDFs placed into that same output directory:

html5_and_pdf

Because WebHelp is HTML-based, we must clean the output directory before publishing it to avoid orphaned files (see #1199), and that cleaning must happen before the PDF deliverables are written into the same directory.

If there were an ordering guarantee “built into” deliverable sets, then we could invoke output-cleaning plugins in a certain way. If not, then output-cleaning must be moved to a preprocessing operation that computes and cleans all output directories for the deliverables to be published. This is easy in automated Linux publishing, where I can write a wrapper script to do that, but not easy in Oxygen, where writers publish interactively from the software’s UI. They will need to remember to manually delete their output directory from time to time.
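
To make the preprocessing idea concrete, here is a minimal Python sketch of a wrapper step that computes the distinct output directories for the deliverables about to be published and cleans each one once, before anything is published. The names `output_dirs_to_clean`, `clean_output_dirs`, and the `output_dir_of` mapping (deliverable ID to output directory) are hypothetical. How that mapping would be derived from a real project file is deliberately left out.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, List


def output_dirs_to_clean(deliverables: Iterable[str],
                         output_dir_of: Dict[str, str]) -> List[Path]:
    """Return each distinct output directory once, in first-use order.

    `output_dir_of` is a hypothetical mapping: deliverable ID -> output directory.
    """
    dirs: List[Path] = []
    for deliverable in deliverables:
        path = Path(output_dir_of[deliverable]).resolve()
        if path not in dirs:
            dirs.append(path)
    return dirs


def clean_output_dirs(dirs: Iterable[Path]) -> None:
    """Delete and recreate each directory so that no orphaned files survive."""
    for path in dirs:
        shutil.rmtree(path, ignore_errors=True)
        path.mkdir(parents=True, exist_ok=True)
```

Since all cleaning happens up front, a shared directory is emptied exactly once, and the order in which the HTML and PDF deliverables are then published no longer matters for this purpose.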

And if DITA-OT someday implements a “parallel deliverable publishing” capability (wouldn’t that be cool!), the ordering guarantee would have to be discarded, because a complete linear sequencing of deliverables is inherently incompatible with parallelization.

So if an ordering guarantee is implemented, I think value could be obtained from it. But it’s just a nice-to-have, not a requirement.

@xephon2 - do you have any thoughts on order of deliverable generation in a deliverable set?
