New common merge process

We currently have a single merge process in OT, the one originally developed for PDF. However, it is not well suited to general use and, IMO, is not particularly good. We should develop new merging code based on the existing code and migrate PDF to use it.

The current output from merge is

<map>
  <opentopic:map>
    <!-- map contents -->
  </opentopic:map>
  <!-- topics -->
</map>

I suggest the new output should look like a compound topic that can also contain a map. The map should be output at the end of the document, because that is more efficient from a processing point of view.

<dita>
  <!-- topics -->
  <map>
    <!-- map contents -->
  </map>
</dita>
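
The proposed layout, with topics first and the map last under a single `<dita>` root, can be sketched in a few lines. This is only an illustration in Python using the standard library, not DITA-OT's merge code; the `merge` function, the sample topic IDs, and the `#a`/`#b` hrefs are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def merge(topics, map_element):
    """Wrap topics and a map in a <dita> root, placing the map last."""
    root = ET.Element("dita")
    for topic in topics:
        root.append(topic)
    # The map goes after all topics, as suggested in the issue.
    root.append(map_element)
    return root


# Hypothetical sample content, for illustration only.
topic_a = ET.fromstring('<topic id="a"><title>A</title></topic>')
topic_b = ET.fromstring('<topic id="b"><title>B</title></topic>')
ditamap = ET.fromstring('<map><topicref href="#a"/><topicref href="#b"/></map>')

merged = merge([topic_a, topic_b], ditamap)
print(ET.tostring(merged, encoding="unicode"))
```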

Issue Analytics

State:
Created 7 years ago
Comments:10 (8 by maintainers)

Top GitHub Comments

ballums commented, Sep 23, 2016

Radu and Eliot,

I can offer some feedback on creating a single merged map at the beginning of the process. That’s how ePublisher handles DITA map inputs, as that closely approximates ePublisher’s expectations for legacy document inputs (Word and FrameMaker).

Since ePublisher has been working with a single merged map for all operations for the past 10 years, I can tell you the approach has its ups and downs. The biggest issue relates to memory. While you can side-step certain problems by running under 64-bit Java, the fact is that if you load a 110MB DITA map into memory, you have a lot of data to work with. XPath operations are simple (you can use xsl:number operations). But breaking files back out into HTML chunks is complicated, as ePublisher flattens DITA hierarchies in its intermediate format. This issue would not affect DITA-OT if you preserve the DITA map hierarchy in the merged document. For our 2016.1 release, we spent a not insignificant amount of time finding ways to chunk large intermediate files in order to conserve memory during processing. It works very well, yet I wouldn’t wish the work on anyone if they can avoid it.
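
To make the memory concern concrete, here is a minimal sketch of reading a merged document one topic at a time instead of loading the whole tree. It uses Python's standard `xml.etree.ElementTree.iterparse` and assumes the `<dita>` layout proposed in the issue; it is not how ePublisher or DITA-OT process merged files, and `iter_topic_titles` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def iter_topic_titles(source):
    """Yield (id, title) for each top-level topic, releasing it afterwards."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 0  # nesting depth below the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # depth == 0 on an "end" event means a direct child of the root closed.
        if depth == 0 and elem.tag == "topic":
            yield elem.get("id"), elem.findtext("title")
            root.remove(elem)  # drop the finished topic so it can be freed


# Hypothetical sample input, for illustration only.
sample = (
    b'<dita><topic id="a"><title>A</title></topic>'
    b'<topic id="b"><title>B</title></topic>'
    b'<map><topicref href="#a"/></map></dita>'
)
titles = list(iter_topic_titles(io.BytesIO(sample)))
```

Only one top-level topic is held in memory at a time, at the cost of losing random access across topics, which is what XPath-heavy steps such as numbering rely on.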

The main point is that some folks pull in a LOT of data using nested DITA maps. I’d suggest you benchmark the generation of HTML files (roughly one per DITA topic) against a PDF (merging all topics) and see how your memory/performance numbers line up. To really stress test it, use 32-bit Java. In the past, when adding support for DITA-OT 1.8 to ePublisher, we encountered memory issues with 32-bit Java. We did add support for 64-bit Java, but also wound up implementing an XSL merge that operates within 32-bit memory limits for large DITA maps (100MB+). I understand there have been many updates to the merge code in later DITA-OT releases. I’m not sure if the memory issue was addressed. You might double-check before going down the single merged file path.

stale[bot] commented, Dec 12, 2018

This issue has been automatically marked as stale because it has not been updated recently. It will be closed soon if no further activity occurs. Thank you for your contributions.