
Consider providing w14:paraId and w14:textId generators


Description

The w:p and w:tr elements can have w14:paraId and w14:textId attributes, which MS-DOCX defines as ST_LongHexNumber values that must be unique within the document part and must be greater than 0 and less than 0x80000000.

Microsoft Word uses a random number generator to produce these values (noting that this is not a requirement).

At the moment, the Paragraph (w:p) and TableRow (w:tr) classes do not generate values for the ParagraphId (w14:paraId) and TextId (w14:textId) attributes. There are also no utility classes or methods for producing compliant values.

Therefore, the question is whether we want to offer any functionality for generating or validating those attribute values.
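On the validation side, a check against the constraints above could look like the following minimal sketch. It assumes that an ST_LongHexNumber is four bytes written as eight hexadecimal digits, and it checks only the range (greater than 0, less than 0x80000000), not uniqueness within the part. The class and method names (`LongHexNumberValidator`, `IsValidIdValue`) are hypothetical and not part of the Open XML SDK.

```csharp
using System.Globalization;

// Hypothetical helper (not part of the Open XML SDK) that checks the range
// constraints described above for w14:paraId / w14:textId values.
public static class LongHexNumberValidator
{
    // Returns true if value is an eight-digit hex string (four bytes) whose
    // numeric value is greater than 0 and less than 0x80000000.
    // Uniqueness within the document part is NOT checked here.
    public static bool IsValidIdValue(string value)
    {
        if (value is null || value.Length != 8)
        {
            return false;
        }

        // AllowHexSpecifier alone rejects whitespace and signs; both upper-
        // and lower-case hex digits are accepted.
        if (!uint.TryParse(value, NumberStyles.AllowHexSpecifier,
                CultureInfo.InvariantCulture, out uint number))
        {
            return false;
        }

        return number > 0 && number < 0x80000000;
    }
}
```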

Providing utility methods would be very straightforward. For example, here is the code (taken from two classes) that I am using in my codebase for creating random ST_LongHexNumber values (optionally making sure they are less than 0x80000000 while always guaranteeing they are greater than 0):

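using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Security.Cryptography;

// RandomNumberGenerator.Create() replaces the RNGCryptoServiceProvider
// constructor, which is marked obsolete (SYSLIB0023) in .NET 6 and later.
private static readonly RandomNumberGenerator Generator = RandomNumberGenerator.Create();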

/// <summary>
/// Creates an ST_LongHexNumber value, masking the most significant byte with
/// the given <paramref name="msbMask" />.
/// </summary>
/// <param name="msbMask">The most significant byte mask.</param>
public static string CreateRandomLongHexNumber(byte msbMask = 0xff)
{
    // Create a four-byte random number, noting that the first byte (data[0])
    // will become the most significant byte in the string value created by
    // the ToHexString() method.
    var data = new byte[4];
    Generator.GetNonZeroBytes(data);
    data[0] &= msbMask;

    return data.ToHexString();
}

/// <summary>
/// Converts the given value into a hexadecimal string, with the first
/// byte in the list being the most significant byte in the resulting
/// string.
/// </summary>
/// <param name="source">The list of bytes to be converted.</param>
/// <returns>A hexadecimal string.</returns>
public static string ToHexString(this IReadOnlyList<byte> source)
{
    var dest = new char[source.Count * 2];

    var i = 0;
    var j = 0;

    while (i < source.Count)
    {
        byte b = source[i++];
        dest[j++] = ToCharUpper(b >> 4);
        dest[j++] = ToCharUpper(b);
    }

    return new string(dest);
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static char ToCharUpper(int value)
{
    value &= 0xF;
    value += '0';

    if (value > '9')
    {
        value += ('A' - ('9' + 1));
    }

    return (char) value;
}
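With the code above, passing msbMask 0x7F clears the top bit and keeps the value below 0x80000000. For comparison, here is a shorter sketch that swaps the byte masking for a different technique: drawing directly from the allowed range with `RandomNumberGenerator.GetInt32`, which assumes .NET Core 3.0 or later. The class and method names are hypothetical.

```csharp
using System.Globalization;
using System.Security.Cryptography;

// Hypothetical helper; RandomNumberGenerator.GetInt32 requires .NET Core 3.0+.
public static class LongHexNumberGenerator
{
    // Returns an eight-digit, upper-case hex string for a cryptographically
    // random value in [1, 0x7FFFFFFE]. int.MaxValue (0x7FFFFFFF) is the
    // exclusive upper bound of GetInt32, so that single value is never drawn.
    public static string CreateRandomId()
    {
        int value = RandomNumberGenerator.GetInt32(1, int.MaxValue);
        return value.ToString("X8", CultureInfo.InvariantCulture);
    }
}
```

Unlike GetNonZeroBytes, which never yields a zero byte in any position, this draws uniformly over the whole range except 0x7FFFFFFF.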

A first step could be to provide utility or extension methods without changing the Paragraph and TableRow classes.

A second, optional step could be to enhance the Paragraph and TableRow classes by adding instance methods like the following (noting that I have not put much thought into this yet and using the Paragraph class as an example):

// Normal setter methods.
public void SetRandomParagraphId();
public void SetRandomTextId();

// Methods that would be handy for pure functional transformation scenarios.
// Like the With() method we added earlier.
public Paragraph WithNewRandomParagraphId();
public Paragraph WithNewRandomTextId();

Information

.NET Target: all
DocumentFormat.OpenXml Version: latest

Issue Analytics

Created 2 years ago
Comments: 17 (5 by maintainers)

Top GitHub Comments

1 reaction

tomjebo commented, Jul 21, 2021

The one thing that makes me hesitate right now is conflicts. Consumers of the SDK can emit paraId/textId values that are not random, as Word's are, as long as they conform to the documented boundaries and rules, including that they must not conflict with other paraIds (document-wide uniqueness). Generating IDs is quick, as @ThomasBarnekow and @rmboggs showed; checking for conflicts, however, may not be. The main problem would be the time and processing needed during construction to check a whole document for conflicts, which could be undesirable for large documents. And although Word doesn't fail when opening documents with conflicting IDs, 1) they will likely be replaced on save and 2) we don't know whether there will be negative side effects before they are replaced. So if we can show that unique, non-conflicting paraIds can be added in a performant way, I would be more likely to agree. Having said that, Word does add these while checking for conflicts, but it's a large application working in memory with binary representations of many collections, which is likely more efficient.
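One way to keep the conflict check cheap is to pay for a single pass over the existing IDs and then check each new candidate against a hash set. The sketch below assumes the caller has already collected the existing w14:paraId strings from the part (how they are collected is left open), and it assumes .NET Core 3.0 or later for `RandomNumberGenerator.GetInt32`. `UniqueIdGenerator` is a hypothetical name, not an SDK type.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;

// Hypothetical helper: hands out random IDs that do not collide with IDs
// already present in a document part. Requires .NET Core 3.0+.
public sealed class UniqueIdGenerator
{
    private readonly HashSet<uint> _used = new HashSet<uint>();

    // existingIds: the ID values already in the part, gathered by the caller
    // in one pass. Strings that do not parse as hex are ignored.
    public UniqueIdGenerator(IEnumerable<string> existingIds)
    {
        foreach (string id in existingIds)
        {
            if (uint.TryParse(id, NumberStyles.AllowHexSpecifier,
                    CultureInfo.InvariantCulture, out uint value))
            {
                _used.Add(value);
            }
        }
    }

    // Draws random values in [1, 0x7FFFFFFE] until one is unused, records it
    // so later calls cannot return it again, and returns it as hex.
    public string Next()
    {
        while (true)
        {
            uint candidate = (uint)RandomNumberGenerator.GetInt32(1, int.MaxValue);
            if (_used.Add(candidate)) // Add returns false if already present.
            {
                return candidate.ToString("X8", CultureInfo.InvariantCulture);
            }
        }
    }
}
```

The initial scan is linear in the number of existing IDs; after that, each new ID costs an expected constant-time set lookup, because a random candidate collides with probability roughly (IDs in use) / 2^31.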

0 reactions

twsouthwick commented, Aug 5, 2022

There’s a separate package, DocumentFormat.OpenXml.Features, that has this and other helpful things.

