Consider providing w14:paraId and w14:textId generators

Description

w:p and w:tr elements can have the w14:paraId and w14:textId attributes, which are defined in MS-DOCX as ST_LongHexNumber values that are unique within the document part as well as greater than 0 and less than 0x80000000.
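To make those constraints concrete, here is a minimal validation sketch. The class and method names (LongHexNumberValidator, IsValidId) are hypothetical and not part of the Open XML SDK. It assumes the value is written as exactly eight hexadecimal digits (four bytes) and checks only the format and range, not uniqueness within the part:

```csharp
using System.Globalization;

// Hypothetical helper; not part of DocumentFormat.OpenXml.
public static class LongHexNumberValidator
{
    /// <summary>
    /// Returns true if the value is an eight-digit hexadecimal string that is
    /// greater than 0 and less than 0x80000000.
    /// </summary>
    public static bool IsValidId(string value)
    {
        // ST_LongHexNumber is four bytes, i.e. exactly eight hex digits.
        if (value == null || value.Length != 8)
        {
            return false;
        }

        // AllowHexSpecifier accepts hex digits only (no sign, no whitespace).
        if (!uint.TryParse(value, NumberStyles.AllowHexSpecifier,
                CultureInfo.InvariantCulture, out uint number))
        {
            return false;
        }

        return number > 0 && number < 0x80000000;
    }
}
```

For example, IsValidId("7FFFFFFF") is accepted under these rules, while IsValidId("80000000") and IsValidId("00000000") are rejected.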

Microsoft Word uses a random number generator to generate the values, although random generation is not a requirement.

At the moment, the Paragraph (w:p) and TableRow (w:tr) classes do not generate values for the ParagraphId (w14:paraId) and TextId (w14:textId) attributes. There are also no utility classes or methods for producing compliant values.

Therefore, the question is whether we want to offer any functionality for generating or validating those attribute values.

Providing utility methods would be very straightforward. For example, here is the code (taken from two classes) that I am using in my codebase for creating random ST_LongHexNumber values (optionally making sure they are less than 0x80000000 while always guaranteeing they are greater than 0):

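using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Security.Cryptography;

private static readonly RNGCryptoServiceProvider Generator = new RNGCryptoServiceProvider();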

/// <summary>
/// Creates an ST_LongHexNumber value, masking the most significant byte with
/// the given <paramref name="msbMask" />.
/// </summary>
/// <param name="msbMask">The most significant byte mask.</param>
public static string CreateRandomLongHexNumber(byte msbMask = 0xff)
{
    // Create a four-byte random number, noting that the first byte (data[0])
    // will become the most significant byte in the string value created by
    // the ToHexString() method.
    var data = new byte[4];
    Generator.GetNonZeroBytes(data);
    data[0] &= msbMask;

    return data.ToHexString();
}

/// <summary>
/// Converts the given value into a hexadecimal string, with the first
/// byte in the list being the most significant byte in the resulting
/// string.
/// </summary>
/// <param name="source">The list of bytes to be converted.</param>
/// <returns>A hexadecimal string.</returns>
public static string ToHexString(this IReadOnlyList<byte> source)
{
    var dest = new char[source.Count * 2];

    var i = 0;
    var j = 0;

    while (i < source.Count)
    {
        byte b = source[i++];
        dest[j++] = ToCharUpper(b >> 4);
        dest[j++] = ToCharUpper(b);
    }

    return new string(dest);
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static char ToCharUpper(int value)
{
    value &= 0xF;
    value += '0';

    if (value > '9')
    {
        value += ('A' - ('9' + 1));
    }

    return (char) value;
}

A first step could be to provide utility or extension methods without changing the Paragraph and TableRow classes.

A second, optional step could be to enhance the Paragraph and TableRow classes by adding instance methods like the following (noting that I have not put much thought into this yet and using the Paragraph class as an example):

// Normal setter methods.
public void SetRandomParagraphId();
public void SetRandomTextId();

// Methods that would be handy for pure functional transformation scenarios.
// Like the With() method we added earlier.
public Paragraph WithNewRandomParagraphId();
public Paragraph WithNewRandomTextId();

Information

  • .NET Target: all
  • DocumentFormat.OpenXml Version: latest

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 17 (5 by maintainers)

Top GitHub Comments

1 reaction
tomjebo commented, Jul 21, 2021

The one thing that makes me hesitate right now is conflicts. Consumers of the SDK can emit paraId/textId values that are not necessarily random, as Word’s are, as long as they conform to the documented boundaries and rules, including that they should not conflict with other paraId values (document-wide uniqueness). Generating IDs is quick, as @ThomasBarnekow and @rmboggs showed; checking for conflicts, however, may not be. The main problem would be the time and processing needed during construction to check a whole document for conflicts. For large documents, that could be undesirable. And although Word doesn’t fail when opening these, 1) they will likely be replaced on save and 2) we don’t know whether there will be negative side effects before they are replaced. So if we can show that adding unique, non-conflicting paraIds can be done in a performant way, I would be more likely to agree. Having said that, Word does add these while checking for conflicts, but it is a large application working in memory with binary representations of lots of collections, which is likely more efficient.
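One way to frame the conflict-checking cost is to collect the existing IDs once into a hash set and then test each new candidate against that set, so the per-ID check is a constant-time lookup rather than a rescan of the document. The following is only a sketch under that assumption. The UniqueLongHexNumberAllocator class is hypothetical, it works on plain strings rather than SDK element types, and it does not measure how expensive gathering the existing IDs from a large document would be:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;

// Hypothetical helper; not part of DocumentFormat.OpenXml.
public sealed class UniqueLongHexNumberAllocator : IDisposable
{
    private readonly HashSet<uint> _used = new HashSet<uint>();
    private readonly RandomNumberGenerator _rng = RandomNumberGenerator.Create();

    // Seeds the allocator with the IDs already present (one pass over them).
    public UniqueLongHexNumberAllocator(IEnumerable<string> existingIds)
    {
        foreach (string id in existingIds)
        {
            if (uint.TryParse(id, NumberStyles.AllowHexSpecifier,
                    CultureInfo.InvariantCulture, out uint value))
            {
                _used.Add(value);
            }
        }
    }

    // Returns a random ID in (0, 0x80000000) that has not been seen before.
    public string Next()
    {
        var data = new byte[4];

        while (true)
        {
            _rng.GetBytes(data);

            // Clear the top bit so the value stays below 0x80000000.
            uint candidate = (uint)(((data[0] & 0x7F) << 24)
                | (data[1] << 16) | (data[2] << 8) | data[3]);

            // HashSet.Add returns false when the value is already taken.
            if (candidate != 0 && _used.Add(candidate))
            {
                return candidate.ToString("X8", CultureInfo.InvariantCulture);
            }
        }
    }

    public void Dispose() => _rng.Dispose();
}
```

With roughly 2^31 possible values, a retry caused by a collision should be rare for realistic document sizes, so the dominant cost in this sketch is the initial pass that gathers the existing IDs.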

0 reactions
twsouthwick commented, Aug 5, 2022

There’s a different package, DocumentFormat.OpenXml.Features, that has this and other helpful things.
