Consider providing w14:paraId and w14:textId generators
Description
w:p and w:tr elements can have the w14:paraId and w14:textId attributes, which are defined in MS-DOCX as ST_LongHexNumber values that are unique within the document part as well as greater than 0 and less than 0x80000000.
Microsoft Word generates these values with a random number generator, although random generation is not a requirement.
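To make these constraints concrete, here is a minimal validation sketch. The LongHexNumberValidator class and its IsValidId method are hypothetical names used for illustration and are not part of the Open XML SDK. The sketch assumes the eight-hex-digit lexical form of ST_LongHexNumber and checks only the range, not uniqueness within the document part.

```csharp
using System.Globalization;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class LongHexNumberValidator
{
    // Returns true if the value consists of exactly eight hexadecimal digits
    // that encode a number greater than 0 and less than 0x80000000.
    public static bool IsValidId(string value)
    {
        if (value == null || value.Length != 8)
        {
            return false;
        }

        // AllowHexSpecifier accepts only 0-9, A-F and a-f: no sign,
        // no whitespace and no "0x" prefix.
        if (!uint.TryParse(value, NumberStyles.AllowHexSpecifier,
                CultureInfo.InvariantCulture, out uint number))
        {
            return false;
        }

        return number > 0 && number < 0x80000000u;
    }
}
```

A check like this could run before assigning a value to either attribute. Uniqueness would still have to be verified separately against the other IDs in the same part.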
At the moment, the Paragraph (w:p) and TableRow (w:tr) classes do not generate values for the ParagraphId (w14:paraId) and TextId (w14:textId) attributes. There are also no utility classes or methods for producing compliant values.
Therefore, the question is whether we want to offer any functionality for generating or validating those attribute values.
Providing utility methods would be very straightforward. For example, here is the code (taken from two classes) that I am using in my codebase for creating random ST_LongHexNumber values (optionally making sure they are less than 0x80000000 while always guaranteeing they are greater than 0):
    // Requires: System.Collections.Generic, System.Runtime.CompilerServices,
    // System.Security.Cryptography. The extension method must live in a static class.
    private static readonly RNGCryptoServiceProvider Generator = new RNGCryptoServiceProvider();

    /// <summary>
    /// Creates an ST_LongHexNumber value, masking the most significant byte with
    /// the given <paramref name="msbMask" />.
    /// </summary>
    /// <param name="msbMask">The most significant byte mask.</param>
    public static string CreateRandomLongHexNumber(byte msbMask = 0xff)
    {
        // Create a four-byte random number, noting that the first byte (data[0])
        // will become the most significant byte in the string value created by
        // the ToHexString() method.
        var data = new byte[4];

        // Every byte is non-zero, so the value stays greater than 0 even if
        // the mask clears the first byte.
        Generator.GetNonZeroBytes(data);
        data[0] &= msbMask;
        return data.ToHexString();
    }

    /// <summary>
    /// Converts the given value into a hexadecimal string, with the first
    /// byte in the list being the most significant byte in the resulting
    /// string.
    /// </summary>
    /// <param name="source">The list of bytes to be converted.</param>
    /// <returns>A hexadecimal string.</returns>
    public static string ToHexString(this IReadOnlyList<byte> source)
    {
        var dest = new char[source.Count * 2];
        var i = 0;
        var j = 0;

        while (i < source.Count)
        {
            byte b = source[i++];
            dest[j++] = ToCharUpper(b >> 4);
            dest[j++] = ToCharUpper(b);
        }

        return new string(dest);
    }

    // Maps the low four bits of the value to an uppercase hexadecimal digit.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static char ToCharUpper(int value)
    {
        value &= 0xF;
        value += '0';

        if (value > '9')
        {
            value += ('A' - ('9' + 1));
        }

        return (char) value;
    }
A first step could be to provide utility or extension methods without changing the Paragraph and TableRow classes.
A second, optional step could be to enhance the Paragraph and TableRow classes by adding instance methods like the following. I have not put much thought into this yet, and I am using the Paragraph class as an example:
    // Normal setter methods.
    public void SetRandomParagraphId();
    public void SetRandomTextId();

    // Methods that would be handy for pure functional transformation scenarios.
    // Like the With() method we added earlier.
    public Paragraph WithNewRandomParagraphId();
    public Paragraph WithNewRandomTextId();
Information
- .NET Target: all
- DocumentFormat.OpenXml Version: latest
Issue Analytics
- State:
- Created 2 years ago
- Comments:17 (5 by maintainers)
The one thing that makes me hesitate right now is conflicts. Consumers of the SDK can emit paraId/textId values that are not random, unlike Word's, as long as they conform to the documented boundaries and rules, including that they must not conflict with any other paraId (document-wide uniqueness). Generating IDs is quick, as @ThomasBarnekow and @rmboggs showed; checking for conflicts, however, may not be. The main problem would be the time and processing needed during construction to check a whole document for conflicts. For large documents, that could be undesirable. And although Word doesn’t fail when opening these, 1) they will likely be replaced on save and 2) we don’t know whether there will be negative side effects before they are replaced. So if we can show that adding unique, non-conflicting paraIds can be done in a performant way, I would be more likely to agree. Having said that, Word does add these while checking for conflicts, but it’s a large application working in memory with binary representations of lots of collections, which is likely more efficient.
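One way the conflict check could be kept cheap, sketched here under stated assumptions: collect the existing IDs into a hash set in a single pass, then test each new candidate against that set. UniqueIdAllocator is a hypothetical class, not an SDK API, and the caller is assumed to supply the IDs already present in the document part. This has not been measured against large documents.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Security.Cryptography;

// Hypothetical allocator for illustration; not part of the Open XML SDK.
public sealed class UniqueIdAllocator
{
    private readonly HashSet<uint> _used = new HashSet<uint>();
    private readonly RandomNumberGenerator _rng = RandomNumberGenerator.Create();

    // One linear pass over the IDs that already exist in the part.
    // Values that are not valid hexadecimal are ignored in this sketch.
    public UniqueIdAllocator(IEnumerable<string> existingIds)
    {
        foreach (string id in existingIds)
        {
            if (uint.TryParse(id, NumberStyles.AllowHexSpecifier,
                    CultureInfo.InvariantCulture, out uint value))
            {
                _used.Add(value);
            }
        }
    }

    // Returns a new eight-digit uppercase hex ID that is greater than 0,
    // less than 0x80000000 and not yet in the set.
    public string Next()
    {
        var buffer = new byte[4];

        while (true)
        {
            _rng.GetBytes(buffer);

            // Clear the top bit so the value stays below 0x80000000.
            uint candidate = BitConverter.ToUInt32(buffer, 0) & 0x7FFFFFFFu;

            // HashSet.Add returns false when the value is already present.
            if (candidate != 0 && _used.Add(candidate))
            {
                return candidate.ToString("X8", CultureInfo.InvariantCulture);
            }
        }
    }
}
```

With this shape, the cost of scanning the document is paid once per allocator, and each later ID costs an expected constant-time lookup. Whether that is fast enough for very large documents would still need to be shown.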
There’s a different package, DocumentFormat.OpenXml.Features, that has this and other helpful things.