
RangeError: premature EOF for Unicode character U+FEFF on start


Hello,

We’ve noticed that whenever the Unicode ZWNBSP character (U+FEFF) appears at the start of a string in a message, decoding fails silently and the character is omitted from the decoded value. It seems that this particular call to TextDecoder.decode throws the error mentioned above:

https://github.com/timostamm/protobuf-ts/blob/aaa63c7168cfae84a64e72c4f379017d3d1919b8/packages/runtime/src/binary-reader.ts#L245

I’ve created a repository to reproduce:

https://github.com/kivancguckiran/premature-eof-protobuf-ts

I printed the char codes of the result of the create operation and of the result of the fromBinary operation. If the U+FEFF character is at the start, it is omitted from the decoded result.
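The omission can also be observed with a plain TextDecoder, without protobuf-ts involved. The following is a minimal sketch, assuming a runtime where TextEncoder and TextDecoder are globals (modern browsers, recent Node.js); the sample text is made up for illustration. It only shows the dropped character, not the RangeError itself.

```typescript
// Hypothetical sample text: the leading U+FEFF is written as an escape
// so that it stays visible in the source file.
const input = '\uFEFFhello';
const bytes = new TextEncoder().encode(input);

// Default options: a leading BOM is consumed by the decoder.
const stripping = new TextDecoder('utf-8', { fatal: true });
// ignoreBOM: true keeps the leading U+FEFF in the decoded string.
const preserving = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });

const stripped = stripping.decode(bytes);
const preserved = preserving.decode(bytes);

// Print the length and the first char code (hex) of each result.
console.log(stripped.length, stripped.charCodeAt(0).toString(16));
console.log(preserved.length, preserved.charCodeAt(0).toString(16));
```

With the default options the decoded string is one character shorter than the input; with ignoreBOM set to true it round-trips unchanged.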

Since it is a zero-width no-break space, the GitHub preview hides the character.
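Because the character is invisible in most viewers, it can help to print strings as a list of code points while debugging. This is a small sketch; toCodePoints is a hypothetical helper name, not part of protobuf-ts or of the reproduction repository.

```typescript
// Hypothetical helper: list each code point of a string in U+XXXX notation,
// so that invisible characters such as U+FEFF show up in logs.
function toCodePoints(text: string): string[] {
  return Array.from(text, (ch) => {
    const hex = ch.codePointAt(0)!.toString(16).toUpperCase();
    return 'U+' + hex.padStart(4, '0');
  });
}

// Made-up sample with a leading zero-width no-break space.
const sample = '\uFEFFab';
console.log(toCodePoints(sample).join(' '));
// prints "U+FEFF U+0061 U+0062"
```

Writing the character as the escape '\uFEFF' in source code, as in the sample above, also keeps it visible in diffs and previews.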

The line in question is: https://github.com/kivancguckiran/premature-eof-protobuf-ts/blob/main/index.ts#L4

It actually looks like this: [image]

Thanks in advance.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
timostamm commented, Aug 17, 2022

Released in v2.8.0.

1 reaction
jcready commented, Jul 1, 2022

I can’t seem to find any information about whether the BOM should be ignored in protobuf strings. It is definitely ignored when protoc reads a proto file to compile it, but as far as the actual runtimes go, I see no tests regarding the BOM in string field values.

I think the correct thing would be to update protobuf-ts to use the ignoreBOM setting, but I can’t be certain without seeing an existing test or docs. For an immediate workaround what you have is basically what I would’ve recommended, but you should only need to create the TextDecoder instance once.

// shared-binary-read-options.ts
import { BinaryReader, BinaryReadOptions } from '@protobuf-ts/runtime';

const textDecoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });
export const binaryReadOptions: Partial<BinaryReadOptions> = {
    readerFactory: (bytes) => new BinaryReader(bytes, textDecoder)
};

// some-other-file.ts
import { binaryReadOptions } from './shared-binary-read-options';

// Later
Test.fromBinary(data, binaryReadOptions);

I wouldn’t recommend the following approach, but you can monkey-patch the BinaryReader prototype to avoid importing and passing the options everywhere. Just note that this code must be executed (imported) once before fromBinary() is called for the patch to take effect.

// monkey-patch-protobuf-ts-binary-reader.ts
import { BinaryReader } from '@protobuf-ts/runtime';

const textDecoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });

// Declaring `this` keeps the function type-safe under strict compiler settings.
function monkeyPatchedString(this: BinaryReader): string {
    return textDecoder.decode(this.bytes());
}

// @ts-ignore
BinaryReader.prototype.string = monkeyPatchedString;
