Start-Transcript uses lossy ASCII character encoding instead of BOM-less UTF-8 if the target file happens to exist

Note:

  • Windows PowerShell is not affected - UTF-8 with BOM is used consistently there.
  • A fix for the closely related #13678 could probably encompass a fix for this issue as well.

When the target file already exists and has no BOM, Start-Transcript mistakenly writes it using ASCII encoding, even without -Append.

Without -Append, Start-Transcript shouldn’t even look at the existing file. It should simply replace it and use the default encoding (BOM-less UTF-8). See #13678 for a closely related bug with -Append.
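On affected versions, one possible workaround follows from the description above: if the bug is triggered only by the target file already existing, deleting the file yourself before starting a non-appending transcript should avoid it. A minimal sketch under that assumption; `Start-FreshTranscript` is a hypothetical helper name, not a built-in cmdlet:

```powershell
# Hypothetical helper (not part of PowerShell): works around the reported bug
# by making sure the target file does not exist before a non-appending
# transcript starts, so Start-Transcript falls back to its default encoding.
function Start-FreshTranscript {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # Without -Append the file would be replaced anyway, so deleting it first
    # loses nothing; -ErrorAction Ignore covers the case where it doesn't exist.
    Remove-Item -LiteralPath $Path -ErrorAction Ignore
    Start-Transcript -Path $Path
}
```

Usage mirrors Start-Transcript, e.g. `$null = Start-FreshTranscript temp:/$PID.txt` followed later by `Stop-Transcript`. Do not use this pattern if you actually want -Append semantics.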

Instead, Start-Transcript apparently looks for a BOM at the start of an existing file and uses ASCII encoding if it doesn’t find one.
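To see which side of that check a given file falls on, you can compare its first three bytes with the UTF-8 BOM, EF BB BF. A minimal diagnostic sketch; `Test-Utf8Bom` is a hypothetical helper, not part of PowerShell, and it only inspects the file rather than reproducing Start-Transcript's internal logic:

```powershell
# Hypothetical diagnostic helper: returns $true if the file at $Path begins
# with the UTF-8 byte-order mark (EF BB BF), $false otherwise.
function Test-Utf8Bom {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # Convert-Path resolves PowerShell drive paths such as temp:/ to
    # file-system paths that the .NET API understands.
    $bytes = [System.IO.File]::ReadAllBytes((Convert-Path -LiteralPath $Path))
    # Files shorter than three bytes (including empty ones) cannot carry a BOM.
    return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}
```

Per the analysis above, a `$false` result for an existing target file is the situation in which the ASCII fallback is reported to kick in.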

Steps to reproduce

```powershell
# Note: Should is provided by the Pester module.
# To surface the bug, make sure that the target file exists and doesn't have a BOM.
$null > temp:/$PID.txt

$null = Start-Transcript temp:/$PID.txt
'ü' # output a string with a non-ASCII-range character
$null = Stop-Transcript

Select-String -Quiet 'ü' temp:/$PID.txt | Should -BeTrue

Remove-Item temp:/$PID.txt
```

Expected behavior

The test should succeed.

Actual behavior

The test fails because ü was transliterated to a literal ?, which suggests that the file was written with ASCII encoding.
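The ? is the signature of .NET's ASCII encoder: characters outside the 7-bit range have no ASCII code, so its default replacement fallback emits ? (code 63) instead. A short sketch contrasting it with UTF-8, using only the standard `System.Text.Encoding` classes (the character is built from its code point, U+00FC, to avoid depending on the script file's own encoding):

```powershell
# 'ü' is U+00FC.
$text = [string][char]0x00FC

# Encode the same character with the ASCII and UTF-8 encodings.
$ascii = [System.Text.Encoding]::ASCII.GetBytes($text)
$utf8  = [System.Text.Encoding]::UTF8.GetBytes($text)

# ASCII has no code for 'ü', so the replacement fallback substitutes '?' (63).
$ascii                                            # 63
[System.Text.Encoding]::ASCII.GetString($ascii)   # ?

# UTF-8 represents 'ü' losslessly as two bytes.
'{0:X2} {1:X2}' -f $utf8[0], $utf8[1]             # C3 BC
[System.Text.Encoding]::UTF8.GetString($utf8)     # ü
```

Because the substitution happens while encoding, the original character cannot be recovered from the transcript file afterwards.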

Environment data

PowerShell Core 7.1.0-preview.7

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments:24 (14 by maintainers)

Top GitHub Comments

1 reaction
Gimly commented, Oct 26, 2020

@mklement0 I’ve opened a PR that, I think, fixes this issue. It’s quite simple: I think the only part that wasn’t working as you described was that the “clear file” step performed when the -Append flag isn’t set didn’t fix a specific encoding, but simply reused the one that already existed.

Now I’ve fixed the encoding to UTF-8 without BOM (using the static readonly instance present in Utils), as we had done in PR #13732.
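For reference, “UTF-8 without BOM” in .NET terms means a `UTF8Encoding` constructed with its `encoderShouldEmitUTF8Identifier` argument set to `$false`, which gives it an empty preamble. The sketch below illustrates that concept with standard .NET types only; it is not the code from the PR and does not reproduce the Utils instance mentioned in the comment:

```powershell
# The constructor argument controls whether a BOM (preamble) is emitted.
$withBom    = [System.Text.UTF8Encoding]::new($true)
$withoutBom = [System.Text.UTF8Encoding]::new($false)

$withBom.GetPreamble().Length     # 3 (EF BB BF)
$withoutBom.GetPreamble().Length  # 0

# Writing 'ü' (U+00FC) with the BOM-less instance yields only the encoded text bytes.
$path = Join-Path ([System.IO.Path]::GetTempPath()) "nobom-$PID.txt"
[System.IO.File]::WriteAllText($path, [string][char]0x00FC, $withoutBom)
$bytes = [System.IO.File]::ReadAllBytes($path)
($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # C3 BC
Remove-Item -LiteralPath $path
```

Passing `$withBom` to `WriteAllText` instead would prefix the same two bytes with EF BB BF, which is the Windows PowerShell behavior noted at the top of the issue.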

1 reaction
mklement0 commented, Oct 1, 2020

Thanks for tackling this, @Gimly - I haven’t even looked at the original code yet, but @iSazonov has, so perhaps he has thoughts.
