Start-Transcript uses lossy ASCII character encoding instead of BOM-less UTF-8 if the target file happens to exist
Note:
- Windows PowerShell is not affected - UTF-8 with BOM is used consistently there.
- A fix for the closely related #13678 could probably encompass a fix for this issue as well.
When the target file happens to exist and doesn’t have a BOM - even without -Append - Start-Transcript mistakenly uses ASCII encoding to write the file.
Without -Append, Start-Transcript shouldn’t even look at the existing file - it should simply replace it, and use the default encoding (BOM-less UTF-8). See #13678 for a closely related bug with -Append.
Instead, Start-Transcript apparently looks for a BOM at the start of an existing file and uses ASCII encoding if it doesn’t find one.
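The selection logic described above can be modeled in a few lines. This is a minimal, hypothetical sketch of the *reported* behavior, not PowerShell's actual (C#) implementation. The function name `Get-SniffedEncoding` and the temp-file name are illustrative assumptions. The sketch sniffs an existing file for a UTF-8 BOM and falls back to ASCII when it finds none. It then shows why that is lossy: .NET's ASCII encoder replaces any character outside the ASCII range with `?`.

```powershell
# Hypothetical model of the behavior the issue describes; not PowerShell's actual code.
# Sniff an existing file for a UTF-8 BOM and fall back to (lossy) ASCII if there is none.
function Get-SniffedEncoding {
    param([string] $Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return [System.Text.UTF8Encoding]::new($true)   # UTF-8 with BOM
    }
    return [System.Text.Encoding]::ASCII                # no BOM: ASCII, as reported
}

$path = Join-Path ([System.IO.Path]::GetTempPath()) "sniff-$PID.txt"
[System.IO.File]::WriteAllBytes($path, [byte[]]@())     # existing, empty, BOM-less file

$enc = Get-SniffedEncoding $path
$text = [string][char]0xFC                              # 'ü', outside the ASCII range
[System.IO.File]::WriteAllText($path, $text, $enc)      # ASCII's replacement fallback writes '?'
$roundTrip = [System.IO.File]::ReadAllText($path)
Remove-Item -LiteralPath $path

"Sniffed: $($enc.WebName); wrote: $roundTrip"           # → Sniffed: us-ascii; wrote: ?
```

The character is written as `[char]0xFC` instead of a literal `ü`. This keeps the sketch independent of the encoding in which the script file itself is saved.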
Steps to reproduce
```powershell
# To surface the bug, make sure that the target file exists and doesn't have a BOM.
# Note: Should is provided by the Pester module.
$null > temp:/$PID.txt
$null = Start-Transcript temp:/$PID.txt
'ü' # output a string with a non-ASCII-range character
$null = Stop-Transcript
Select-String -Quiet 'ü' temp:/$PID.txt | Should -BeTrue
Remove-Item temp:/$PID.txt
```
Expected behavior
The test should succeed.
Actual behavior
The test fails because ü was written as a literal ?, which suggests that ASCII encoding was used to write the file.
Environment data
PowerShell Core 7.1.0-preview.7

@mklement0 I’ve opened a PR that I think fixes this issue. It’s quite simple: as far as I can tell, the only part that wasn’t working as you described was the “clear file” step performed when the -Append flag isn’t set. That step didn’t set a specific encoding; it simply reused the one the existing file had. I’ve now fixed the encoding to UTF-8 without BOM (using the static readonly instance in Utils), as we had done in PR #13732.
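In PowerShell terms, the fix described in the preceding comment amounts to this: without -Append, the existing file is truncated and written with a fixed BOM-less UTF-8 encoding rather than one derived from its current contents. The following is a minimal sketch under stated assumptions. The real change lives in PowerShell's C# source. `$utf8NoBom` is a hypothetical stand-in for the internal static Utils instance, which is not accessible from script. A `StreamWriter` opened with `append: $false` is used here only to illustrate "replace the file, fixed encoding".

```powershell
# Hypothetical sketch of the intended non-Append behavior; not the actual PR code.
# Stand-in for the internal static UTF-8-without-BOM instance mentioned in the PR.
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)

$path = Join-Path ([System.IO.Path]::GetTempPath()) "fix-$PID.txt"
[System.IO.File]::WriteAllBytes($path, [byte[]]@())     # pre-existing file without a BOM

# append: $false truncates the existing file; the encoding is fixed, not sniffed.
$writer = [System.IO.StreamWriter]::new($path, $false, $utf8NoBom)
try {
    $writer.Write([string][char]0xFC)                   # 'ü'
}
finally {
    $writer.Dispose()
}

$bytes = [System.IO.File]::ReadAllBytes($path)
Remove-Item -LiteralPath $path

# 'ü' as UTF-8 is 0xC3 0xBC, with no EF BB BF preamble in front of it.
'{0:X2} {1:X2}' -f $bytes[0], $bytes[1]                 # → C3 BC
```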
Thanks for tackling this, @Gimly - I haven’t even looked at the original code yet, but @iSazonov has, so perhaps he has thoughts.