# $OutputEncoding and [Console]::InputEncoding are not aligned (Windows only)
The $OutputEncoding preference variable is not aligned with the [Console]::InputEncoding value. This can lead to a poor user experience with the defaults, as the two are meant to be closely aligned.
Windows PowerShell is also affected by this, but in a different way: there, $OutputEncoding is strict ASCII while [Console]::InputEncoding follows the system default settings. Pure ASCII characters are fine, but anything beyond the 7-bit range becomes ? when piped into a native application.
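The 7-bit clipping described above can be sketched in a few lines of Python (used here only because it makes the byte-level effect easy to show; the `errors="replace"` ASCII encoder stands in for Windows PowerShell's default $OutputEncoding behavior):

```python
# A strict ASCII encoder that replaces unmappable characters, as Windows
# PowerShell's default $OutputEncoding effectively does for piped input:
print("café".encode("ascii", errors="replace"))  # b'caf?'
```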
## Steps to reproduce
The simplest way to reproduce this is if you have Python 3 installed:

```powershell
'café' | python.exe -c "import sys; data = sys.stdin.read(); print(data)"
```
If you don't have Python 3 installed, you can save the following PowerShell script as proc_io.ps1 and use it instead:
```powershell
[CmdletBinding()]
param (
    [Parameter(Mandatory)]
    [string]
    $Path,  # The file the script should write the raw stdin bytes to.

    [Parameter(Mandatory)]
    [string]
    $OutputData,  # Base64 encoded bytes that the script should output to the stdout pipe.

    # Raw = raw FileStream read and write with bytes
    # .NET = [Console]::Read and [Console]::Write ($Path and $OutputData are treated as UTF-8)
    [Parameter()]
    [ValidateSet('Raw', '.NET')]
    [string]
    $Method = 'Raw',

    [int]
    $InputCodepage = $null,

    [int]
    $OutputCodepage = $null
)

Add-Type -TypeDefinition @'
using Microsoft.Win32.SafeHandles;
using System;
using System.Runtime.InteropServices;

namespace RawConsole
{
    public class NativeMethods
    {
        [DllImport("Kernel32.dll")]
        public static extern int GetConsoleCP();

        [DllImport("Kernel32.dll")]
        public static extern int GetConsoleOutputCP();

        [DllImport("Kernel32.dll")]
        public static extern SafeFileHandle GetStdHandle(
            int nStdHandle);

        [DllImport("Kernel32.dll")]
        public static extern bool SetConsoleCP(
            int wCodePageID);

        [DllImport("Kernel32.dll")]
        public static extern bool SetConsoleOutputCP(
            int wCodePageID);
    }
}
'@

$origInputCP = [RawConsole.NativeMethods]::GetConsoleCP()
$origOutputCP = [RawConsole.NativeMethods]::GetConsoleOutputCP()
if ($InputCodepage) {
    [void][RawConsole.NativeMethods]::SetConsoleCP($InputCodepage)
}
if ($OutputCodepage) {
    [void][RawConsole.NativeMethods]::SetConsoleOutputCP($OutputCodepage)
}

try {
    $outputBytes = [Convert]::FromBase64String($OutputData)
    $utf8NoBom = [Text.UTF8Encoding]::new($false)

    if ($Method -eq 'Raw') {
        $stdinHandle = [RawConsole.NativeMethods]::GetStdHandle(-10)
        $stdinFS = [IO.FileStream]::new($stdinHandle, 'Read')

        $stdoutHandle = [RawConsole.NativeMethods]::GetStdHandle(-11)
        $stdoutFS = [IO.FileStream]::new($stdoutHandle, 'Write')

        $inputRaw = [byte[]]::new(1024)
        $inputRead = $stdinFS.Read($inputRaw, 0, $inputRaw.Length)

        $outputFS = [IO.File]::Create($Path)
        $outputFS.Write($inputRaw, 0, $inputRead)
        $outputFS.Dispose()

        $stdoutFS.Write($outputBytes, 0, $outputBytes.Length)

        $stdinFS.Dispose()
        $stdinHandle.Dispose()
        $stdoutFS.Dispose()
        $stdoutHandle.Dispose()
    }
    elseif ($Method -eq '.NET') {
        $inputRaw = [Text.StringBuilder]::new()
        while ($true) {
            $char = [Console]::Read()
            if ($char -eq -1) {
                break
            }
            [void]$inputRaw.Append([char]$char)
        }
        [IO.File]::WriteAllText($Path, $inputRaw.ToString(), $utf8NoBom)

        $outputString = $utf8NoBom.GetString($outputBytes)
        [Console]::Write($outputString)
    }
}
finally {
    [void][RawConsole.NativeMethods]::SetConsoleCP($origInputCP)
    [void][RawConsole.NativeMethods]::SetConsoleOutputCP($origOutputCP)
}
```
Call it like this:
```powershell
$string = 'café'
$stringBytes = [Text.UTF8Encoding]::new($false).GetBytes($string)
$stringB64 = [Convert]::ToBase64String($stringBytes)
$null = $string | powershell.exe -NoLogo -File proc_io.ps1 -Path input -OutputData $stringB64 -Method .NET
Format-Hex -Path input
```
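For reference, the Base64 payload built above can be checked independently (a quick sketch; Python is used here just to show the bytes):

```python
import base64

# Base64 of the UTF-8 bytes of 'café' (63 61 66 C3 A9), i.e. the value
# that ends up in $stringB64 above:
print(base64.b64encode("café".encode("utf-8")).decode())  # Y2Fmw6k=
```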
## Expected behavior
For Python I expect café to be echoed back.

For the manual PowerShell script I expect the input file to contain café as a UTF-8 encoded string:
```
   Label: C:\temp\input

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 63 61 66 C3 A9 0D 0A                            caf����
```
## Actual behavior
For Python:

```
# Windows PowerShell
caf?

# PowerShell
café
```
This is because the string 'café' is encoded using the value of $OutputEncoding when sent to the process's stdin, but the native process decodes it using the default console input codepage (437 in my case).
For the PowerShell script:

```
# Windows PowerShell

           Path: C:\temp\input

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   63 61 66 3F 3F 0D 0A                             caf??..

# PowerShell

   Label: C:\temp\input

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 63 61 66 E2 94 9C E2 8C 90 0D 0A                caf��������
```
The same problem occurs here: .NET reads stdin based on the console's input codepage, which does not match $OutputEncoding. The only reason the two results differ is that Windows PowerShell uses ASCII, so the é is converted to ? before it is ever sent.
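The PowerShell 7 hex dump above can be reproduced with a small sketch of the round trip: UTF-8 bytes misread under codepage 437, then re-encoded as UTF-8 by the script:

```python
utf8 = "café".encode("utf-8")        # b'caf\xc3\xa9' written to stdin
misread = utf8.decode("cp437")       # 'caf├⌐': the console decodes with CP 437
roundtrip = misread.encode("utf-8")  # what the script then writes out as UTF-8
print(roundtrip.hex(" ").upper())    # 63 61 66 E2 94 9C E2 8C 90
```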
## Environment data
```
Name                      Value
----                      -----
PSVersion                 7.1.0
PSEdition                 Core
GitCommitId               7.1.0
OS                        Microsoft Windows 10.0.17763
Platform                  Win32NT
PSCompatibleVersions      {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion      1.1.0.1
WSManStackVersion         3.0
```
## Other info
I'm currently trying to wrap my head around the encoding behaviour when talking to native processes in PowerShell, and this particular issue has come up. I'm not sure whether we can change the default behaviour, but I'm hoping to start a conversation about why $OutputEncoding was changed to UTF-8 on Windows. For Linux I understand the choice, but on Windows I feel it should stay aligned with the console's input codepage for better compatibility with other command-line programs.
## Issue Analytics

- Created: 3 years ago
- Reactions: 3
- Comments: 12 (8 by maintainers)
## Top GitHub Comments
To summarize the backward-compatibility impact that the proposed change would have: if we switch PowerShell consoles from the system's OEM code page to 65001 (UTF-8), only the following cases would amount to a breaking change:

- Programs that cannot handle code page 65001 (i.e. UTF-8). An example of such a program is WMIC.exe, which is deprecated, however.
- "Rogue" programs that do their own thing (e.g., Node.js, Python, sfc.exe, though note that Python can be configured to use UTF-8). For these, a workaround is already necessary (temporarily setting [Console]::OutputEncoding), and it will continue to work.

Having revisited this in the context of #15123:
Sorry for not fully addressing your feedback: I didn't dive as deep as you did.

First of all, yes: whenever stdout isn't connected to a console, the discrepancy between seemingly correct display output and erroneously decoded output doesn't apply, because decoding is then invariably involved (capturing in a variable, redirecting to a file, piping to another command, remoting, background/thread jobs).

As for when this discrepancy may arise on Windows: at least some high-profile CLIs seemingly modify their behavior based on whether they're outputting directly to a console (terminal) or not. My inference was that such CLIs use the Unicode version of the WriteConsole API situationally, namely when stdout is connected to a console, but I have no positive proof. The two high-profile CLIs I know of that behave this way are python and node; there may be others. By contrast, .NET console applications do not make this distinction when outputting via Console.WriteLine() / Console.Out.WriteLine().

You can verify this by comparing direct console output with decoded output (e.g., routed through Write-Output): only with decoded output do the encoding-mismatch problems surface in the python (always-ANSI) and node (always UTF-8) calls.
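One minimal way to observe the console-versus-pipe distinction described above is a tiny Python probe (a sketch; the script name check_enc.py is hypothetical):

```python
# check_enc.py - report whether stdout is a console and which encoding
# Python picked for it. Run it twice on Windows: once directly in a
# terminal, then piped (e.g. `python check_enc.py | findstr .`); the two
# runs can report different encodings.
import sys

print(f"isatty={sys.stdout.isatty()} encoding={sys.stdout.encoding}")
```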