$OutputEncoding and [Console]::InputEncoding are not aligned - Windows Only

By default, the $OutputEncoding preference variable does not match the [Console]::InputEncoding value. This leads to a poor out-of-the-box experience, because the two are meant to be closely aligned.

Windows PowerShell is also affected by this, but in a different way. In Windows PowerShell, $OutputEncoding is strict ASCII and [Console]::InputEncoding follows the system default settings. Pure ASCII characters are fine, but anything beyond the 7-bit range becomes ? when piped into a native application.
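
The ? substitution can be illustrated outside PowerShell. This is a minimal Python sketch, not what PowerShell runs internally: it assumes a strict ASCII encoder replaces every character it cannot represent with ?, which is how Python's ASCII codec behaves with errors='replace'.

```python
# Sketch only: mimic a strict-ASCII $OutputEncoding with Python's ASCII codec.
text = 'café'

# errors='replace' substitutes '?' for characters outside the 7-bit range.
ascii_bytes = text.encode('ascii', errors='replace')

print(ascii_bytes)  # b'caf?'
```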

Steps to reproduce

The simplest way to reproduce this is with Python 3:

'café' | python.exe -c "import sys; data = sys.stdin.read(); print(data)"

If you don’t have Python 3 installed, you can use the following PowerShell script (saved as proc_io.ps1):

[CmdletBinding()]
param (
    [Parameter(Mandatory)]
    [string]
    $Path,  # The file the script should write the raw stdin bytes to.

    [Parameter(Mandatory)]
    [string]
    $OutputData,  # Base64 encoded bytes that the script should output to the stdout pipe.
    
    # Raw = raw FileStream read and write with bytes
    # .NET = [Console]::Read and [Console]::Write ($Path and $OutputData are treated as UTF-8)
    [Parameter()]
    [ValidateSet('Raw', '.NET')]
    [string]
    $Method = 'Raw',

    [int]
    $InputCodepage = $null,

    [int]
    $OutputCodepage = $null
)

Add-Type -TypeDefinition @'
using Microsoft.Win32.SafeHandles;
using System;
using System.Runtime.InteropServices;

namespace RawConsole
{
    public class NativeMethods
    {
        [DllImport("Kernel32.dll")]
        public static extern int GetConsoleCP();

        [DllImport("Kernel32.dll")]
        public static extern int GetConsoleOutputCP();

        [DllImport("Kernel32.dll")]
        public static extern SafeFileHandle GetStdHandle(
            int nStdHandle);

        [DllImport("Kernel32.dll")]
        public static extern bool SetConsoleCP(
            int wCodePageID);

        [DllImport("Kernel32.dll")]
        public static extern bool SetConsoleOutputCP(
            int wCodePageID);
    }
}
'@

$origInputCP = [RawConsole.NativeMethods]::GetConsoleCP()
$origOutputCP = [RawConsole.NativeMethods]::GetConsoleOutputCP()

if ($InputCodepage) {
    [void][RawConsole.NativeMethods]::SetConsoleCP($InputCodepage)
}
if ($OutputCodepage) {
    [void][RawConsole.NativeMethods]::SetConsoleOutputCP($OutputCodepage)
}

try {    
    $outputBytes = [Convert]::FromBase64String($OutputData)
    $utf8NoBom = [Text.UTF8Encoding]::new($false)
    
    if ($Method -eq 'Raw') {
        $stdinHandle = [RawConsole.NativeMethods]::GetStdHandle(-10)
        $stdinFS = [IO.FileStream]::new($stdinHandle, 'Read')
    
        $stdoutHandle = [RawConsole.NativeMethods]::GetStdHandle(-11)
        $stdoutFS = [IO.FileStream]::new($stdoutHandle, 'Write')
    
        $inputRaw = [byte[]]::new(1024)
        $inputRead = $stdinFS.Read($inputRaw, 0, $inputRaw.Length)
        $outputFS = [IO.File]::Create($Path)
        $outputFS.Write($inputRaw, 0, $inputRead)
        $outputFS.Dispose()
        
        $stdoutFS.Write($outputBytes, 0, $outputBytes.Length)
    
        $stdinFS.Dispose()
        $stdinHandle.Dispose()
        
        $stdoutFS.Dispose()
        $stdoutHandle.Dispose()
    }
    elseif ($Method -eq '.NET') {
        $inputRaw = [Text.StringBuilder]::new()
        while ($true) {
            $char = [Console]::Read()
            if ($char -eq -1) {
                break
            }
    
            [void]$inputRaw.Append([char]$char)
        }
        [IO.File]::WriteAllText($Path, $inputRaw.ToString(), $utf8NoBom)
    
        $outputString = $utf8NoBom.GetString($outputBytes)
        [Console]::Write($outputString)
    }    
}
finally {
    [void][RawConsole.NativeMethods]::SetConsoleCP($origInputCP)
    [void][RawConsole.NativeMethods]::SetConsoleOutputCP($origOutputCP)
}

Call it like this:

$string = 'café'
$stringBytes = [Text.UTF8Encoding]::new($false).GetBytes($string)
$stringB64 = [Convert]::ToBase64String($stringBytes)

$null = $string | powershell.exe -NoLogo -File proc_io.ps1 -Path input -OutputData $stringB64 -Method .NET

Format-Hex -Path input

Expected behavior

For Python, I expect café to be returned.

For the manual PowerShell script, I expect the input file to contain café as a UTF-8 encoded string.

   Label: C:\temp\input

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 63 61 66 C3 A9 0D 0A                            caf�

Actual behavior

For Python

# Windows PowerShell
caf?

# PowerShell
caf├⌐

This is because the string 'café' is encoded using the value of $OutputEncoding when it is sent to the process’s stdin, but the native process decodes it using the default console input codepage (437 in my case).

For the PowerShell script

# Windows PowerShell

           Path: C:\temp\input

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   63 61 66 3F 3F 0D 0A                             caf??..

# PowerShell

   Label: C:\temp\input

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 63 61 66 E2 94 9C E2 8C 90 0D 0A                caf����

Same problem here: .NET reads stdin using the console’s input codepage, which does not match $OutputEncoding. The only reason the two results differ is that Windows PowerShell uses ASCII, so the é is converted to ?.
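
The byte sequence in the PowerShell 7 dump can be reproduced by hand. The sketch below rests on three assumptions: the sender encodes with UTF-8 (the $OutputEncoding value), the receiver decodes with code page 437 (the console input codepage mentioned above), and the result is written back out as UTF-8, as the script's .NET method does. The helper name simulate_pipe is hypothetical.

```python
def simulate_pipe(text: str, sender_encoding: str, receiver_encoding: str) -> bytes:
    """Hypothetical helper: encode as the sender would, decode as the receiver
    would, then re-encode as UTF-8 the way the test script writes its file."""
    wire_bytes = text.encode(sender_encoding)
    misread = wire_bytes.decode(receiver_encoding)
    return misread.encode('utf-8')

# UTF-8 sender, code page 437 receiver (the mismatch described above).
mismatched = simulate_pipe('café\r\n', 'utf-8', 'cp437')
print(mismatched.hex(' ').upper())  # 63 61 66 E2 94 9C E2 8C 90 0D 0A

# Aligned encodings round-trip cleanly.
aligned = simulate_pipe('café\r\n', 'utf-8', 'utf-8')
print(aligned.hex(' ').upper())  # 63 61 66 C3 A9 0D 0A
```

With both sides on the same encoding, the output matches the expected hex dump; with the mismatch, the single é turns into the two box-drawing characters ├⌐, each three bytes long in UTF-8.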

Environment data

Name                           Value
----                           -----
PSVersion                      7.1.0
PSEdition                      Core
GitCommitId                    7.1.0
OS                             Microsoft Windows 10.0.17763
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Other info

I’m currently trying to wrap my head around how encoding behaves when PowerShell talks to native processes, and this particular issue has come up. I’m not sure if we can change the default behaviour, but I’m hoping to start a conversation as to why $OutputEncoding was changed to UTF-8 on Windows. For Linux I understand, but I feel that on Windows it should stay as the console’s input codepage for better compatibility with other command-line programs.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

2 reactions
mklement0 commented, Apr 7, 2021

To summarize the backward-compatibility impact that the proposed change would have:

If we switch PowerShell consoles from the system’s OEM code page to 65001 == UTF-8, only the following cases will amount to a breaking change:

  • Legacy programs that use a fixed OEM code page (other than 65001).
  • Legacy programs that respect the console’s code page but cannot handle code page 65001 (i.e. UTF-8) - an example of such a program is WMIC.exe, which is deprecated, however.

For “rogue” programs that do their own thing (e.g., Node.js, Python, sfc.exe - though note that Python can be configured to use UTF-8), a workaround is already necessary (temporarily setting [Console]::OutputEncoding), and it will continue to work.
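
The remark that Python can be configured to use UTF-8 can be sketched as follows. This is an illustration under stated assumptions, not part of the proposal: it relies on CPython's PYTHONIOENCODING environment variable to force the child's stdin and stdout to UTF-8, and the parent script stands in for a caller that writes UTF-8 bytes to the pipe.

```python
import os
import subprocess
import sys

# Child process: echo stdin back to stdout without adding a newline.
child_code = "import sys; sys.stdout.write(sys.stdin.read())"

# PYTHONIOENCODING makes the child decode stdin and encode stdout as UTF-8.
env = {**os.environ, 'PYTHONIOENCODING': 'utf-8'}

result = subprocess.run(
    [sys.executable, '-c', child_code],
    input='café'.encode('utf-8'),  # bytes as a UTF-8 sender would write them
    capture_output=True,
    env=env,
    check=True,
)

round_tripped = result.stdout.decode('utf-8')
print(round_tripped == 'café')  # True
```

Because the child no longer consults the console code page, the round trip no longer depends on what that code page happens to be.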

0 reactions
mklement0 commented, Mar 31, 2021

Having revisited this in the context of #15123:

Sorry for not fully addressing your feedback: I didn’t dive as deep as you did.

First of all, yes: whenever stdout isn’t connected to a console, the discrepancy between seemingly correct display output and erroneously decoded output doesn’t apply, because decoding is then invariably involved (capturing in a variable, redirecting to a file, piping to another command, remoting, background/thread jobs).

As for when this discrepancy may arise on Windows:

  • At least some high-profile CLIs seemingly explicitly modify their behavior based on whether they’re outputting directly to a console (terminal) or not.

    • My inference was that such CLIs use the Unicode version of the WriteConsole API situationally, namely when stdout is connected to a console, but I have no positive proof.

    • The two high-profile CLIs I know of that behave this way are python and node - there may be others.

  • By contrast, .NET console applications do not make this distinction when outputting via Console.WriteLine() / Console.Out.WriteLine().

You can verify this as follows:

  • Direct-to-display output:
# On Windows - using direct-to-display output.
PS> & { python -c "print('eé')"; node -pe "'eé'"; (Add-Type -PassThru -Name foo -MemberDefinition 'public static void PrintToConsole() { Console.WriteLine("eé"); }')::PrintToConsole() }
eé
eé
eé
  • Decoded output, via piping to Write-Output:
# On Windows - *decoded* output
PS> & { python -c "print('eé')"; node -pe "'eé'"; (Add-Type -PassThru -Name foo -MemberDefinition 'public static void PrintToConsole() { Console.WriteLine("eé"); }')::PrintToConsole() } | 
      Write-Output
eΘ
e├⌐
eé

As you can see, only with decoded output do the encoding-mismatch problems surface in the python (always-ANSI) and node (always UTF-8) calls.
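
The decoding mismatch can be worked through byte by byte. The following Python sketch assumes code page 1252 as the system's ANSI code page and code page 437 as the console (OEM) code page; both depend on the machine's locale, so other systems will show different garbled characters.

```python
text = 'eé'

# Assumed code pages: 1252 as the ANSI code page, 437 as the console (OEM) code page.
ansi_bytes = text.encode('cp1252')   # what an always-ANSI program writes
utf8_bytes = text.encode('utf-8')    # what an always-UTF-8 program writes

# Captured output is decoded with the console code page.
print(ansi_bytes.decode('cp437'))  # eΘ
print(utf8_bytes.decode('cp437'))  # e├⌐

# A program that honours the console code page round-trips correctly.
print(text.encode('cp437').decode('cp437'))  # eé
```

Under these assumptions, the ANSI byte 0xE9 is read as Θ and the two UTF-8 bytes 0xC3 0xA9 are read as ├⌐, while output written in the console's own code page survives intact.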
