
Black formatting is not applied (but black runs)


Issue Type: Bug

Behaviour

When formatting a *.py file with black, no changes are applied. I can see in the log that black runs without error, but my file is left unchanged. Python output after formatting:

~\.local\bin\black.exe --diff --quiet .\test.py
cwd: .

When I run that same command in a terminal, I can see the output of black, which seems correct:

PS C:\Users\Me\Path>~\.local\bin\black.exe --diff --quiet .\test.py
--- test.py     2022-02-23 00:07:09.493174 +0000
+++ test.py     2022-02-23 00:07:30.533624 +0000
@@ -1,2 +1,2 @@
-print( "test")
-print(3+8/3)
+print("test")
+print(3 + 8 / 3)

I am using the following settings:

{
    "python.formatting.provider": "black",
    "python.formatting.blackPath": "${env:USERPROFILE}\\.local\\bin\\black.exe",
    "[python]": {
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": true
        },
        "editor.defaultFormatter": "ms-python.python",
    },
    ...
}

(black is installed through pipx.) Running ~\.local\bin\black.exe .\test.py does format my file correctly. Is it because of the --diff argument that VS Code passes to black? If so, how can I remove it?

This setup was definitely working last month, but something seems to have changed since then and I cannot figure out what. I am so confused.

Thanks for the help

Diagnostic data

  • Python version (& distribution if applicable, e.g. Anaconda): 3.8.10
  • Type of virtual environment used (e.g. conda, venv, virtualenv, etc.): Global
  • Value of the python.languageServer setting: Pylance
User Settings


defaultLS: {"defaultLSType":"Pylance"}

downloadLanguageServer: true

envFile: "<placeholder>"

venvPath: "<placeholder>"

venvFolders: "<placeholder>"

condaPath: "<placeholder>"

pipenvPath: "<placeholder>"

poetryPath: "<placeholder>"

languageServer: "Pylance"

linting
• enabled: true
• cwd: "<placeholder>"
• Flake8Args: "<placeholder>"
• flake8Enabled: true
• flake8Path: "<placeholder>"
• lintOnSave: true
• banditArgs: "<placeholder>"
• banditEnabled: true
• banditPath: "<placeholder>"
• mypyArgs: "<placeholder>"
• mypyEnabled: true
• mypyPath: "<placeholder>"
• pycodestyleArgs: "<placeholder>"
• pycodestyleEnabled: false
• pycodestylePath: "<placeholder>"
• prospectorArgs: "<placeholder>"
• prospectorEnabled: false
• prospectorPath: "<placeholder>"
• pydocstyleArgs: "<placeholder>"
• pydocstyleEnabled: false
• pydocstylePath: "<placeholder>"
• pylamaArgs: "<placeholder>"
• pylamaEnabled: false
• pylamaPath: "<placeholder>"
• pylintArgs: "<placeholder>"
• pylintPath: "<placeholder>"

sortImports
• args: "<placeholder>"
• path: "<placeholder>"

formatting
• autopep8Args: "<placeholder>"
• autopep8Path: "<placeholder>"
• provider: "black"
• blackArgs: "<placeholder>"
• blackPath: "<placeholder>"
• yapfArgs: "<placeholder>"
• yapfPath: "<placeholder>"

testing
• cwd: "<placeholder>"
• debugPort: 3000
• nosetestArgs: "<placeholder>"
• nosetestsEnabled: undefined
• nosetestPath: "<placeholder>"
• promptToConfigure: true
• pytestArgs: "<placeholder>"
• pytestEnabled: false
• pytestPath: "<placeholder>"
• unittestArgs: "<placeholder>"
• unittestEnabled: false
• autoTestDiscoverOnSaveEnabled: true

terminal
• activateEnvironment: true
• executeInFileDir: "<placeholder>"
• launchArgs: "<placeholder>"

experiments
• enabled: true
• optInto: []
• optOutFrom: []

insidersChannel: "off"

tensorBoard
• logDirectory: "<placeholder>"

Extension version: 2022.0.1814523869
VS Code version: Code 1.64.2 (f80445acd5a3dadef24aa209168452a3d97cc326, 2022-02-09T22:02:28.252Z)
OS version: Windows_NT x64 10.0.19042
Restricted Mode: No

System Info
CPUs: Intel® Core™ i5-8365U CPU @ 1.60GHz (8 x 1896)
GPU Status
• 2d_canvas: enabled
• gpu_compositing: enabled
• multiple_raster_threads: enabled_on
• oop_rasterization: enabled
• opengl: enabled_on
• rasterization: enabled
• skia_renderer: enabled_on
• video_decode: enabled
• vulkan: disabled_off
• webgl: enabled
• webgl2: enabled
Load (avg): undefined
Memory (System): 15.77GB (5.22GB free)
Process Argv:
Screen Reader: no
VM: 0%
A/B Experiments
vsliv368:30146709
vsreu685:30147344
python383:30185418
vspor879:30202332
vspor708:30202333
vspor363:30204092
pythonvspyl392cf:30425750
pythontb:30283811
pythonvspyt551:30345470
pythonptprofiler:30281270
vshan820:30294714
vstes263:30335439
pythondataviewer:30285071
vscod805:30301674
pythonvspyt200:30340761
binariesv615:30325510
bridge0708:30335490
bridge0723:30353136
vsaa593:30376534
vsc1dst:30438360
pythonvs932:30410667
wslgetstartedc:30433508
vsclayoutctrc:30437038
vsrem710:30416614
dsvsc008:30440022
pythonvsnew555cf:30442237
vsbas813:30436447
vscscmwlcmt:30438805
helix:30440343

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 15 (3 by maintainers)

Top GitHub Comments

16 reactions
mbikovitsky commented, Feb 28, 2022

I did some investigating, and the TL;DR is that this isn’t a bug in either VSCode or Black, and technically not even in pipx. Rather, this is a bug in a component of distlib, which is used to create the black.exe launcher.

I’ll open an issue there in a bit, but I just wanted to share the whole story here, as it’s quite beautiful 🥲.

Update 2022-02-28: Issue opened.

Note: I’ll be using Python 3.10 in the analysis below, as that’s what I have, but the general ideas should apply to other versions as well.

Chapter 1: Leaving A Comment Doesn’t Make It So

If we look at the exit code of black.exe when running it from VSCode (e.g. using Procmon), we can see that it is -1073741819, which is 0xC0000005, i.e. STATUS_ACCESS_VIOLATION. This means that it probably crashes on some invalid memory access. Indeed, if we attach WinDbg to the VSCode extension host process, set .childdbg 1, and continue execution, we’ll see that it crashes inside python.exe, in ucrtbase!dup_nolock. This process is spawned indirectly by black.exe, and is the actual interpreter that runs Black. The exception code is then propagated as the process exit code all the way to black.exe. We’ll get to the complete process chain later on.
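
The link between the decimal exit code and the NTSTATUS value can be checked with a few lines of Python. This is only a minimal sketch: the helper name as_ntstatus is hypothetical, and the single constant is the value quoted above.

```python
def as_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


STATUS_ACCESS_VIOLATION = 0xC0000005

code = -1073741819  # exit code observed for black.exe
print(as_ntstatus(code))
print((code & 0xFFFFFFFF) == STATUS_ACCESS_VIOLATION)
```

Masking with 0xFFFFFFFF is what turns the negative number reported by most tools into the familiar 0xC... form.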

Lucky for us, the Python devs provide private debugging symbols, so we can recover the full call stack with line numbers:

ucrtbase!dup_nolock+0x159
ucrtbase!dup+0x85
python310!is_valid_fd+0x2f [D:\_w\1\s\Python\pylifecycle.c @ 2124] 
python310!create_stdio+0x39 [D:\_w\1\s\Python\pylifecycle.c @ 2154] 
python310!init_sys_streams+0x1a7 [D:\_w\1\s\Python\pylifecycle.c @ 2370] 
python310!init_interp_main+0x10c [D:\_w\1\s\Python\pylifecycle.c @ 1120] 
python310!pyinit_main+0x34 [D:\_w\1\s\Python\pylifecycle.c @ 1196] 
python310!Py_InitializeFromConfig+0x73 [D:\_w\1\s\Python\pylifecycle.c @ 1227] 
python310!pymain_init+0x132 [D:\_w\1\s\Modules\main.c @ 66] 
python310!pymain_main+0x11 [D:\_w\1\s\Modules\main.c @ 687] 
python310!Py_Main+0x25 [D:\_w\1\s\Modules\main.c @ 709] 
python!invoke_main+0x22 [d:\a01\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90] 
python!__scrt_common_main_seh+0x10c [d:\a01\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
KERNEL32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

The init_sys_streams function is responsible for initializing the sys.stdin/stdout/stderr objects in Python. It does so by associating them with the underlying fds of C’s stdin, stdout, and stderr FILE * objects, which it obtains using the fileno function.

The is_valid_fd function checks that a given file descriptor is valid by dup-ing it. If the dup succeeds, then the fd must be valid. In our case, the offending fd is 2, i.e. stderr.
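
The same dup-based check can be sketched in Python. This is an illustration of the idea rather than CPython's actual C implementation; the Python function below merely borrows the name is_valid_fd.

```python
import os


def is_valid_fd(fd: int) -> bool:
    """Return True if fd can be duplicated, mirroring the dup-based check."""
    try:
        duplicate = os.dup(fd)
    except OSError:
        return False
    os.close(duplicate)  # the copy is only needed as a probe
    return True


read_end, write_end = os.pipe()
print(is_valid_fd(read_end))  # descriptor is open
os.close(read_end)
os.close(write_end)
print(is_valid_fd(read_end))  # descriptor has been closed
```

In a healthy runtime a failed dup is simply reported as an error, as it is here. The crash in this story happens because the failure path inside the C runtime itself is broken.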

Sidenote about fds: the Microsoft C Runtime “emulates” POSIX-like file descriptors so that functions like read and write and fileno and dup still work as expected. Internally, there is a mapping between a file descriptor and a HANDLE that implements it (this HANDLE can be obtained via a call to _get_osfhandle). This mapping is what allows code to specify 0 in a call to read and have it read from stdin. We’ll get to how this mapping is initialized and maintained in a bit.

Back to the story. In order for dup to actually duplicate an fd, it has to duplicate the underlying handle. Indeed, if we look at the code, that’s exactly what it does. Unfortunately, in our case this fails with ERROR_INVALID_HANDLE. If we look at all of the process’ handles, we can also see that the handle for fd 2 (stderr) doesn’t exist, so no wonder that duplication fails.

Furthermore, the CRT has a bug in the duplication code:

bool success = false;
__try
{
    BOOL const result = DuplicateHandle(...);

    if (!result)
    {
        ...
        new_fh = -1;
        __leave;
    }

    ...
}
__finally
{
    if (!success)
    {
#pragma warning(disable:__WARNING_BUFFER_UNDERFLOW) // 26001 new_fh can't be -1 here
        _osfile(new_fh) &= ~FOPEN;
    }

    ...
}

Narrator voice: new_fh was -1 there.

If DuplicateHandle fails, new_fh is set to -1, and so the line _osfile(new_fh) &= ~FOPEN accesses the fd array out of bounds.

Note: We’re talking about version 10.0.19041.0 of the UCRT. AFAICT this has been fixed in newer versions. However, the version you have depends on Windows Update, so you may be out of luck.

Chapter 2: Left Out Of The Inheritance

Alright, so this explains why Black doesn’t return anything, as it just crashes during the Python interpreter’s initialization. But how come fd 2 is associated with a nonexistent handle? To understand that, we need to look into how the standard I/O fds are initialized at process startup.

The initialization flow is described rather well in the code itself. For our analysis the important part is this: on startup, the CRT obtains the STARTUPINFOW structure used to create the process (via GetStartupInfoW), and looks at the lpReserved2 field in it. This field points to a map from fds to handles, which the CRT uses to initialize its internal fd table. The assumption here is that lpReserved2 was initialized by the parent process, and that the handles within were inherited from it. This makes it possible to use functions such as spawn and exec and have the child process inherit fds, just as it would be on POSIX.

Crucially, the CRT does try to validate each handle in the lpReserved2 array by calling GetFileType. If it returns FILE_TYPE_UNKNOWN, the CRT doesn’t use the handle. However, this check is skipped for pipes, so a garbage value can still sneak in (foreshadowing).
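
To make the validation gap concrete, here is a rough Python model of it. The lpReserved2 layout is undocumented, so everything in it is an assumption: the buffer layout (a 4-byte count, then one flag byte per fd, then one pointer-sized handle per fd), the flag values FOPEN and FPIPE, and the helper names pack_fd_map and load_fd_table. GetFileType is replaced by a caller-supplied stub.

```python
import struct

FOPEN = 0x01  # assumed CRT flag: descriptor is in use
FPIPE = 0x08  # assumed CRT flag: descriptor refers to a pipe
FILE_TYPE_UNKNOWN = 0


def pack_fd_map(entries, handle_size=8):
    """Pack (flags, handle) pairs: int count, flag bytes, then handles."""
    fmt = "Q" if handle_size == 8 else "I"
    count = len(entries)
    flags = bytes(flag for flag, _ in entries)
    handles = struct.pack(f"<{count}{fmt}", *(h for _, h in entries))
    return struct.pack("<i", count) + flags + handles


def load_fd_table(buffer, get_file_type, handle_size=8):
    """Model the child-side check: pipe entries are trusted unverified."""
    fmt = "Q" if handle_size == 8 else "I"
    (count,) = struct.unpack_from("<i", buffer, 0)
    flags = buffer[4:4 + count]
    handles = struct.unpack_from(f"<{count}{fmt}", buffer, 4 + count)
    table = {}
    for fd, (flag, handle) in enumerate(zip(flags, handles)):
        if not flag & FOPEN:
            continue
        if flag & FPIPE or get_file_type(handle) != FILE_TYPE_UNKNOWN:
            table[fd] = handle
    return table


# Hypothetical handle values; 0x30 does not exist in the child process.
live_handles = {0x10, 0x20}


def file_type(handle):
    return 1 if handle in live_handles else FILE_TYPE_UNKNOWN


pipes = pack_fd_map([(FOPEN | FPIPE, h) for h in (0x10, 0x20, 0x30)])
files = pack_fd_map([(FOPEN, h) for h in (0x10, 0x20, 0x30)])
print(load_fd_table(pipes, file_type))  # stale handle 0x30 is accepted
print(load_fd_table(files, file_type))  # stale handle 0x30 is rejected
```

The only difference between the two calls is the pipe flag, which is enough to let a dead handle value through.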

Now, if we look at the handle table for black.exe, we’ll find there a handle with the same value that was assigned to fd 2 in the crashing python.exe. Indeed, we’ll also find there the handles assigned to fds 0 and 1 (stdin and stdout). Moreover, all three handles are specified in the hStdInput, hStdOutput, and hStdError fields in the STARTUPINFOW structure used to start black.exe.

Except, the stderr handle is not inheritable! Something in black.exe must have marked the handle as non-inheritable before spawning the child process, but didn’t inform the CRT of this. So the CRT passed the fd on to the child, but the handle was left behind in the parent.[^python-issue]

Indeed, setting a breakpoint on SetHandleInformation in black.exe reveals that it marks the stderr handle as non-inheritable before calling CreateProcessW.

But what is black.exe? WinDbg shows its image name as t64.exe, which is an important clue. t64.exe is present in the pip package as part of distlib (in site-packages/pip/_vendor/distlib); however, the distlib repo doesn’t have the source for t64.exe, only binaries. Further sleuthing reveals that the source is actually here. Read the description there for what the thing does; it’s quite nice.

For our purposes, however, the only interesting part is this:

memset(&si, 0, sizeof(si));
GetStartupInfoW(&si);
/*
 * See https://github.com/pypa/pip/issues/10444#issuecomment-973396812
 */
if ((si.dwFlags & (STARTF_USEHOTKEY | STARTF_UNDOC_MONITOR)) == 0) {
    HANDLE hIn = GetStdHandle(STD_INPUT_HANDLE);
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    HANDLE hErr = GetStdHandle(STD_ERROR_HANDLE);

    ok = safe_duplicate_handle(hIn, &si.hStdInput);
    assert(ok, "stdin duplication failed");
    CloseHandle(hIn);

    ok = safe_duplicate_handle(hOut, &si.hStdOutput);
    assert(ok, "stdout duplication failed");
    CloseHandle(hOut);
    /* We might need stderr late, so don't close it but mark as non-inheritable */
    SetHandleInformation(hErr, HANDLE_FLAG_INHERIT, 0);

    ok = safe_duplicate_handle(hErr, &si.hStdError);
    assert(ok, "stderr duplication failed");
    si.dwFlags |= STARTF_USESTDHANDLES;
}
ok = CreateProcessW(NULL, cmdline, NULL, NULL, TRUE, 0, NULL, NULL, &si, &child_process_info);

This does several things:

  1. Obtain a copy of the STARTUPINFOW structure used to create the current process.
  2. Duplicate all standard I/O handles and put them into STARTUPINFOW.[^win-io-handles]
  3. Close the original I/O handles, except for the one for stderr (which is marked non-inheritable instead).
  4. Launch a child process with the updated STARTUPINFOW.

However, the original STARTUPINFOW has a CRT array of fds in lpReserved2! And the handles therein are the ones the launcher just closed or marked as non-inheritable (which amounts to the same thing)! So the child gets garbage handle values.

There’s one other thing of note here: in the code above, each I/O handle is duplicated and then closed. Due to how Windows reuses handle values, this means that the next created handle will probably get the same value as the one that was just closed. So after running the code above we’ll most likely have:[^handle-reuse]

  • si.hStdInput == some value
  • si.hStdOutput == hIn
  • si.hStdError == hOut

This means that when the CRT goes to initialize its fds in the child process, stdin and stdout will have valid handles, just that they’d refer to the wrong objects. And, as mentioned previously, stderr will have a garbage handle value.

Chapter 3: A Series of Unfortunate Events

At this point we have enough information to piece together the series of events leading to the crash:

  1. VSCode launches black.exe and passes 3 handles for its stdin, stdout, and stderr - both in the hStdInput, hStdOutput, and hStdError fields and in the lpReserved2 map.
  2. black.exe duplicates the Windows I/O handles from hStdInput, hStdOutput, and hStdError (this is basically what GetStdHandle retrieves) and closes the originals.
  3. black.exe launches the child process with the duplicated handle values, but does not update the lpReserved2 map.
  4. The CRT in the child Python interpreter initializes its fd table from the lpReserved2 map, which at this point contains garbage. Since VSCode passes pipes in this array, the CRT does not validate the handles.
  5. The child Python interpreter goes to initialize its sys.std* streams with the fds 0-2. To do that, it validates each fd by trying to dup it:
    1. The dup for fd 0 succeeds, as it points to a valid handle (albeit the one for the standard output stream).
    2. The dup for fd 1 succeeds, as it points to a valid handle (albeit the one for the standard error stream).
    3. The dup for fd 2 fails the DuplicateHandle call as the handle doesn’t exist, and promptly crashes when indexing an array out of bounds.
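
The steps above can be replayed with a toy model. It rests on a loud assumption: the handle table hands out the most recently freed value first. That matches the behaviour observed in this analysis but is not a documented Windows guarantee. HandleTable and run_launcher are hypothetical names, and the handle values are made up.

```python
class HandleTable:
    """Toy handle table that reuses the most recently freed value first."""

    def __init__(self):
        self.objects = {}      # handle value -> kernel object name
        self.inheritable = {}  # handle value -> bool
        self.free = []         # freed values, reused last-in first-out
        self.next_value = 0x10

    def open(self, obj, inheritable=True):
        if self.free:
            value = self.free.pop()
        else:
            value = self.next_value
            self.next_value += 4
        self.objects[value] = obj
        self.inheritable[value] = inheritable
        return value

    def duplicate(self, value):
        return self.open(self.objects[value])

    def close(self, value):
        del self.objects[value]
        del self.inheritable[value]
        self.free.append(value)

    def inherited(self):
        """Handles that a child created with inheritance would receive."""
        return {v: o for v, o in self.objects.items() if self.inheritable[v]}


def run_launcher():
    parent = HandleTable()
    h_in, h_out, h_err = (parent.open(n) for n in ("stdin", "stdout", "stderr"))
    fd_map = {0: h_in, 1: h_out, 2: h_err}  # lpReserved2 map, never updated

    parent.duplicate(h_in)             # becomes si.hStdInput
    parent.close(h_in)
    parent.duplicate(h_out)            # si.hStdOutput, reuses h_in's value
    parent.close(h_out)
    parent.inheritable[h_err] = False  # stderr kept but not inherited
    parent.duplicate(h_err)            # si.hStdError, reuses h_out's value

    child = parent.inherited()
    # What each fd in the stale map actually refers to inside the child.
    return {fd: child.get(handle) for fd, handle in fd_map.items()}


print(run_launcher())
```

Under these assumptions fd 0 ends up on the stdout object, fd 1 on the stderr object, and fd 2 on nothing at all, which is the state the child interpreter trips over.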

We can now also explain why Black works when run from the command-line. cmd.exe simply doesn’t populate lpReserved2 when it calls CreateProcessW, so the CRT doesn’t get confused when it initializes its fd table.

And that’s about it. A story of undocumented structures, random chance, and ABIs you didn’t know you had to uphold.

Truly beautiful.

Appendix A: All My Sons

Actually, that’s not everything. There’s one other process in play when running Black. The full tree looks like this:

  • black.exe
    • python.exe
      • python.exe

black.exe we have already met. The last python.exe is the actual interpreter, where all the fun happens. The python.exe in the middle is a special launcher that resides in the virtualenv that pipx creates for Black. Its job is to set up the environment before calling the actual Python interpreter, so that it executes inside the virtualenv.

Why is this interesting? Because this launcher also performs the handle duplication dance. Except, in this case it doesn’t close the original handles. Since the originals were inherited from black.exe, they’ll be passed onto the final python.exe.

So the analysis above is still valid, we just end up with a couple more duplicate handles.

Appendix B: Well-Known Secrets

Recall how VSCode launches black.exe with both lpReserved2 and the usual STARTUPINFOW fields filled with handles. In an effort to reproduce the bug without VSCode, I looked for a C function that launches a process and populates the lpReserved2 field.

I found only one: spawn. Except this function sets only lpReserved2, but not the other STARTUPINFOW fields.

Upon further examination, it appears that VSCode uses libuv for launching the process. And look what I found there:

startup.dwFlags = STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW;

startup.cbReserved2 = uv__stdio_size(process->child_stdio_buffer);
startup.lpReserved2 = (BYTE*) process->child_stdio_buffer;

startup.hStdInput = uv__stdio_handle(process->child_stdio_buffer, 0);
startup.hStdOutput = uv__stdio_handle(process->child_stdio_buffer, 1);
startup.hStdError = uv__stdio_handle(process->child_stdio_buffer, 2);

It would appear that the lpReserved2 is not so reserved. It’s not exactly risky to manually populate this field, since it’s practically part of the CRT ABI now, but it’s still surprising (for me) to see such a thing in a well-known library.

Appendix C: More Fun

Here’s a bit of C code that launches the command it receives in its arguments (just don’t put spaces in any of them):

#include <process.h>

int wmain(int argc, wchar_t ** argv)
{
    return _wspawnvp(_P_WAIT, argv[1], argv + 1);
}

This isn’t doing anything even remotely interesting, yet when you run black.exe with this wrapper, you get:

Fatal Python error: init_sys_streams: can't initialize sys standard streams
Python runtime state: core initialized
ValueError: Cannot open console output buffer for reading

Current thread 0x0000de74 (most recent call first):
  <no Python frame>

Debugging this is left as an exercise for the reader 😎.

[^python-issue]: Fun fact: Python had a similar issue with handle inheritance and lpReserved2 in the past.

[^win-io-handles]: Note that these are Windows I/O handles, specified in the hStdInput, hStdOutput, and hStdError fields of STARTUPINFOW when creating a process. The CRT will use these to initialize fds 0-2 in the absence of an lpReserved2 map, or if the map doesn’t specify a handle for a particular fd in the range 0-2.

[^handle-reuse]: Most likely. I don’t claim to understand how Windows reuses handle values, but the behaviour described here was what I consistently observed during debugging.

1 reaction
frostming commented, Mar 1, 2022

✔black installed in external dedicated venv, setting python.formatting.blackPath

pipx also sets up a dedicated venv, so it is odd that this works but pipx doesn’t.

Maybe we can manually set up a venv under pipx’s location. I will try it if I find the time.

It turns out that a manually created venv under the pipx location works. There might be some difference in how pipx creates venvs and installs packages. @mbikovitsky’s investigation was great, but it doesn’t explain OP’s testing result.


UPDATE

Yes, there are differences. pipx uses a shared pip installation to install packages. To prove that this makes a difference:

Normal venv with pip inside, works fine

python -m venv fine
fine\Scripts\python -m pip install black
# Set blackPath to fine\Scripts\black.exe

Thin venv without pip, but with a reference to a pip outside it, buggy. This is how pipx works

python -m venv bad --without-pip
python -m venv shared
# Put a `shared_pip.pth` under bad\Lib\site-packages containing the following line:
#     C:\absolute\path\to\shared\Lib\site-packages
bad\Scripts\python -m pip install black
# Set blackPath to bad\Scripts\black.exe

What’s more, fine\Scripts\black.exe and bad\Scripts\black.exe have different sizes.

So the current workaround is to create a dedicated venv and install black in it, then point blackPath to the executable.
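
For anyone who prefers to script the workaround, here is a minimal sketch using the standard venv module. The function names and the target directory in the comment are hypothetical; the Scripts\black.exe and bin/black locations follow the usual venv layout.

```python
import subprocess
import sys
import venv
from pathlib import Path


def black_executable(venv_dir: Path, windows: bool) -> Path:
    """Path where a venv is expected to place the black launcher."""
    if windows:
        return venv_dir / "Scripts" / "black.exe"
    return venv_dir / "bin" / "black"


def create_black_venv(venv_dir: Path) -> Path:
    """Create a regular venv (with its own pip) and install black into it."""
    venv.EnvBuilder(with_pip=True).create(venv_dir)
    windows = sys.platform == "win32"
    if windows:
        python = venv_dir / "Scripts" / "python.exe"
    else:
        python = venv_dir / "bin" / "python"
    subprocess.run([str(python), "-m", "pip", "install", "black"], check=True)
    return black_executable(venv_dir, windows)


# Example call (hypothetical location; needs network access for pip):
# print(create_black_venv(Path.home() / ".venvs" / "black"))
```

The printed path is the value to put into python.formatting.blackPath.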
