Black formatting is not applied (but black runs)
Issue Type: Bug
Behaviour
When formatting a *.py file with black, no changes are applied. I can see in the log that black runs without error, but my file is left unchanged. Output in the Python output channel after formatting:
~\.local\bin\black.exe --diff --quiet .\test.py
cwd: .
When I run the same command in a terminal, I can see the output of black, which looks correct:
PS C:\Users\Me\Path>~\.local\bin\black.exe --diff --quiet .\test.py
--- test.py 2022-02-23 00:07:09.493174 +0000
+++ test.py 2022-02-23 00:07:30.533624 +0000
@@ -1,2 +1,2 @@
-print( "test")
-print(3+8/3)
+print("test")
+print(3 + 8 / 3)
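To show what the editor has to work with, here is a small Python sketch that builds the same kind of unified diff with the standard library's difflib. It uses the before/after lines from the terminal output above. It only approximates black's --diff output, since the headers carry no timestamps. It also assumes, as the logged command line suggests, that the extension applies whatever diff black prints on stdout. If black prints nothing, there is nothing to apply.

```python
import difflib

# Before/after contents of test.py, taken from the terminal output in the report.
before = ['print( "test")\n', 'print(3+8/3)\n']
after = ['print("test")\n', 'print(3 + 8 / 3)\n']


def make_patch(original: list[str], formatted: list[str], name: str) -> str:
    """Return a unified diff shaped like `black --diff` output (minus timestamps)."""
    return "".join(difflib.unified_diff(original, formatted, fromfile=name, tofile=name))


patch = make_patch(before, after, "test.py")
print(patch, end="")

# An empty patch looks exactly like "already formatted": a consumer that only
# applies the diff it reads from stdout changes nothing in that case.
print(repr(make_patch(after, after, "test.py")))
```

The hunk printed by the sketch matches the one above line for line. Only the header lines differ.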
I am using the following settings:
{
    "python.formatting.provider": "black",
    "python.formatting.blackPath": "${env:USERPROFILE}\\.local\\bin\\black.exe",
    "[python]": {
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.organizeImports": true
        },
        "editor.defaultFormatter": "ms-python.python",
    },
    ...
}
(black is installed through pipx)
Running ~\.local\bin\black.exe .\test.py does format my file correctly.
Is it because of the --diff argument VS Code passes to black? How can I remove it?
This setup was definitely working last month, but something seems to have changed and I cannot figure out what… I am so confused.
Thanks for the help
Diagnostic data
- Python version (& distribution if applicable, e.g. Anaconda): 3.8.10
- Type of virtual environment used (e.g. conda, venv, virtualenv, etc.): Global
- Value of the python.languageServer setting: Pylance
User Settings
defaultLS: {"defaultLSType":"Pylance"}
downloadLanguageServer: true
envFile: "<placeholder>"
venvPath: "<placeholder>"
venvFolders: "<placeholder>"
condaPath: "<placeholder>"
pipenvPath: "<placeholder>"
poetryPath: "<placeholder>"
languageServer: "Pylance"
linting
• enabled: true
• cwd: "<placeholder>"
• Flake8Args: "<placeholder>"
• flake8Enabled: true
• flake8Path: "<placeholder>"
• lintOnSave: true
• banditArgs: "<placeholder>"
• banditEnabled: true
• banditPath: "<placeholder>"
• mypyArgs: "<placeholder>"
• mypyEnabled: true
• mypyPath: "<placeholder>"
• pycodestyleArgs: "<placeholder>"
• pycodestyleEnabled: false
• pycodestylePath: "<placeholder>"
• prospectorArgs: "<placeholder>"
• prospectorEnabled: false
• prospectorPath: "<placeholder>"
• pydocstyleArgs: "<placeholder>"
• pydocstyleEnabled: false
• pydocstylePath: "<placeholder>"
• pylamaArgs: "<placeholder>"
• pylamaEnabled: false
• pylamaPath: "<placeholder>"
• pylintArgs: "<placeholder>"
• pylintPath: "<placeholder>"
sortImports
• args: "<placeholder>"
• path: "<placeholder>"
formatting
• autopep8Args: "<placeholder>"
• autopep8Path: "<placeholder>"
• provider: "black"
• blackArgs: "<placeholder>"
• blackPath: "<placeholder>"
• yapfArgs: "<placeholder>"
• yapfPath: "<placeholder>"
testing
• cwd: "<placeholder>"
• debugPort: 3000
• nosetestArgs: "<placeholder>"
• nosetestsEnabled: undefined
• nosetestPath: "<placeholder>"
• promptToConfigure: true
• pytestArgs: "<placeholder>"
• pytestEnabled: false
• pytestPath: "<placeholder>"
• unittestArgs: "<placeholder>"
• unittestEnabled: false
• autoTestDiscoverOnSaveEnabled: true
terminal
• activateEnvironment: true
• executeInFileDir: "<placeholder>"
• launchArgs: "<placeholder>"
experiments
• enabled: true
• optInto: []
• optOutFrom: []
insidersChannel: "off"
tensorBoard
• logDirectory: "<placeholder>"
Extension version: 2022.0.1814523869
VS Code version: Code 1.64.2 (f80445acd5a3dadef24aa209168452a3d97cc326, 2022-02-09T22:02:28.252Z)
OS version: Windows_NT x64 10.0.19042
Restricted Mode: No
System Info
Item | Value |
---|---|
CPUs | Intel® Core™ i5-8365U CPU @ 1.60GHz (8 x 1896) |
GPU Status | 2d_canvas: enabled gpu_compositing: enabled multiple_raster_threads: enabled_on oop_rasterization: enabled opengl: enabled_on rasterization: enabled skia_renderer: enabled_on video_decode: enabled vulkan: disabled_off webgl: enabled webgl2: enabled |
Load (avg) | undefined |
Memory (System) | 15.77GB (5.22GB free) |
Process Argv | |
Screen Reader | no |
VM | 0% |
A/B Experiments
vsliv368:30146709
vsreu685:30147344
python383:30185418
vspor879:30202332
vspor708:30202333
vspor363:30204092
pythonvspyl392cf:30425750
pythontb:30283811
pythonvspyt551:30345470
pythonptprofiler:30281270
vshan820:30294714
vstes263:30335439
pythondataviewer:30285071
vscod805:30301674
pythonvspyt200:30340761
binariesv615:30325510
bridge0708:30335490
bridge0723:30353136
vsaa593:30376534
vsc1dst:30438360
pythonvs932:30410667
wslgetstartedc:30433508
vsclayoutctrc:30437038
vsrem710:30416614
dsvsc008:30440022
pythonvsnew555cf:30442237
vsbas813:30436447
vscscmwlcmt:30438805
helix:30440343
Top GitHub Comments
I did some investigating, and the TL;DR is that this isn’t a bug in either VSCode or Black, and technically not even in pipx. Rather, this is a bug in a component of distlib, which is used to create the black.exe launcher. I’ll open an issue there in a bit, but I just wanted to share the whole story here, as it’s quite beautiful 🥲.
Update 2022-02-28: Issue opened.
Note: I’ll be using Python 3.10 in the analysis below, as that’s what I have, but the general ideas should apply to other versions as well.
Chapter 1: Leaving A Comment Doesn’t Make It So
If we look at the exit code of black.exe when running it from VSCode (e.g. using Procmon), we can see that it is -1073741819, which is 0xC0000005, i.e. STATUS_ACCESS_VIOLATION. This means that it probably crashes on some invalid memory access. Indeed, if we attach WinDbg to the VSCode extension host process, set .childdbg 1, and continue execution, we’ll see that it crashes inside python.exe, in ucrtbase!dup_nolock. This process is spawned indirectly by black.exe, and is the actual interpreter that runs Black. The exception code is then propagated as the process exit code all the way to black.exe. We’ll get to the complete process chain later on.

Lucky for us, the Python devs provide private debugging symbols, so we can recover the full call stack with line numbers.
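The exit-code arithmetic above is easy to check. Windows exit codes are unsigned 32-bit values (DWORDs). A tool that prints them as signed 32-bit integers turns 0xC0000005 into -1073741819. The sketch below reinterprets the value by masking it to 32 bits. Its lookup table is a hypothetical helper that holds only the one status code named here.

```python
# Hypothetical lookup table: only the status code discussed in this chapter.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as the unsigned 32-bit status value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    status = to_ntstatus(exit_code)
    return f"0x{status:08X} ({KNOWN_STATUS.get(status, 'unknown')})"


print(describe(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```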
The init_sys_streams function is responsible for initializing the sys.stdin/stdout/stderr objects in Python. It does so by associating them with the underlying fds of C’s stdin, stdout, and stderr FILE * objects, which it obtains using the fileno function.

The is_valid_fd function checks that a given file descriptor is valid by dup-ing it. If the dup succeeds, then the fd must be valid. In our case, the offending fd is 2, i.e. stderr.

Sidenote about fds: the Microsoft C Runtime “emulates” POSIX-like file descriptors so that functions like read, write, fileno, and dup still work as expected. Internally, there is a mapping between a file descriptor and the HANDLE that implements it (this HANDLE can be obtained via a call to _get_osfhandle). This mapping is what allows code to specify 0 in a call to read and have it read from stdin. We’ll get to how this mapping is initialized and maintained in a bit.

Back to the story. In order for dup to actually duplicate an fd, it has to duplicate the underlying handle. Indeed, if we look at the code, that’s exactly what it does. Unfortunately, in our case this fails with ERROR_INVALID_HANDLE. If we look at all of the process’s handles, we can also see that the handle for fd 2 (stderr) doesn’t exist, so no wonder duplication fails.

Furthermore, the CRT has a bug in the duplication code:
Narrator voice: new_fh was -1 there.

If DuplicateHandle fails, new_fh is set to -1, and so the line _osfile(new_fh) &= ~FOPEN accesses the fd array out of bounds.
accesses the fd array out of bounds.Note: We’re talking about version 10.0.19041.0 of the UCRT. AFAICT this has been fixed in newer versions. However, the version you have depends on Windows Update, so you may be out of luck.
Chapter 2: Left Out Of The Inheritance
Alright, so this explains why Black doesn’t return anything, as it just crashes during the Python interpreter’s initialization. But how come fd 2 is associated with a nonexistent handle? To understand that, we need to look into how the standard I/O fds are initialized at process startup.
The initialization flow is described rather well in the code itself. For our analysis the important part is this: on startup, the CRT obtains the STARTUPINFOW structure used to create the process (via GetStartupInfoW) and looks at its lpReserved2 field. This field points to a map from fds to handles, which the CRT uses to initialize its internal fd table. The assumption here is that lpReserved2 was initialized by the parent process, and that the handles within were inherited from it. This makes it possible to use functions such as spawn and exec and have the child process inherit fds, just as on POSIX.

Crucially, the CRT does try to validate each handle in the lpReserved2 array by calling GetFileType. If it returns FILE_TYPE_UNKNOWN, the CRT doesn’t use the handle. However, this check is skipped for pipes, so a garbage value can still sneak in (foreshadowing).

Now, if we look at the handle table for black.exe, we’ll find a handle with the same value that was assigned to fd 2 in the crashing python.exe. We’ll also find the handles assigned to fds 0 and 1 (stdin and stdout). Moreover, all three handles are specified in the hStdInput, hStdOutput, and hStdError fields of the STARTUPINFOW structure used to start black.exe.

Except, the stderr handle is not inheritable! Something in black.exe must have marked the handle as non-inheritable before spawning the child process, but didn’t inform the CRT of this. So the CRT passed the fd on to the child, but the handle was left behind in the parent.[^python-issue]

Indeed, setting a breakpoint on SetHandleInformation in black.exe reveals that it marks the stderr handle as non-inheritable before calling CreateProcessW.

But what is black.exe? WinDbg shows its image name as t64.exe, which is an important clue. t64.exe is present in the pip package as part of distlib (in site-packages/pip/_vendor/distlib). However, the distlib repo doesn’t have the source for t64.exe, only binaries. Further sleuthing reveals that the source is actually here. Read the description there for what the thing does; it’s quite nice.

For our purposes, however, the only interesting part is this:
This does several things:

- Retrieves the STARTUPINFOW structure used to create the current process.
- Duplicates the I/O handles in that STARTUPINFOW.[^win-io-handles]
- Launches the child process with the modified STARTUPINFOW.

However, the original STARTUPINFOW has a CRT array of fds in lpReserved2! And the handles therein are the ones the launcher just closed or marked as non-inheritable (which amounts to the same thing)! So the child gets garbage handle values.

There’s one other thing of note here: in the code above, each I/O handle is duplicated and then closed. Because Windows reuses handle values, the next created handle will probably get the same value as the one that was just closed. So after running the code above we’ll most likely have:[^handle-reuse]

- si.hStdInput == some value
- si.hStdOutput == hIn
- si.hStdError == hOut

This means that when the CRT goes to initialize its fds in the child process, stdin and stdout will have valid handles, except that they refer to the wrong objects. And, as mentioned previously, stderr will have a garbage handle value.
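The handle-value shuffle is easier to follow with concrete numbers. Below is a toy model, not Windows' real allocator. It assumes, in line with the handle-reuse footnote, that a newly created handle reuses the most recently closed value. The launcher's duplicate-then-close loop follows the description above. HandleTable and launcher_dance are hypothetical names, and "closed" stands in for "closed or non-inheritable" since the child sees the same result either way.

```python
class HandleTable:
    """Toy handle table: new handles reuse the most recently freed value first.

    This allocation order is an assumption that mirrors the observed behaviour,
    not a description of how Windows actually assigns handle values.
    """

    def __init__(self, first_value: int = 0x100) -> None:
        self._next_value = first_value
        self._free: list[int] = []
        self.live: set[int] = set()

    def create(self) -> int:
        value = self._free.pop() if self._free else self._allocate_fresh()
        self.live.add(value)
        return value

    def _allocate_fresh(self) -> int:
        value = self._next_value
        self._next_value += 4  # Windows handle values are multiples of 4
        return value

    def close(self, value: int) -> None:
        self.live.remove(value)
        self._free.append(value)


def launcher_dance(table: HandleTable, std_handles: tuple[int, int, int]) -> tuple[int, ...]:
    """Duplicate each std handle, then close the original (hypothetical model)."""
    duplicates = []
    for original in std_handles:
        duplicates.append(table.create())  # stands in for DuplicateHandle
        table.close(original)              # stands in for CloseHandle
    return tuple(duplicates)


table = HandleTable()
h_in, h_out, h_err = table.create(), table.create(), table.create()
si_in, si_out, si_err = launcher_dance(table, (h_in, h_out, h_err))

# The stale lpReserved2 map still lists the original values for fds 0-2.
for fd, stale in enumerate((h_in, h_out, h_err)):
    status = "exists" if stale in table.live else "does not exist"
    print(f"fd {fd} -> handle {stale:#x}: {status}")
```

With this allocation order, si.hStdOutput ends up equal to the old hIn and si.hStdError to the old hOut. The value that the stale map assigns to fd 2 no longer exists.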
Chapter 3: A Series of Unfortunate Events
At this point we have enough information to piece together the series of events leading to the crash:
- VSCode launches black.exe and passes 3 handles for its stdin, stdout, and stderr, both in the hStdInput, hStdOutput, and hStdError fields and in the lpReserved2 map.
- black.exe duplicates the Windows I/O handles from hStdInput, hStdOutput, and hStdError (this is basically what GetStdHandle retrieves) and closes the originals.
- black.exe launches the child process with the duplicated handle values, but does not update the lpReserved2 map.
- The CRT in the child process initializes its fd table from the lpReserved2 map, which at this point contains garbage. Since VSCode passes pipes in this array, the CRT does not validate the handles.
- Python initializes the sys.std* streams with fds 0-2. To do that, it validates each fd by trying to dup it:
  - dup for fd 0 succeeds, as it points to a valid handle (albeit the one for the standard output stream).
  - dup for fd 1 succeeds, as it points to a valid handle (albeit the one for the standard error stream).
  - dup for fd 2 fails the DuplicateHandle call, as the handle doesn’t exist, and promptly crashes when indexing an array out of bounds.

We can now also explain why Black works when run from the command line. cmd.exe simply doesn’t populate lpReserved2 when it calls CreateProcessW, so the CRT doesn’t get confused when it initializes its fd table.

And that’s about it. A story of undocumented structures, random chance, and ABIs you didn’t know you had to uphold.
Truly beautiful.
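The difference between the VSCode and cmd.exe launches comes down to where the child's CRT gets fds 0-2 from. The rough model below applies the rule from the win-io-handles footnote: an entry in the lpReserved2 map wins, otherwise the CRT falls back to the hStd* fields. It then adds Python's dup probe at startup, and that reproduces both outcomes. The function names and handle values are hypothetical. The model also ignores details such as the GetFileType check for non-pipe handles.

```python
from typing import Optional


def initial_fd_handles(reserved2: Optional[list[int]], std_fields: tuple[int, int, int]) -> list[int]:
    """Handles the child's CRT assigns to fds 0-2 (simplified model).

    Entries in the lpReserved2 map take precedence; fds the map doesn't cover
    fall back to hStdInput/hStdOutput/hStdError from STARTUPINFOW.
    """
    reserved2 = reserved2 or []
    return [reserved2[fd] if fd < len(reserved2) else std_fields[fd] for fd in range(3)]


def first_bad_fd(fd_handles: list[int], inherited: set[int]) -> Optional[int]:
    """Mimic the startup dup probe: first fd whose handle doesn't exist in the child."""
    for fd, handle in enumerate(fd_handles):
        if handle not in inherited:
            return fd  # on the affected UCRT, this is where dup crashes
    return None


# Example values chosen to follow the handle-reuse pattern described in Chapter 2.
duplicates = (0x10C, 0x100, 0x104)  # si.hStdInput, si.hStdOutput, si.hStdError
stale_map = [0x100, 0x104, 0x108]   # the lpReserved2 map VSCode passed to black.exe
inherited = set(duplicates)         # handles that actually exist in the child

print(first_bad_fd(initial_fd_handles(stale_map, duplicates), inherited))  # 2 (VSCode case)
print(first_bad_fd(initial_fd_handles(None, duplicates), inherited))       # None (cmd.exe case)
```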
Appendix A: All My Sons
Actually, that’s not everything. There’s one other process in play when running Black. The full tree looks like this:
black.exe
  python.exe
    python.exe

black.exe we have already met. The last python.exe is the actual interpreter, where all the fun happens. The python.exe in the middle is a special launcher that resides in the virtualenv that pipx creates for Black. Its job is to set up the environment before calling the actual Python interpreter, so that it executes inside the virtualenv.

Why is this interesting? Because this launcher also performs the handle duplication dance. Except, in this case it doesn’t close the original handles. Since the originals were inherited from black.exe, they’ll be passed on to the final python.exe.

So the analysis above is still valid; we just end up with a couple more duplicate handles.
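Whether a handle is passed on to a child at all is decided by a per-handle inherit flag. That is the flag the launcher clears with SetHandleInformation in Chapter 2. Python exposes the same flag for fds through os.get_inheritable and os.set_inheritable. On Windows, CPython implements these on top of the fd's underlying handle, and since Python 3.4 (PEP 446) new fds start out non-inheritable. The check below is portable. The inheritance_report helper is hypothetical.

```python
import os


def inheritance_report(fd: int) -> str:
    """Describe whether a child process would inherit `fd`."""
    if os.get_inheritable(fd):
        return "inheritable"
    return "NOT inheritable (left behind in the parent)"


read_end, write_end = os.pipe()
print("new pipe:", inheritance_report(read_end))  # PEP 446 default: not inheritable

os.set_inheritable(read_end, True)  # on Windows this sets HANDLE_FLAG_INHERIT
print("after set_inheritable(True):", inheritance_report(read_end))

os.set_inheritable(read_end, False)  # analogous to what the launcher did to stderr
print("after set_inheritable(False):", inheritance_report(read_end))

os.close(read_end)
os.close(write_end)
```

Note that clearing the flag leaves the fd perfectly usable in the parent. Nothing in this step tells a child's CRT that the handle won't come along.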
Appendix B: Well-Known Secrets
Recall how VSCode launches black.exe with both lpReserved2 and the usual STARTUPINFOW fields filled with handles. In an effort to reproduce the bug without VSCode, I looked for a C function that launches a process and populates the lpReserved2 field.

I found only one: spawn. Except this function sets only lpReserved2, but not the other STARTUPINFOW fields.

Upon further examination, it appears that VSCode uses libuv for launching the process. And look what I found there: libuv fills in lpReserved2 itself, alongside the usual STARTUPINFOW handle fields.
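For reference, the buffer that ends up in lpReserved2 has a simple packed layout. libuv's Windows process code documents it as an int count, then one byte of CRT flags per fd, then one HANDLE per fd, with no padding. The sketch below packs and unpacks that layout with Python's struct module. It assumes a 64-bit little-endian process (8-byte HANDLEs) and the CRT flag values defined in libuv's process-stdio.c (FOPEN = 0x01, FPIPE = 0x08, FDEV = 0x40). The helper names are hypothetical.

```python
import struct

# CRT per-fd flag bits, using the values defined in libuv's process-stdio.c.
FOPEN = 0x01
FPIPE = 0x08
FDEV = 0x40

HANDLE_CODE = "Q"  # assumption: 64-bit process, so each HANDLE is 8 bytes


def pack_crt_fd_map(entries: list[tuple[int, int]]) -> bytes:
    """Build an lpReserved2-style buffer from (flags, handle) pairs, one per fd."""
    count = len(entries)
    flags = bytes(flag for flag, _ in entries)
    handles = b"".join(struct.pack("<" + HANDLE_CODE, handle) for _, handle in entries)
    return struct.pack("<i", count) + flags + handles


def unpack_crt_fd_map(buffer: bytes) -> list[tuple[int, int]]:
    """Inverse of pack_crt_fd_map: recover the (flags, handle) pairs."""
    (count,) = struct.unpack_from("<i", buffer, 0)
    flags = buffer[4 : 4 + count]
    handles = struct.unpack_from(f"<{count}{HANDLE_CODE}", buffer, 4 + count)
    return list(zip(flags, handles))


def crt_validates(flags: int) -> bool:
    """Per the analysis above, the CRT skips its GetFileType check for pipes."""
    return not (flags & FPIPE)


pipe = FOPEN | FPIPE
buffer = pack_crt_fd_map([(pipe, 0x100), (pipe, 0x104), (pipe, 0x108)])
print(len(buffer))                   # 4 + 3 + 3 * 8 = 31
print(unpack_crt_fd_map(buffer)[2])  # (9, 264)
print(crt_validates(pipe))           # False: a stale pipe handle goes unchecked
```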
It would appear that lpReserved2 is not so reserved. It’s not exactly risky to manually populate this field, since it’s practically part of the CRT ABI now, but it’s still surprising (for me) to see such a thing in a well-known library.

Appendix C: More Fun
Here’s a bit of C code that launches the command it receives in its arguments (just don’t put spaces in any of them):
This isn’t doing anything even remotely interesting, yet when you run black.exe with this wrapper:

Debugging this is left as an exercise for the reader 😎.
[^python-issue]: Fun fact: Python had a similar issue with handle inheritance and lpReserved2 in the past.

[^win-io-handles]: Note that these are Windows I/O handles, specified in the hStdInput, hStdOutput, and hStdError fields of STARTUPINFOW when creating a process. The CRT will use these to initialize fds 0-2 in the absence of an lpReserved2 map, or if the map doesn’t specify a handle for a particular fd in the range 0-2.

[^handle-reuse]: Most likely. I don’t claim to understand how Windows reuses handle values, but the behaviour described here was what I consistently observed during debugging.
It turns out that a venv created manually under the pipx location works. There might be some difference in how pipx creates venvs and installs packages. @mbikovitsky’s investigation was great, but it doesn’t explain OP’s testing result.

UPDATE

Yes, there are differences. pipx uses a shared pip installation to install packages. To prove that this makes a difference:

- Normal venv with pip inside: works fine.
- Thin venv without pip, but referencing a pip outside it: buggy. This is how pipx works.

What’s more, fine\Scripts\black.exe and bad\Scripts\black.exe have different sizes.

So the current workaround is to create a dedicated venv, install black in it, and point blackPath to the executable.
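Here is a minimal sketch of that workaround using only the standard library. venv.EnvBuilder(with_pip=True) gives the environment its own pip, which is the "normal venv" case above. black is then installed with that environment's own interpreter. The path helper assumes the standard venv layout: Scripts\black.exe on Windows, bin/black elsewhere. The function names and the target directory are hypothetical.

```python
import os
import subprocess
import venv
from pathlib import Path

IS_WINDOWS = os.name == "nt"


def venv_script(venv_dir: Path, name: str, windows: bool = IS_WINDOWS) -> Path:
    """Path of an executable inside a venv, assuming the standard layout."""
    if windows:
        return venv_dir / "Scripts" / f"{name}.exe"
    return venv_dir / "bin" / name


def create_black_venv(venv_dir: Path) -> Path:
    """Create a venv with its own pip, install black into it, and return black's path."""
    # with_pip=True bootstraps pip inside the venv (via ensurepip),
    # instead of relying on a pip installation shared from elsewhere.
    venv.EnvBuilder(with_pip=True).create(venv_dir)
    python = venv_script(venv_dir, "python")
    subprocess.run([str(python), "-m", "pip", "install", "black"], check=True)
    return venv_script(venv_dir, "black")
```

Calling create_black_venv(Path.home() / "black-venv") and putting the returned path into python.formatting.blackPath covers the whole workaround. The venv location is an arbitrary choice.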