Get-ChildItem's Influence on File Listing Order
Based on my own observations, it would appear that in PowerShell 5.1 the order in which Get-ChildItem lists files is governed by the file system, which is in accordance with this article by Raymond Chen where he explains how the command prompt’s DIR command lists files when no explicit sorting is specified.
In PowerShell Core, by contrast, Get-ChildItem appears to list files using case-insensitive lexicographical sorting irrespective of the underlying file system.
My questions are:
- Can someone point to the code that’s responsible for producing case-insensitive lexicographical sorting? I’m not sure if this behaviour is due to coding within PowerShell or if it’s just the result of behaviour exhibited by something within .NET Core.
- Is this behaviour now considered contractual or is it liable to change? If contractual, should it be documented?
- There are certainly benefits to having a standardised file listing order; however, the case-insensitive aspect of it is somewhat contrary to expectation on GNU/Linux. Should Get-ChildItem provide a means of respecting case on GNU/Linux?
Below are details of my observations…
NTFS File System
Excerpt from Raymond Chen’s article: “The NTFS file system internally maintains directory entries in a B-tree structure, which means that the most convenient way of enumerating the directory contents is in B-tree order…”
From some further research, it appears that the B-tree Raymond alludes to is organised lexicographically.
Test Performed
# Create five files that adhere to the format `foo_<YYYYMMDD>.txt`, one of which
# is uppercase.
New-Item -Path "C:\TestFiles" -ItemType Directory
"Hello" > "C:\TestFiles\foo_20210304.txt"
"Hello" > "C:\TestFiles\foo_20210301.txt"
"Hello" > "C:\TestFiles\FOO_20210303.txt"
"Hello" > "C:\TestFiles\foo_20210302.txt"
"Hello" > "C:\TestFiles\foo_20210305.txt"
Result: Windows PowerShell 5.1 on Windows 10
Files are listed in case-insensitive lexicographical order.
PS C:\> Get-ChildItem "C:\TestFiles"
Directory: C:\TestFiles
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 25/04/2021 04:02 16 foo_20210301.txt
-a---- 25/04/2021 04:02 16 foo_20210302.txt
-a---- 25/04/2021 04:02 16 FOO_20210303.txt
-a---- 25/04/2021 04:02 16 foo_20210304.txt
-a---- 25/04/2021 04:02 16 foo_20210305.txt
Result: PowerShell Core 7.1.3 on Windows 10
Files are listed in case-insensitive lexicographical order.
PS C:\> Get-ChildItem "C:\TestFiles"
Directory: C:\TestFiles
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 25/04/2021 04:07 7 foo_20210301.txt
-a--- 25/04/2021 04:07 7 foo_20210302.txt
-a--- 25/04/2021 04:07 7 FOO_20210303.txt
-a--- 25/04/2021 04:07 7 foo_20210304.txt
-a--- 25/04/2021 04:07 7 foo_20210305.txt
FAT32 File System
Excerpt from Raymond Chen’s article: “If the storage medium is a FAT-formatted USB thumb drive, then the files will be enumerated in a complex order based on the order in which files are created and deleted and the lengths of their names.”
Test Performed
I connected a FAT32-formatted USB thumb drive to a computer running Windows 10, where it was mounted as the D:\ drive.
# Create five files that adhere to the format `foo_<YYYYMMDD>.txt`, one of which
# is uppercase.
New-Item -Path "D:\TestFiles" -ItemType Directory
"Hello" > "D:\TestFiles\foo_20210304.txt"
"Hello" > "D:\TestFiles\foo_20210301.txt"
"Hello" > "D:\TestFiles\FOO_20210303.txt"
"Hello" > "D:\TestFiles\foo_20210302.txt"
"Hello" > "D:\TestFiles\foo_20210305.txt"
Result: Windows PowerShell 5.1 on Windows 10
Files are listed in the order they were created.
PS C:\> Get-ChildItem "D:\TestFiles"
Directory: D:\TestFiles
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 25/04/2021 04:11 16 foo_20210304.txt
-a---- 25/04/2021 04:11 16 foo_20210301.txt
-a---- 25/04/2021 04:11 16 FOO_20210303.txt
-a---- 25/04/2021 04:11 16 foo_20210302.txt
-a---- 25/04/2021 04:11 16 foo_20210305.txt
Result: PowerShell Core 7.1.3 on Windows 10
Files are listed in case-insensitive lexicographical order.
PS C:\> Get-ChildItem "D:\TestFiles"
Directory: D:\TestFiles
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 25/04/2021 04:12 7 foo_20210301.txt
-a--- 25/04/2021 04:12 7 foo_20210302.txt
-a--- 25/04/2021 04:12 7 FOO_20210303.txt
-a--- 25/04/2021 04:12 7 foo_20210304.txt
-a--- 25/04/2021 04:12 7 foo_20210305.txt
Ext4 File System
Test Performed
# Create five files that adhere to the format `foo_<YYYYMMDD>.txt`, one of which
# is uppercase.
New-Item -Path "~/TestFiles" -ItemType Directory
"Hello" > "~/TestFiles/foo_20210304.txt"
"Hello" > "~/TestFiles/foo_20210301.txt"
"Hello" > "~/TestFiles/FOO_20210303.txt"
"Hello" > "~/TestFiles/foo_20210302.txt"
"Hello" > "~/TestFiles/foo_20210305.txt"
Result: PowerShell Core 7.1.3 on Ubuntu 20.04
Files are listed in case-insensitive lexicographical order (contrary to what you’d expect on GNU/Linux).
PS /> Get-ChildItem "~/TestFiles"
Directory: /home/thecliguy/TestFiles
Mode LastWriteTime Length Name
---- ------------- ------ ----
----- 04/25/2021 04:27 6 foo_20210301.txt
----- 04/25/2021 04:27 6 foo_20210302.txt
----- 04/25/2021 04:27 6 FOO_20210303.txt
----- 04/25/2021 04:27 6 foo_20210304.txt
----- 04/25/2021 04:27 6 foo_20210305.txt
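For comparison, here is a minimal sketch of the two orderings in question. It works only on the five file names from the test (no file system access) and assumes that "respecting case" means an ordinal, code-point comparison, roughly what `ls` gives under the C locale. The variable names are illustrative only.

```powershell
# File names from the test above, in the order they were created.
[string[]]$names = @(
    'foo_20210304.txt'
    'foo_20210301.txt'
    'FOO_20210303.txt'
    'foo_20210302.txt'
    'foo_20210305.txt'
)

# Case-insensitive sort (Sort-Object's default behaviour).
# This reproduces the order observed in the listing above.
$caseInsensitive = $names | Sort-Object

# Ordinal sort: compares code points, so 'F' (0x46) sorts before 'f' (0x66)
# and the uppercase name moves to the front.
[string[]]$ordinal = $names.Clone()
[Array]::Sort($ordinal, [System.StringComparer]::Ordinal)

"Case-insensitive: $($caseInsensitive -join ', ')"
"Ordinal:          $($ordinal -join ', ')"
```

With an ordinal comparison, FOO_20210303.txt comes first, followed by the four lowercase names in date order.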
Issue Analytics
- Created 2 years ago
- Comments:5 (1 by maintainers)
This is a dangerous trend. You can’t make scripts and the engine slow for the sake of cosmetic changes. Such things only make sense where the user sees them with his own eyes. Otherwise, if a script needs sorted output, the sort should be spelled out explicitly in the script. We already have this problem with case normalization. As a result, file operations are up to 10 times slower. All cosmetic goodies should be in the formatting system and, yes, they should be switchable.
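As an illustration of spelling the order out in the script rather than relying on whatever order the provider returns, here is a minimal sketch. The scratch directory and variable names are hypothetical, and sorting by Name is just one possible choice.

```powershell
# Hypothetical scratch directory under the temp path; any directory would do.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('TestFiles_' + [guid]::NewGuid().ToString('N'))
New-Item -Path $dir -ItemType Directory | Out-Null

# Create a few files in a deliberately unsorted order.
foreach ($name in 'foo_20210304.txt', 'FOO_20210303.txt', 'foo_20210301.txt') {
    Set-Content -Path (Join-Path $dir $name) -Value 'Hello'
}

# Do not depend on the enumeration order; state the order the script needs.
# Sort-Object compares strings case-insensitively unless -CaseSensitive is given.
$sorted = Get-ChildItem -Path $dir -File | Sort-Object -Property Name
$sorted.Name

# Clean up the scratch directory.
Remove-Item -Path $dir -Recurse -Force
```

Written this way, the script produces the same order under Windows PowerShell 5.1 and PowerShell Core, whatever the file system enumerates first.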
We must follow the general rule that all user-facing output in PowerShell uses Culture.
If we talk about case, we must follow the general rule too: PowerShell is case-insensitive. Deviating from this rule generates an endless stream of bugs, of which there are already a lot in this subsystem. I’m not saying that we should ignore the properties of file systems and work with EXT4 just like NTFS, but we should not break the general principles on which PowerShell works. If these design fundamentals do not meet the user’s expectations, they can use other tools. But we must not allow the reverse situation, where a user is forced to give up PowerShell because we ourselves are destroying its principles.
So my script runs for about 5 hours on Windows PowerShell. I’ve never been able to find out how long it runs on PowerShell Core, since I can’t wait a few days. I would like to use it, but I can’t. I would like to make it the fastest, but … so far everything is being done to make it the slowest, and we’ve succeeded.
I can’t speak to the contractual aspect, but here’s the code that explicitly sorts:
https://github.com/PowerShell/PowerShell/blob/c857392c34613fc8ece1432ceb89a5310d9e5fd9/src/System.Management.Automation/namespaces/FileSystemProvider.cs#L1617-L1618