Addressing the Format challenges in PowerShell
Summary of the new feature/enhancement
Since version 1, the implementation of the formatting layer in PowerShell has included many challenging limitations, some of which include the following:
- `Format-*` cmdlets output custom format data objects that are then rendered in the console once they are passed into `Out-Default` for processing, e.g. `Get-Process -Id $PID | Format-Table | Get-Member`.
- Since these cmdlets convert PSObjects into format data objects, you cannot pipe their results to other commands to do something meaningful, e.g. `Get-Process -Id $PID | Format-Table | Stop-Process -WhatIf`.
- Other than `Format-List`, `Format-*` is not respected when you output heterogeneous types that are not compatible with one another, e.g. `& {Get-Process -Id $PID; Get-Item C:\} | Format-Table`.
- To create a command that produces objects that render in a specific format by default, you must define the format using format ps1xml files, and you must ensure that the PSTypeNames of objects returned from your command have a PSTypeName that matches the specific format you want. This would require a much more complicated example involving ps1xml format files, so I have left that out for now.
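The first two limitations in the list above can be observed in any session. The following is a minimal sketch, assuming a standard PowerShell session; the variable names are illustrative only.

```powershell
# Capture what Format-Table actually emits instead of letting Out-Default render it.
$formatted = Get-Process -Id $PID | Format-Table

# List the distinct .NET types found in that output.
$typeNames = $formatted | ForEach-Object { $_.GetType().FullName } | Select-Object -Unique
$typeNames

# Count how many of the emitted objects are still process objects.
$processObjects = @($formatted | Where-Object { $_ -is [System.Diagnostics.Process] })
"Process objects after Format-Table: $($processObjects.Count)"
```

The emitted objects are internal format data types rather than the process that went in, which is why a downstream command such as `Stop-Process` has nothing useful to bind to.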
I have attempted to eliminate many of these limitations in PowerShell 5.1 and earlier with my [FormatPx module](https://github.com/KirkMunro/FormatPx), and while that worked out quite well, I also discovered several issues with the FormatPx approach, namely:
- Producing output formatted in a specific way from a command is easy with FormatPx, but making captured output keep the desired format is difficult when the same data may also be captured as a reference and output via other means (e.g. `$Error` for errors) where you don’t want the same format. That problem is hard to solve without some redesign.
- Rendering formatted output when formatted output is not needed (e.g. when you are capturing output or piping a command to another command) is a processing expense that must be avoided until it is needed. Otherwise you may be spending CPU cycles formatting data, only to then have that format replaced with other formats.
- When outputting some data with truly out of band data (e.g. errors, warnings, etc.) in the middle of the output, it is desirable to show that out of band data and then continue the output that you previously started rather than close off the output and then start output anew with headers, etc. being shown again.
Now I have some users who would like to use FormatPx in PowerShell (Core) on Linux, and even though that only requires some relatively minor changes, there are issues I would like to deal with and I believe it would be much more valuable for the community if I take the knowledge gained through my FormatPx work and apply it natively to PowerShell 7, since it is open source after all.
Proposed technical implementation details
The specific problems I would like to solve in PowerShell 7 are as follows:
- Make it much easier for scripters to select the format that they want for data output from any command that they author.
By “much easier”, I mean without having to unnecessarily muck around with the PSTypeNames array on non-custom objects that are output, and being able to easily select existing, named format views or identify new, command-specific formats that should be used as the default format for the objects output by the command, all while still just outputting objects that can be captured, output, used in expressions, or piped to other commands.
- Fix output processing such that heterogeneous types can be output to `Format-Table`, `Format-Wide`, or `Format-Custom` (though that is hardly used), with PowerShell rendering that output in tabular, wide, or custom formats, respectively, without implicitly treating objects with different types as out of band objects.
This differs from today’s behavior, in which the formatting engine identifies the current “shape” to use when rendering formatted data, and any object whose type does not match the type(s) used in the current shape is marked as out of band. Such objects are shown as if they were output with `Format-List` if they are non-value, non-string types, or as their string representation if they are value types or strings.
This is a breaking change, because commands that output multiple heterogeneous types would render differently if they are piped to `Format-Table`, `Format-Wide`, or `Format-Custom`, and scripts that capture that formatted output and do something with it may have issues; however, it is not very likely that there are many scripts out there that produce heterogeneous data types with different formats that then have their data captured in string format and processed. For this reason, I believe it is worth the break, because it corrects a long-standing issue in PowerShell and makes PowerShell `Format-*` cmdlets function in a WYSIWYG fashion, as they should.
To solve these problems, I would like to make the following changes:
- Extend the `OutputType` attribute such that it includes a new `Format` property.
For example, consider this command:
    function Get-ProcessByStartTime {
        [CmdletBinding()]
        # Proposed syntax: Format and FormatParameters are not part of OutputType today.
        [OutputType([System.Diagnostics.Process], Format='Table', FormatParameters=@{
            View = 'StartTime'
        })]
        param()
        Get-Process | Where-Object StartTime | Sort-Object StartTime
    }
While that is a contrived example, it demonstrates how a command could be written to apply a specific, named view to the objects that it outputs.
Now consider this example, which would likely be a much more common approach to specifying custom formatting to use:
    function Get-Something {
        [CmdletBinding()]
        # Proposed syntax: Format and FormatParameters are not part of OutputType today.
        [OutputType([System.ServiceProcess.ServiceController], Format='Table', FormatParameters=@{
            Property = 'Name','DisplayName'
            GroupBy = 'Status'
        })]
        param()
        Get-Service | Sort-Object -Property Status,Name
    }
That example shows how you can define a default format for a command without having to create a custom format data entry: you simply specify the desired default format using the `OutputType` attribute.
Here’s one more example to show how it could work with custom objects:
    function Get-OperatingSystem {
        [CmdletBinding()]
        # Proposed syntax: Format is not part of OutputType today.
        [OutputType('MyModule.OperatingSystem', Format='List')]
        param()
        $osData = Get-CimInstance -ClassName Win32_OperatingSystem
        [pscustomobject]@{
            PSTypeName = 'MyModule.OperatingSystem'
            Name = $osData.Caption
            Version = $osData.Version
            Architecture = $osData.OSArchitecture
        }
    }
In all cases, what I envision would happen behind the scenes is that the returned objects would be wrapped in a lightweight `PSFormatObject` type that is derived from `PSObject`. This type would capture the desired format as specified in the `OutputType` attribute, but it wouldn’t actually render the data in that format; rendering would be deferred until `Out-Default`, allowing the command output to be piped to other commands like normal and saving the time required to render the data until it is needed. If you pipe the output from these commands to any `Format-*` command, the data would immediately be rendered in the desired format. If you show the data output from these commands without using a `Format-*` command, the data would render using the format information that was attached to the `PSFormatObject` objects.
Note that commands that return multiple object types could define the desired format for each object type.
- Fix the heterogeneous output type problem.
To solve this problem, the formatting engine would be updated such that a new “shape” doesn’t result in out of band processing. Instead, if the object is neither a scalar nor a string (and perhaps other simple types; I would need to double-check the code to ensure I’m covering the proper types), and if the object was output on the standard output stream, the current group/shape would be closed and a new format would be started for the changed, incompatible, heterogeneous type. Non-standard output data as well as scalars and strings would still result in out of band data rendering.
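As a rough illustration of the intended behavior for `Format-Table`, the sketch below closes the current table and starts a new one whenever the input type changes. `Format-TablePerType` is a hypothetical helper written for this example, not part of PowerShell or of the proposed implementation. Comparing exact .NET types is a cruder test than the engine's own shape matching, and the sketch does not treat scalars or strings specially.

```powershell
function Format-TablePerType {
    # Hypothetical helper name, used only for this sketch.
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    begin {
        # Objects of the current type, buffered until the type changes.
        $run = [System.Collections.Generic.List[object]]::new()
        $runType = $null
    }
    process {
        if ($null -eq $InputObject) { return }
        $type = $InputObject.GetType()
        # A change of type closes the current group and starts a new table.
        if ($run.Count -gt 0 -and $type -ne $runType) {
            $run | Format-Table
            $run.Clear()
        }
        $runType = $type
        $run.Add($InputObject)
    }
    end {
        # Flush whatever is left in the final group.
        if ($run.Count -gt 0) { $run | Format-Table }
    }
}

# Two incompatible types in one stream, each rendered as its own table.
& { Get-Process -Id $PID; Get-Item -Path . } | Format-TablePerType
```

Buffering each run delays output until the type changes, which a real implementation inside the formatting engine would not need to do.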
Alternate proposals and considerations
Alternate proposal: Change how `Format-*` cmdlets work, such that formatting is always deferred until `Out-Default`
Instead of extending `OutputType`, we could update `Format-*` cmdlets to do the same thing (create a `PSFormatObject` that contains a shared format object reference for each object processed). This would allow piping beyond `Format-*`, and `Format-*` cmdlets would be used to define the format you want when the data is output rather than to convert the data to the desired format output.
On the plus side, this would allow scripters to just use `Format-*` where they want, even without dealing with custom type names.
There are several downsides to just changing `Format-*` behavior, including the following:
- It would result in functions that return format data in downlevel versions, and object data in current + future versions, which would be confusing.
- It would require users to use the same `Format-*` invocation anywhere that objects are returned from a function or script, which is more difficult than defining the returned format as part of `OutputType`.
- It is less declarative than an `OutputType` format, and less discoverable (`OutputType` has the added benefit of being easily parsed and discovered via programmatic inspection).
For these reasons, I would stay away from this approach, and stick with the proposal of extending `OutputType`.
Alternate proposal: use something like @lzybkr’s PSMore module
@lzybkr started a PSMore module a few years ago to address some of PowerShell’s formatting issues. That module is a side project that has not seen much movement.
Regardless of how that module proceeds going forward, I believe the proposal documented herein to be worth the investment because it is additive, and it solves some significant formatting issues without changing how formatting is defined in configuration and without changing how formatting is rendered in PowerShell today. I also don’t believe that these changes get in the way of what could be done in the PSMore module going forward.
Consideration: Define a `Format-Default` cmdlet
With support for more easily customizable formatting without breaking the pipeline-ability of commands, I would also advocate for defining a `Format-Default` cmdlet. This cmdlet would simply replace any `PSFormatObject` objects that were passed into it with their `PSObject` counterparts that do not have format data associated with them, allowing objects to have custom formatting stripped from them.
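To make the wrap, strip, and deferred-render roles concrete, here is a minimal user-level sketch. Every name in it (`Add-FormatHint`, `Remove-FormatHint`, `Show-FormatHint`, `Sketch.FormatHint`) is hypothetical. Unlike the proposed `PSFormatObject`, this wrapper is a plain custom object, so it hides the members of the wrapped object instead of preserving them, and it renders one object at a time.

```powershell
# All names below are hypothetical stand-ins; PSFormatObject does not exist today.
function Add-FormatHint {
    # Pairs each object with the format it should use when it is finally displayed.
    param(
        [Parameter(ValueFromPipeline)] $InputObject,
        [ValidateSet('Table', 'List', 'Wide', 'Custom')] [string] $Format = 'Table',
        [hashtable] $FormatParameters = @{}
    )
    process {
        [pscustomobject]@{
            PSTypeName       = 'Sketch.FormatHint'
            BaseObject       = $InputObject
            Format           = $Format
            FormatParameters = $FormatParameters
        }
    }
}

function Remove-FormatHint {
    # Plays the role of the proposed Format-Default: strip the hint, pass anything else through.
    param([Parameter(ValueFromPipeline)] $InputObject)
    process {
        if ($InputObject.PSObject.TypeNames -contains 'Sketch.FormatHint') {
            $InputObject.BaseObject
        }
        else {
            $InputObject
        }
    }
}

function Show-FormatHint {
    # Formats only at display time, which is where the proposal defers rendering to.
    param([Parameter(ValueFromPipeline)] $InputObject)
    process {
        $parameters = $InputObject.FormatParameters
        & "Format-$($InputObject.Format)" -InputObject $InputObject.BaseObject @parameters
    }
}

$hinted = Get-Process -Id $PID |
    Add-FormatHint -Format List -FormatParameters @{ Property = 'Id', 'ProcessName' }

# Stripping the hint gives back the original object, ready for other commands.
$plain = $hinted | Remove-FormatHint
$plain.GetType().FullName

# Rendering happens only when the hinted object is explicitly shown.
$hinted | Show-FormatHint
```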
Consideration: Deferring output may result in named formats being unavailable if they are defined in modules that are unloaded
Consider this scenario:
- You have a module with a command that returns data configured to use a specific named format.
- The named format is loaded as part of the same module.
- You invoke that command and capture the results in a variable.
- You then unload that module.
- You then pass your captured data to `Out-Default`.
By the time the data reaches `Out-Default`, the desired format that was specified in the command will not be available in memory anymore because it will have unloaded with the module.
In this case, the data would fall back to the default output for that data, as if it did not have associated `PSFormatObject` information to work with.
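The fallback described above can be approximated today with the existing `Get-FormatData` cmdlet. This is only a sketch of the idea, not the proposed engine behavior; `Test-NamedTableView` and `Format-TableWithFallback` are hypothetical helper names.

```powershell
function Test-NamedTableView {
    # Hypothetical helper: reports whether a table view with the given name
    # is currently loaded for the given type.
    param(
        [Parameter(Mandatory)] [string] $TypeName,
        [Parameter(Mandatory)] [string] $ViewName
    )
    $viewNames = @(
        Get-FormatData -TypeName $TypeName |
            ForEach-Object { $_.FormatViewDefinition } |
            Where-Object { $_.Control -is [System.Management.Automation.TableControl] } |
            ForEach-Object { $_.Name }
    )
    $viewNames -contains $ViewName
}

function Format-TableWithFallback {
    # Uses the named view when it is loaded, otherwise the default table format.
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [Parameter(Mandatory)] [string] $TypeName,
        [Parameter(Mandatory)] [string] $ViewName
    )
    if (Test-NamedTableView -TypeName $TypeName -ViewName $ViewName) {
        $InputObject | Format-Table -View $ViewName
    }
    else {
        $InputObject | Format-Table
    }
}

# 'NoSuchView' is deliberately missing, so the default format is used.
$process = Get-Process -Id $PID
Format-TableWithFallback -InputObject $process -TypeName 'System.Diagnostics.Process' -ViewName 'NoSuchView'
```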
Consideration: Use a new `System.Management.Automation.OutputAttribute` instead of extending `OutputType`
The name `OutputType` indicates that the attribute is used to define the output type. Instead of extending that to include formatting information as well, it may be better to define a new `Output` attribute (`System.Management.Automation.OutputAttribute`) that supersedes `OutputType`, and use that attribute to define the type of objects that are output, the format used to render the objects that are output, which parameter sets return those type/format details, and leave the door open for additional future properties that could be added to the `Output` attribute (e.g. `Contract=$true` to indicate that output is contractual in a function, which would result in errors if an object that does not match an output type was returned from that function).
Creating a new attribute in place of extending the `OutputType` attribute has two benefits:
- Downlevel users will not be confused when they see `OutputType` used with different syntax and get errors. Instead they will see a new attribute that they haven’t used before. They’ll still get errors, but at least those errors won’t be related to an attribute that is supported/documented in the version they are using.
- Since this makes output attribution about much more than a type, having a new attribute for that with a different name may bring more clarity to what is being done. The old attribute would still be supported, so backwards compatibility would persist, but the Parser should probably raise an error if both attributes are used in the same command (see additional comments about Parser improvements when it comes to attributes below).
Consideration: New attributes and attribute extensions would result in runtime errors in downlevel versions
It is worth pointing out that it doesn’t matter if you extend an attribute with additional properties or if you define a new attribute – both result in runtime errors, not parse errors. I’m not surprised for the latter since attributes can be defined in modules, but I’m a little surprised for the former. I feel it would be better if attribute extensions via additional properties resulted in parse errors, but there may be a valid reason that they don’t that I am not aware of. At any rate, I call this out as a consideration because scripts written with attribute changes will result in runtime errors, not parse errors, in downlevel versions of PowerShell.
I think it is also worth updating the PowerShell parser to return parse errors if attributes are used that do not exist in a version of PowerShell, or if properties within those attributes that are used do not exist in a version of PowerShell. That is something that can be determined at parse time, and having a parser error would help guide users towards finding a version of PowerShell where those attributes and their properties are defined/supported. This work should be done as a separate PR (and a separate issue will be logged to track this need if we have consensus on this point).
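The parse-time versus runtime distinction can be checked with the public parser API. In this sketch, `Format` stands in for any attribute property that the running version of PowerShell does not know about, and `Test-UnknownAttributeProperty` is a hypothetical function name. The hashtable-valued `FormatParameters` argument is left out on purpose, to keep the test focused on an unknown property name.

```powershell
# 'Format' is not a property of OutputTypeAttribute in current PowerShell versions.
$source = @'
function Test-UnknownAttributeProperty {
    [CmdletBinding()]
    [OutputType([string], Format = 'Table')]
    param()
    'output'
}
'@

# Parse only: collect any errors that the parser reports for the source text.
$tokens = $null
$parseErrors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref] $tokens, [ref] $parseErrors)
"Parse errors: $($parseErrors.Count)"

# Define and invoke the function; a failure at this stage is a runtime error.
try {
    . ([scriptblock]::Create($source))
    Test-UnknownAttributeProperty
}
catch {
    "Runtime error: $($_.Exception.Message)"
}
```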
I think PSMore is a longer term project that will take much longer to bring into use. Of course a complete revamp of the format system is desirable; however, I would rather solve this problem now than make the community wait for a new formatting system to be implemented.
The changes proposed here are easier to implement now without mucking around with the complexities of the XML format system – the format definitions themselves, and how they are rendered into the text we see in the console all stays the way it is now. These changes would be implemented natively in PowerShell from the start, and I don’t think they get in the way of what PSMore is trying to accomplish, especially if the way they work is kept internal.
As for downlevel support: if necessary, I believe (but would need to confirm) that the changes proposed here could be supported downlevel via an update to FormatPx. I strongly prefer not having FormatPx be the solution for current+future PowerShell though, because it needs to proxy certain core cmdlets for it to work, which means it can only be imported with `-Force`, and current solutions shouldn’t have to take the extra dependency when this solution can just be implemented in-place in PowerShell today.
I know, it’s not about that. This is about the need to separate formatted data from stored data. It’s something I learned while building FormatPx.
For example, if you wrote a function or a command to return information in errors that are stored in `$error`, with that information formatted a certain way, any ETS member you add will be on the error objects that are output as well as the objects that are stored in `$error` (unless you explicitly copy those objects). This is undesirable: you wouldn’t want some errors stored in `$error` to format one way, and others to format another. That’s what I want to avoid by having a `PSFormatObject` type. Returning a lightweight formatting wrapper allows me to preserve the existing members on an object while still being able to identify that it should be formatted a certain way when it is output.