Addressing the Format challenges in PowerShell

See original GitHub issue

Summary of the new feature/enhancement

Since version 1, the formatting layer in PowerShell has had a number of challenging limitations, including the following:

  • Format-* cmdlets output custom format data objects that are then rendered in the console once they are passed into Out-Default for processing. e.g. Get-Process -Id $PID | Format-Table | Get-Member.
  • Since these cmdlets convert PSObjects into format data objects, you cannot pipe their results to other commands to do something meaningful. e.g. Get-Process -Id $PID | Format-Table | Stop-Process -WhatIf
  • Other than Format-List, Format-* is not respected when you output heterogeneous types that are not compatible with one another. e.g. & {Get-Process -Id $PID; Get-Item C:\} | Format-Table.
  • To create a command that produces objects that render in a specific format by default, you must define the format in a format ps1xml file, and you must ensure that the objects returned from your command carry a PSTypeName that matches the format you want. Showing this would require a much more complicated example involving ps1xml format files, so I have left it out for now.
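
The first two limitations can be observed directly in a current session. The following is a minimal sketch, assuming Windows PowerShell 5.1 or PowerShell 7; the variable names are illustrative only.

```powershell
# Capture everything Format-Table emits for a single process.
$formatted = Get-Process -Id $PID | Format-Table

# List the distinct types that came out of Format-Table. They are internal
# format data types, not the process object that went in.
$typeNames = $formatted | ForEach-Object { $_.GetType().FullName } | Select-Object -Unique
$typeNames

# Count how many of the emitted objects are still System.Diagnostics.Process.
$processCount = @($formatted | Where-Object { $_ -is [System.Diagnostics.Process] }).Count
$processCount   # 0: nothing downstream can treat this output as a process
```

Because no Process object survives Format-Table, a command such as Stop-Process has nothing meaningful to bind to when it is placed after it in the pipeline.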

I have attempted to eliminate many of these limitations in PowerShell 5.1 and earlier with my [FormatPx module](https://github.com/KirkMunro/FormatPx), and while that worked out quite well, I also discovered several issues with the FormatPx approach, namely:

  • Producing output formatted in a specific way from a command is easy with FormatPx. The hard part is that the same objects may also be captured by reference and output via other means (e.g. $Error for errors), where you don’t want the same format. Keeping the desired format on one path but not on the other is a difficult problem to solve without some redesign.
  • Rendering formatted output when formatted output is not needed (e.g. when you are capturing output or piping a command to another command) is a processing expense that must be avoided until it is needed. Otherwise you may be spending CPU cycles formatting data, only to then have that format replaced with other formats.
  • When outputting some data with truly out of band data (e.g. errors, warnings, etc.) in the middle of the output, it is desirable to show that out of band data and then continue the output that you previously started rather than close off the output and then start output anew with headers, etc. being shown again.

Some users would now like to use FormatPx in PowerShell (Core) on Linux. That would require only relatively minor changes, but there are issues I would like to deal with first, and I believe it would be much more valuable for the community if I took the knowledge gained through my FormatPx work and applied it natively to PowerShell 7, since it is open source after all.

Proposed technical implementation details

The specific problems I would like to solve in PowerShell 7 are as follows:

  1. Make it much easier for scripters to select the format that they want for data output from any command that they author.

    By “much easier”, I mean without having to unnecessarily muck around with the PSTypeNames array on non-custom objects that are output, and being able to easily select existing, named format views or identify new, command-specific formats that should be used as the default format for the objects output by the command, all while still just outputting objects that can be captured, output, used in expressions, or piped to other commands.

  2. Fix output processing such that heterogeneous types can be output to Format-Table, Format-Wide, or Format-Custom (but that’s hardly used) with PowerShell rendering that output in tabular, wide, or custom formats, respectively, without implicitly treating objects with different types as out of band objects.

    This differs from today’s behavior, in which the formatting engine identifies the current “shape” to use when rendering formatted data, and any object whose type does not match the type(s) used by that shape is marked as out of band. Out of band objects are shown as if they were output with Format-List if they are non-value, non-string types, or as their string representation if they are value types or strings.

    This is a breaking change, because commands that output multiple heterogeneous types would render differently if they are piped to Format-Table, Format-Wide, or Format-Custom, and scripts that capture that formatted output and do something with it may have issues; however, it is not very likely that there are many scripts out there that produce heterogeneous data types with different formats that then have their data captured in string format and processed. For this reason, I believe it is worth the break, because it corrects a long-standing issue in PowerShell and makes PowerShell Format-* cmdlets function in a WYSIWYG fashion, as they should.
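
Today’s behavior for problem 2 can be reproduced with a short sketch like the one below. It assumes a session whose current location is a file system path. The exact rendering depends on the PowerShell version and platform, so no output is shown here.

```powershell
# Two unrelated types sent through a single explicit Format-Table call.
$mixed = & { Get-Process -Id $PID; Get-Item -Path . }

# Render to a string so the result can be inspected or compared between versions.
$rendered = $mixed | Format-Table | Out-String
$rendered
```

As described above, the first object establishes the table shape. The directory object does not match that shape, so it is expected to be handled as out of band data rather than as a second table.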

To solve these problems, I would like to make the following changes:

  1. Extend the OutputType attribute such that it includes a new Format property.

    For example, consider this command:

    function Get-ProcessByStartTime {
        [CmdletBinding()]
        # Format and FormatParameters are the proposed extension; OutputType does not have them today.
        [OutputType([System.Diagnostics.Process], Format='Table', FormatParameters=@{
            View = 'StartTime'
        })]
        param()
        Get-Process | Where-Object StartTime | Sort-Object StartTime
    }
    

    While that is a contrived example, it demonstrates how a command could be written to apply a specific, named view to the objects that it outputs.

    Now consider this example, which would likely be a much more common approach to specifying custom formatting to use:

    function Get-Something {
        [CmdletBinding()]
        [OutputType([System.ServiceProcess.ServiceController], Format='Table', FormatParameters=@{
            Property = 'Name','DisplayName'
            GroupBy = 'Status'
        })]
        param()
        Get-Service | Sort-Object -Property Status,Name
    }
    

    That example shows how you can define a default format for a command without having to create a custom format data entry: you simply specify the desired default format using the OutputType attribute.

    Here’s one more example to show how it could work with custom objects:

    function Get-OperatingSystem {
        [CmdletBinding()]
        [OutputType('MyModule.OperatingSystem', Format='List')]
        param()
        $osData = Get-CimInstance -ClassName Win32_OperatingSystem
        [pscustomobject]@{
            PSTypeName = 'MyModule.OperatingSystem'
            Name = $osData.Caption
            Version = $osData.Version
            Architecture = $osData.OSArchitecture
        }
    }
    

    In all cases, what I envision would happen behind the scenes is that the returned objects would be wrapped in a lightweight PSFormatObject type that is derived from PSObject. This type would capture the desired format as specified in the OutputType attribute, but it wouldn’t actually render the data in that format – rendering would be deferred until Out-Default, allowing the command output to be piped to other commands like normal and saving the time required to render the data until it is needed. If you pipe the output from these commands to any Format-* command, the data would immediately be rendered in the desired format. If you show the data output from these commands without using a Format-* command, the data would render using the format information that was attached to the PSFormatObject objects.

    Note that commands that return multiple object types could define the desired format for each object type.

  2. Fix the heterogeneous output type problem.

    To solve this problem, the formatting engine would be updated such that a new “shape” doesn’t result in out of band processing. Instead, if the object is neither a scalar nor a string (and perhaps not one of a few other simple types; I would need to double-check the code to ensure I’m covering the proper types), and if the object was output on the standard output stream, the current group/shape would be closed and a new format would be started for the changed, incompatible, heterogeneous type. Non-standard output data as well as scalars and strings would still result in out of band data rendering.
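
The deferred formatting idea from change 1 can be approximated in script today. The sketch below is not the proposed PSFormatObject: Add-FormatHint and Show-Hinted are hypothetical helper names, and the wrapper is a plain custom object rather than a PSObject subclass, so downstream commands would have to unwrap Value themselves. It only illustrates recording a format choice with the data and rendering later.

```powershell
# Hypothetical helper: pair each object with a format hint, without rendering anything.
function Add-FormatHint {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [object] $InputObject,
        [ValidateSet('Table', 'List', 'Wide')] [string] $Format = 'Table',
        [hashtable] $Parameters = @{}
    )
    process {
        [pscustomobject]@{ Value = $InputObject; Format = $Format; Parameters = $Parameters }
    }
}

# Hypothetical helper: render only at display time, using the hint on the first item.
function Show-Hinted {
    param([Parameter(Mandatory, ValueFromPipeline)] [object] $InputObject)
    begin   { $items = [System.Collections.Generic.List[object]]::new() }
    process { $items.Add($InputObject) }
    end {
        $hint = $items[0]
        $formatParams = $hint.Parameters
        # Unwrap the original objects and hand them to the chosen Format-* cmdlet.
        $items.Value | & "Format-$($hint.Format)" @formatParams | Out-String
    }
}

# The captured data is still made of ordinary objects; no rendering cost has been paid yet.
$captured = [pscustomobject]@{ Name = 'a'; Size = 1 }, [pscustomobject]@{ Name = 'b'; Size = 2 } |
    Add-FormatHint -Format List -Parameters @{ Property = 'Name' }
$captured[0].Value.Size

# Rendering happens only when it is asked for.
$text = $captured | Show-Hinted
$text
```

In the proposal, the wrapper would be transparent to the pipeline and Out-Default would play the role that Show-Hinted plays here.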

Alternate proposals and considerations

Alternate proposal: Change how Format-* cmdlets work, such that formatting is always deferred until Out-Default

Instead of extending OutputType, we could update Format-* cmdlets to do the same thing (create a PSFormatObject that contains a shared format object reference for each object processed). This would allow piping beyond Format-*, and Format-* cmdlets would be used to define the format you want when the data is output rather than to convert the data to the desired format output.

On the plus side, this would allow scripters to just use Format-* where they want, even without dealing with custom type names.

There are several downsides to just changing Format-* behavior, including the following:

  • It would result in functions that return format data in downlevel versions, and object data in current + future versions, which would be confusing.
  • It would require users to use the same Format-* invocation anywhere that objects are returned from a function or script, which is more difficult than defining the returned format as part of OutputType.
  • It is less declarative than an OutputType format, and less discoverable (OutputType has the added benefit of being easily parsed and discovered via programmatic inspection).

For these reasons, I would stay away from this approach, and stick with the proposal of extending OutputType.

Alternate proposal: use something like @lzybkr’s PSMore module

@lzybkr started a PSMore module a few years ago to address some of PowerShell’s formatting issues. That module is a side project that has not seen much movement.

Regardless of how that module proceeds going forward, I believe the proposal documented herein to be worth the investment because it is additive, and it solves some significant formatting issues without changing how formatting is defined in configuration and without changing how formatting is rendered in PowerShell today. I also don’t believe that these changes get in the way of what could be done in the PSMore module going forward.

Consideration: Define a Format-Default cmdlet

With support for more easily customizable formatting without breaking the pipeline-ability of commands, I would also advocate for defining a Format-Default cmdlet. This cmdlet would simply replace any PSFormatObject objects that were passed into it with their PSObject counterparts that do not have format data associated with them, allowing objects to have custom formatting stripped from them.

Consideration: Deferring output may result in named formats being unavailable if they are defined in modules that are unloaded

Consider this scenario:

  1. You have a module with a command that returns data configured to use a specific named format.
  2. The named format is loaded as part of the same module.
  3. You invoke that command and capture the results in a variable.
  4. You then unload that module.
  5. You then pass your captured data to Out-Default.

By the time the data reaches Out-Default, the desired format that was specified in the command will not be available in memory anymore because it will have unloaded with the module.

In this case, the data would fall back to the default output for that data, as if it did not have associated PSFormatObject information to work with.
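
A fallback of this kind can be sketched with the existing Get-FormatData cmdlet. Test-FormatView and Format-WithFallback are hypothetical names used only for illustration, and the sketch checks the views registered in the current session rather than anything attached to the objects.

```powershell
# Hypothetical helper: is a view with this name currently registered for the type?
function Test-FormatView {
    param(
        [Parameter(Mandatory)] [string] $TypeName,
        [Parameter(Mandatory)] [string] $ViewName
    )
    $definitions = Get-FormatData -TypeName $TypeName
    $names = @($definitions.FormatViewDefinition.Name)
    return $names -contains $ViewName
}

# Hypothetical helper: use the named view if it is still loaded, otherwise
# emit the objects untouched so the default formatting applies.
function Format-WithFallback {
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [Parameter(Mandatory)] [string] $ViewName
    )
    $typeName = $InputObject[0].GetType().FullName
    if (Test-FormatView -TypeName $typeName -ViewName $ViewName) {
        $InputObject | Format-Table -View $ViewName
    }
    else {
        # The named view is gone (for example, its module was removed).
        $InputObject
    }
}

# A view name that is not registered falls through to the plain objects.
$result = Format-WithFallback -InputObject @(Get-Process -Id $PID) -ViewName 'NoSuchView_12345'
```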

Consideration: Use a new System.Management.Automation.OutputAttribute instead of extending OutputType

The name OutputType indicates that the attribute is used to define the output type. Instead of extending that to include formatting information as well, it may be better to define a new Output attribute (System.Management.Automation.OutputAttribute) that supersedes OutputType. That attribute would define the type of objects that are output, the format used to render them, and which parameter sets return those type/format details, and it would leave the door open for additional properties in the future (e.g. Contract=$true to indicate that output is contractual in a function, which would result in errors if an object that does not match an output type was returned from that function).

Creating a new attribute in place of extending the OutputType attribute has two benefits:

  1. Downlevel users will not be confused when they see OutputType used with different syntax and get errors. Instead they will see a new attribute that they haven’t used before. They’ll still get errors, but at least those errors won’t be related to an attribute that is supported/documented in the version they are using.
  2. Since this makes output attribution about much more than a type, having a new attribute for that with a different name may bring more clarity to what is being done. The old attribute would still be supported, so backwards compatibility would persist, but the Parser should probably raise an error if both attributes are used in the same command (see additional comments about Parser improvements when it comes to attributes below).

Consideration: New attributes and attribute extensions would result in runtime errors in downlevel versions

It is worth pointing out that it doesn’t matter if you extend an attribute with additional properties or if you define a new attribute; both result in runtime errors, not parse errors. I’m not surprised by the latter, since attributes can be defined in modules, but I’m a little surprised by the former. I feel it would be better if attribute extensions via additional properties resulted in parse errors, but there may be a valid reason that they don’t that I am not aware of. At any rate, I call this out as a consideration because scripts written with attribute changes will result in runtime errors, not parse errors, in downlevel versions of PowerShell.

I think it is also worth updating the PowerShell parser to return parse errors if attributes are used that do not exist in a version of PowerShell, or if properties within those attributes that are used do not exist in a version of PowerShell. That is something that can be determined at parse time, and having a parser error would help guide users towards finding a version of PowerShell where those attributes and their properties are defined/supported. This work should be done as a separate PR (and a separate issue will be logged to track this need if we have consensus on this point).

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 8
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
KirkMunro commented, Sep 3, 2019

I think PSMore is a longer term project that will take much longer to bring into use. Of course a complete revamp of the format system is desirable; however, I would rather solve this problem now than make the community wait for a new formatting system to be implemented.

The changes proposed here are easier to implement now without mucking around with the complexities of the XML format system: the format definitions themselves, and how they are rendered into the text we see in the console, all stay the way they are now. These changes would be implemented natively in PowerShell from the start, and I don’t think they get in the way of what PSMore is trying to accomplish, especially if the way they work is kept internal.

As for downlevel support: if necessary, I believe (but would need to confirm) that the changes proposed here could be supported downlevel via an update to FormatPx. I strongly prefer not having FormatPx be the solution for current and future PowerShell versions, though, because it needs to proxy certain core cmdlets in order to work, which means it can only be imported with -Force, and current solutions shouldn’t have to take the extra dependency when this solution can just be implemented in place in PowerShell today.

0 reactions
KirkMunro commented, Sep 4, 2019

I know, it’s not about that. This is about the need to separate formatted data from stored data. It’s something I learned while building FormatPx.

For example, if you wrote a function or a command to return information in errors that are stored in $error, with that information formatted a certain way, any ETS member you add will be on the error objects that are output as well as the objects that are stored in $error (unless you explicitly copy those objects). This is undesirable – you wouldn’t want some errors stored in $error to format one way, and others to format another. That’s what I want to avoid by having a PSFormatObject type. Returning a lightweight formatting wrapper allows me to preserve the existing members on an object while still being able to identify that it should be formatted a certain way when it is output.
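
The difference between the two approaches can be shown with a short sketch. Here a generic list stands in for a store such as $error, and FormatHint is a hypothetical member name; the wrapper is a plain custom object, not the proposed PSFormatObject.

```powershell
# A list standing in for a store that keeps references to emitted objects.
$history = [System.Collections.Generic.List[object]]::new()

# Approach 1: attach the format hint as an ETS member on the object itself.
$record = [pscustomobject]@{ Message = 'disk full' }
$history.Add($record)
$record | Add-Member -NotePropertyName FormatHint -NotePropertyValue 'Table'

# The stored reference is the same object, so the hint shows up there too.
$leaked = $null -ne $history[0].PSObject.Properties['FormatHint']
$leaked   # True

# Approach 2: keep the hint on a separate lightweight wrapper.
$other = [pscustomobject]@{ Message = 'timeout' }
$history.Add($other)
$wrapped = [pscustomobject]@{ Value = $other; FormatHint = 'List' }

# The stored object is untouched; only the wrapper knows about the format.
$clean = $null -eq $history[1].PSObject.Properties['FormatHint']
$clean    # True
```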
