Transform `plant_in_srvce` xbrl + dbf

The Plant in Service table is the only “row mapped” table that we’ve already pulled into PUDL. Even though it’s not the highest priority table of this type, we want to tackle it first so we can learn from it and adapt the new transform process to accommodate it, since there are lots of other tables like this.

  • The DBF data is row-oriented, with each row number pertaining to a different FERC account number, or to subtotals and totals of various groups of related FERC accounts, and different columns representing starting & ending balances, additions, transfers, subtractions, etc.
  • The XBRL data is column-oriented, with different columns representing different FERC account numbers and the additions/retirements/transfers/etc. This results in more than 400 columns.
  • We’ve decided to go with the “tidy” or “long” format for these tables, with each column representing a different quantity, and the rows containing identifying information about that quantity.

For the plant_in_service table, this means we’ll end up with 6 columns, which happen to correspond to the structure that we find in the DBF tables, but with static IDs rather than annually varying row numbers for the different FERC accounts. The columns will be:

  • starting_balance (XBRL instant)
  • additions (XBRL duration)
  • retirements (XBRL duration)
  • adjustments (XBRL duration)
  • transfers (XBRL duration)
  • ending_balance (XBRL instant)
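
As a rough illustration of the wide-to-tidy reshaping described above, the sketch below melts a tiny made-up duration table and pivots the quantities back out into columns. The column names (`entity_id`, `report_year`, `land_and_land_rights_additions`, etc.), the `<account>_<quantity>` naming pattern, and the `ferc_account_label` name are assumptions made for the example, not the actual PUDL or XBRL names.

```python
import pandas as pd

# Hypothetical wide-format XBRL duration data: one column per (account, quantity) pair.
wide = pd.DataFrame(
    {
        "entity_id": ["C000001", "C000002"],
        "report_year": [2021, 2021],
        "land_and_land_rights_additions": [10.0, 5.0],
        "land_and_land_rights_retirements": [2.0, 0.0],
        "boiler_plant_equipment_additions": [100.0, 50.0],
        "boiler_plant_equipment_retirements": [20.0, 1.0],
    }
)

# The duration quantities named in the issue; the suffix convention is assumed.
QUANTITIES = ["additions", "retirements", "adjustments", "transfers"]
id_cols = ["entity_id", "report_year"]

# Stack every value column into (xbrl_column, value) pairs.
long = wide.melt(id_vars=id_cols, var_name="xbrl_column", value_name="value")

# Split "<account>_<quantity>" into an account label and a quantity name.
pattern = rf"^(?P<ferc_account_label>.+)_(?P<quantity>{'|'.join(QUANTITIES)})$"
parts = long["xbrl_column"].str.extract(pattern)
long = pd.concat([long[id_cols], parts, long["value"]], axis=1)

# One row per (entity, year, account); one column per quantity.
tidy = (
    long.pivot(index=id_cols + ["ferc_account_label"], columns="quantity", values="value")
    .reset_index()
    .rename_axis(columns=None)
)
print(tidy)
```

With real data the same pattern would produce one row per respondent, year, and account, with the quantity columns side by side.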

XBRL Taxonomy Metadata

To effectively aggregate the values in the above columns, we need some additional metadata, available from the XBRL Taxonomy:

  • The groupings of FERC accounts are stable and applied uniformly in many contexts because they are important for filing taxes appropriately. We want to preserve as much of that structure as possible so that both the individual accounts, and their meaningful groupings can be analyzed.
  • We need to take care that the sign conventions for different rows/columns are propagated and standardized. E.g. the retirements column is a credit while all the others are debits, but the convention flips in rows that represent sales of equipment rather than purchases.
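
One way the sign handling could look is sketched below, assuming a column-level dictionary of signs combined with a per-row flip. The function name `standardize_signs`, the row label `electric_plant_sold`, and the choice to multiply the two signs together are all hypothetical; the real conventions would come from the `balance` metadata in the XBRL taxonomy.

```python
import pandas as pd

# Assumed column-level signs: +1 for debit columns, -1 for the credit column.
COLUMN_SIGNS = {"additions": 1, "retirements": -1, "adjustments": 1, "transfers": 1}

# Hypothetical row-level flips for rows that represent sales rather than purchases.
ROW_SIGNS = {"electric_plant_sold": -1}


def standardize_signs(df: pd.DataFrame, row_signs: dict) -> pd.DataFrame:
    """Multiply each quantity by its column sign and its row sign."""
    out = df.copy()
    # Rows without an explicit entry keep the default sign of +1.
    row_sign = out["ferc_account_label"].map(row_signs).fillna(1)
    for col, sign in COLUMN_SIGNS.items():
        out[col] = out[col] * sign * row_sign
    return out


pis = pd.DataFrame(
    {
        "ferc_account_label": ["land_and_land_rights", "electric_plant_sold"],
        "additions": [10.0, 4.0],
        "retirements": [2.0, 1.0],
        "adjustments": [0.0, 0.0],
        "transfers": [0.0, 0.0],
    }
)
signed = standardize_signs(pis, ROW_SIGNS)
print(signed)
```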

Table Notes

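```python
import pandas as pd

# ferc1_engine and ferc1_xbrl_engine are assumed to be SQLAlchemy engines for the FERC 1 DBF and XBRL databases.
pis_dbf = pd.read_sql("f1_plant_in_srvce", ferc1_engine)
pis_xbrl_duration = pd.read_sql("electric_plant_in_service_204_duration", ferc1_xbrl_engine)
pis_xbrl_instant = pd.read_sql("electric_plant_in_service_204_instant", ferc1_xbrl_engine)
```
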
  • DBF has start/end balance + add/retire/adjust/transfer, as rows w/ labels accessible in the f1_row_lit_tbl
  • XBRL data has legible column names but no account numbers (though they are available in the XBRL taxonomy)
  • XBRL Instant has one number for each account or grouping. Turns out these are “end of last year” and “end of this year” balances, which we can transform into starting_balance and ending_balance in the current year. However we have to do some reshaping of the instant table to make this work (turning 2 years of 1 group of columns into 1 year of 2 groups of columns).
  • XBRL Duration table has columns w/ legible names but no FERC Acct numbers.
  • There are almost 500 XBRL columns: ~100 different accounts or groupings, with 6 values reported for each one.
  • DBF data has a mix of header, subheader, total, subtotal, FERC account and a few other numerical values.
  • XBRL seems to have clean naming, but names alone can’t be used to group the categories.
  • XBRL has FERC accounts in the metadata.
  • Seems like it makes sense to adopt the XBRL column names as the new labels for the old (and variable) DBF row numbers.
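
The instant-table reshaping mentioned in the notes above (turning 2 years of 1 group of columns into 1 year of 2 groups of columns) might look roughly like the following. The column names `entity_id`, `report_year`, and `date`, and the rule that a date falling in the report year marks the ending balance, are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical instant data: each filing reports two dates per account.
instant = pd.DataFrame(
    {
        "entity_id": ["C000001", "C000001"],
        "report_year": [2021, 2021],
        "date": ["2020-12-31", "2021-12-31"],
        "land_and_land_rights": [100.0, 108.0],
    }
)
instant["date"] = pd.to_datetime(instant["date"])

# End of the report year -> ending_balance; end of the prior year -> starting_balance.
is_end = instant["date"].dt.year == instant["report_year"]
instant["balance_type"] = is_end.map({True: "ending_balance", False: "starting_balance"})

value_cols = ["land_and_land_rights"]
balances = (
    instant.melt(
        id_vars=["entity_id", "report_year", "balance_type"],
        value_vars=value_cols,
        var_name="ferc_account_label",
        value_name="value",
    )
    .pivot(
        index=["entity_id", "report_year", "ferc_account_label"],
        columns="balance_type",
        values="value",
    )
    .reset_index()
    .rename_axis(columns=None)
)
print(balances)
```

The result has one row per respondent, year, and account, which lines up with the reshaped duration data so the two can be merged on those keys.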

Tasks

A bespoke reshaping transformation has been implemented via #2025 but we need some additional metadata to enable all the aggregations, which @cmgosnell has communicated is the next priority for RMI.

  • Read XBRL taxonomy JSON into a dataframe, retaining the name, account, calculation, and balance columns.
  • Normalize the account column to contain a simple string value.
  • Figure out how to select just the relevant XBRL values for the plant_in_service table from the larger dataframe
  • Figure out how to / whether we can reshape the wide-format categories into a table of metadata that applies uniformly across the 6 columns we are retaining. (Yes, we can.)
  • Implement renaming of instant & duration XBRL tables so they follow a programmatically usable naming convention.
  • Compile column sign conventions in a dictionary.
  • Fix dev notebook to work with all the renamed columns.
  • Fix the overwhelming warnings resulting from column_rename()
  • Fix overwhelming warnings from duplicate record_id in reshaped tables.
  • Fix bad XBRL multi-index construction that is scrambling all the reported values.
  • Pull draft metadata extraction functions into module & apply sign conventions in transform_main()
  • Split merge_metadata() and apply_sign_conventions() into two methods
  • Simplify / clarify calculation empty list mess.
  • Use just one name for xbrl_metadata_json.
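
The first two tasks above (reading the taxonomy JSON into a dataframe and normalizing the account column) could be sketched as follows. The JSON layout shown here, including the nested `account` list and the contents of `calculation`, is invented for illustration; the real taxonomy metadata may be structured differently, and `normalize_account` is a hypothetical helper.

```python
import json

import pandas as pd

# Hypothetical taxonomy metadata; the real JSON layout may differ.
raw = json.loads(
    """
    [
      {"name": "land_and_land_rights", "account": [{"value": "310"}],
       "calculation": [], "balance": "debit", "documentation": "..."},
      {"name": "steam_production_plant", "account": [],
       "calculation": [{"name": "land_and_land_rights", "weight": 1.0}],
       "balance": "debit"}
    ]
    """
)


def normalize_account(value):
    """Collapse a nested account reference into a plain string, or None if absent."""
    if isinstance(value, list):
        value = value[0] if value else None
    if isinstance(value, dict):
        value = value.get("value")
    return None if value is None else str(value)


# Keep only the four fields named in the task list.
meta = pd.DataFrame(raw)[["name", "account", "calculation", "balance"]]
meta["account"] = meta["account"].map(normalize_account)
print(meta)
```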

Later tasks

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 16 (11 by maintainers)

Top GitHub Comments

jrea-rmi commented, Sep 12, 2022

I’ll be following this. I need to review the PUDL output version of this table, compare to the version we made for the Utility Transition Hub for combining with the balance sheet, and may have suggested edits.

jrea-rmi commented, Dec 9, 2022

The aggregations I see us doing for the plant in service table are to the technology level, with or without asset retirement costs.

I figure the aggregation by technology is in the XBRL taxonomy, and filtering out the asset retirement costs can be done based on listing the rows to exclude before aggregating.

Everything in the plant in service table is groupby.sum(), but I agree that will not work in general for other tables that have minus signs in their calculated fields, without a label or aggregation function. I’m still curious to see a case where a field has a different sign for different aggregations. As far as I’ve seen, each record has a single sign convention.
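
A minimal sketch of the aggregation described in this comment, assuming the tidy table already carries a `technology` column derived from the taxonomy groupings. That column, the account labels, and the helper `aggregate_by_technology` are all hypothetical.

```python
import pandas as pd

# Hypothetical tidy plant-in-service rows with a technology label attached.
pis = pd.DataFrame(
    {
        "ferc_account_label": [
            "boiler_plant_equipment",
            "asset_retirement_costs_for_steam_production",
            "reservoirs_dams_and_waterways",
            "asset_retirement_costs_for_hydraulic_production",
        ],
        "technology": ["steam", "steam", "hydro", "hydro"],
        "ending_balance": [100.0, 5.0, 40.0, 2.0],
    }
)

# Rows to drop when asset retirement costs should be excluded.
ARC_ROWS = [
    "asset_retirement_costs_for_steam_production",
    "asset_retirement_costs_for_hydraulic_production",
]


def aggregate_by_technology(df: pd.DataFrame, exclude=()) -> pd.Series:
    """Sum ending balances per technology after dropping the excluded rows."""
    kept = df[~df["ferc_account_label"].isin(list(exclude))]
    return kept.groupby("technology")["ending_balance"].sum()


with_arc = aggregate_by_technology(pis)
without_arc = aggregate_by_technology(pis, exclude=ARC_ROWS)
print(with_arc)
print(without_arc)
```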
