Transform `f1_elctrc_oper_rev` xbrl + dbf
Electric Operating Revenues
Exploratory Notes
- The DBF `f1_elctrc_oper_rev` table is split into 2 XBRL tables:
  - `electric_operating_revenues_300`
  - `electric_operating_revenues_other_300`
- Both of these tables contain only duration (annual), no instant facts.
- `electric_operating_revenues_300` has 2 distinct sections:
  - revenues from electricity sales, which reports revenues (USD), MWh sold, and number of customers.
  - other revenues, which reports only USD.
- The electricity sales portion of the table reports 3 kinds of quantities:
  - revenues (USD, no clear naming convention)
  - energy (MWh, prefix: “megawatt_hours_sold”, 12 XBRL columns)
  - number of customers (prefix: “average_number_of_customers_per_month”, 12 XBRL columns)
- Both current and previous years of data are reported.
  - In XBRL this shows up as two different sets of starting/ending dates (start/end last year & start/end this year).
  - In DBF these are reported as separate columns.
- The “other revenues” section of the table enumerates some specific “other” FERC accounts, but also contains an “other miscellaneous operating revenues” section, which seems like it may be a reference to the `electric_operating_revenues_other_300` table.
- The `electric_operating_revenues_other_300` table is a freeform mess, with a description column that seems like it can contain anything describing the revenue source (in practice it often contains a reference to a FERC account) and a revenue quantity. However, there are only a small number of these truly miscellaneous entries.
- The bulk of the “other” revenues seem to have been enumerated along with the appropriate FERC accounts. The same set of miscellaneous categories is enumerated in the later DBF and the XBRL data.
- The number of actual freeform miscellaneous revenue entries in the DBF table (in the rows without well-defined labels) is also very small (only a few hundred entries across a few hundred thousand total records).
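A minimal sketch of how the three kinds of quantities could be told apart by column prefix. The two prefixes are the ones quoted in the notes above; the example column names and the `classify_column` helper are hypothetical, and anything without a known prefix is assumed to be a revenue column, since the revenue columns have no clear naming convention.

```python
# Prefixes quoted in the notes above. The labels on the right are made up
# for this illustration.
PREFIXES = {
    "megawatt_hours_sold": "energy_mwh",
    "average_number_of_customers_per_month": "customers",
}


def classify_column(col: str) -> str:
    """Return the kind of quantity a column holds, based on its prefix."""
    for prefix, kind in PREFIXES.items():
        if col.startswith(prefix):
            return kind
    # Assumption: columns without a known prefix are revenues in USD.
    return "revenue_usd"


# Hypothetical column names, for illustration only.
example_cols = [
    "megawatt_hours_sold_residential_sales",
    "average_number_of_customers_per_month_residential_sales",
    "residential_sales",
]
classified = {col: classify_column(col) for col in example_cols}
```

The fallback to revenue is the fragile part: any non-revenue column that lacks one of the two prefixes would be misclassified, so an explicit list of expected revenue columns would be safer in practice.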
Transformation Plan / Questions
- This seems like another table where the reshaping is most of the work.
- Lack of stable naming conventions means we’ll need to define the instant/duration rename dictionaries at the same time as the mapping happens, and map the renamed column names + their appropriate suffixes.
- Will need to be careful to validate that the renamed columns that are mapped to DBF row numbers are consistent with the XBRL column renaming dictionaries that we store in the transform parameters for the table. Can be done in the Pydantic parameter transforms.
- Is it appropriate to keep the customer + energy numbers mixed in with revenues here, when they only apply to a subset of the revenue categories (sales of electricity to customers)? As it is, we would probably end up with FERC Account numbers associated with non USD values (energy + customers) which seems a little weird.
- Current + Previous year reporting is redundant, and there’s no start/end of year + deltas structure like there was in the Electric Plant In Service table, so it seems like we should only keep the current year of data. Will need to drop half the rows in XBRL, and half the columns in DBF. Should we do this preemptively, or bring them through the process as far as we can?
- To what extent should we try and clean up the miscellaneous revenue sources that aren’t explicitly enumerated?
- To what extent should we try and connect the old DBF data (which didn’t enumerate as many “other” categories) with the newer DBF + XBRL data that does?
- Given how few uncategorized other/misc entries there are, should we integrate that table, or just use the aggregated other / misc value that’s reported in the main `electric_operating_revenues_300` table? If we can set aside the details of the “other revenues” that aren’t neatly categorized, then processing this table should be relatively straightforward. Can align the old categories & totals with the new ones where they overlap, and just have the newly enumerated categories break the total down into smaller pieces when they do exist.
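As a rough illustration of keeping only the current year of XBRL data, the sketch below filters duration facts by comparing their start/end dates to the report year. The field names (`report_year`, `start_date`, `end_date`, `value`) and the sample records are hypothetical, and the check assumes calendar-year reporting periods.

```python
from datetime import date

# Hypothetical duration facts: each filing reports both the previous and the
# current year, distinguished only by their start/end dates.
facts = [
    {"report_year": 2021, "start_date": date(2020, 1, 1),
     "end_date": date(2020, 12, 31), "value": 100.0},
    {"report_year": 2021, "start_date": date(2021, 1, 1),
     "end_date": date(2021, 12, 31), "value": 110.0},
]


def is_current_year(fact: dict) -> bool:
    """True if the fact's period falls within its report year.

    Assumption: reporting periods are calendar years.
    """
    return (
        fact["start_date"].year == fact["report_year"]
        and fact["end_date"].year == fact["report_year"]
    )


# Drops the previous-year half of the rows.
current_year_facts = [f for f in facts if is_current_year(f)]
```

Whether this filter runs up front or late in the process is the open question raised above; the sketch only shows that the dates carry enough information to do it at either point.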
Issue Analytics
- Created a year ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Yes, we’d need stable identifiers for the row alignment, and for the table to have a well defined structure, but the `acct_dsc` value could be kept for reference in its own column.
It’s also wild that there don’t seem to be any instructions about how to fill in these freeform entries in the PDF. They’re just left blank and people seem to be doing whatever they want in there. So of course it’s a mess.
I’m okay with the option of aggregating the unstructured data as a single value if that’s the easiest way to combine it with the main table to make it complete. It’s unfortunate that loses a little bit of detail, but can reference the unstructured table for one-off investigations.
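For what it’s worth, the aggregation option could look roughly like the sketch below: sum the freeform revenues per respondent and year so they can be attached to the main table as one value. The record layout (`respondent_id`, `report_year`, `description`, `revenue`) and the sample values are hypothetical, not the actual table schema.

```python
from collections import defaultdict

# Hypothetical freeform records from the "other" table.
other_records = [
    {"respondent_id": 1, "report_year": 2021,
     "description": "Acct 456 - misc", "revenue": 1000.0},
    {"respondent_id": 1, "report_year": 2021,
     "description": "rent", "revenue": 250.0},
    {"respondent_id": 2, "report_year": 2021,
     "description": "other", "revenue": 75.0},
]


def aggregate_other(records: list[dict]) -> dict[tuple[int, int], float]:
    """Sum freeform revenues per (respondent_id, report_year)."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for rec in records:
        totals[(rec["respondent_id"], rec["report_year"])] += rec["revenue"]
    return dict(totals)


# The descriptions are discarded here; they remain available in the
# unstructured table for one-off investigations.
totals = aggregate_other(other_records)
```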
The numbered pseudo-categories would cover row 25 in recent years and rows 22-25 in earlier years, giving that row a single descriptor rather than the “acct_dsc” that the respondent filled in?
I can’t tell what row the other_miscellaneous_operating_revenues entry in the “main” table corresponds to in the .pdf version. I agree it could have been the total from the “other” table, but it seems to be something else, and the “other” table is not duplicated in the “main” table at all - but crazy that you need those values to add to the rest of the “other operating revenues” to make the total match.