Treatment of missing values during CEMS transformation
The EPA CEMS transformation process currently replaces missing values for gross_load_mw and heat_content_mmbtu with the following code:
raw_df.fillna({
    "gross_load_mw": 0.0,
    "heat_content_mmbtu": 0.0
})
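A small illustration, using made-up values rather than real CEMS data, of the information this fill discards. Once NaN becomes 0.0, a reported zero and a missing value can no longer be told apart. The toy DataFrame reuses the column names from the snippet above.

```python
import numpy as np
import pandas as pd

# Toy hourly records (made-up values): row 0 reports zero load, row 1 is missing.
raw_df = pd.DataFrame({
    "gross_load_mw": [0.0, np.nan, 150.0],
    "heat_content_mmbtu": [12.5, np.nan, 1400.0],
})

# fillna returns a new DataFrame; raw_df itself keeps its NaNs.
filled = raw_df.fillna({
    "gross_load_mw": 0.0,
    "heat_content_mmbtu": 0.0,
})

print(raw_df["gross_load_mw"].isna().sum())    # 1 missing value before filling
print((filled["gross_load_mw"] == 0.0).sum())  # 2 zeros after: the reported zero plus the former NaN
```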
I was wondering whether there is a strong reason these missing values are being filled.
As I've been exploring this data further, I have been finding that a null value and a zero value are not interchangeable. For example, during startup, heat input and operating time can both be nonzero while gross load is zero.
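If the NaNs are kept, the startup case can be separated from genuinely missing load. The sketch below uses made-up values, and the load_status column and its labels are hypothetical illustrations, not part of PUDL. It sorts rows into missing load, zero load with positive heat input (consistent with startup), positive load, and zero load with no heat input.

```python
import numpy as np
import pandas as pd

# Made-up rows: zero load with heat input, missing load, normal operation, fully off.
df = pd.DataFrame({
    "gross_load_mw": [0.0, np.nan, 150.0, 0.0],
    "heat_content_mmbtu": [12.5, 30.0, 1400.0, 0.0],
})

# np.select checks the conditions in order; the first True condition wins.
# Comparisons against NaN are False, so only the isna() test catches missing load.
conditions = [
    df["gross_load_mw"].isna(),
    (df["gross_load_mw"] == 0.0) & (df["heat_content_mmbtu"] > 0.0),
    df["gross_load_mw"] > 0.0,
]
labels = ["missing", "zero_load_with_heat_input", "generating"]
df["load_status"] = np.select(conditions, labels, default="zero_load_no_heat_input")
print(df["load_status"].tolist())
# ['zero_load_with_heat_input', 'missing', 'generating', 'zero_load_no_heat_input']
```

With a zero fill applied first, the second row would fall into the zero-load category and the distinction would be lost.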
I’d like to be able to identify some methods to fill missing values in CEMS using estimated values, but the current datapackage makes it difficult to identify which values are missing and which are actually reported as zero, since all the missing values have been filled.
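One way to add estimated values later without losing track of which values were reported is to record a missingness flag before any filling. This is a minimal sketch under stated assumptions. The unit_id and hour columns and the gross_load_mw_imputed flag are hypothetical names. Linear interpolation within each unit's time series is only a stand-in estimator, not a method PUDL or EPA prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly records for two units (made-up values).
df = pd.DataFrame({
    "unit_id": ["A", "A", "A", "B", "B", "B"],
    "hour": [0, 1, 2, 0, 1, 2],
    "gross_load_mw": [100.0, np.nan, 120.0, 50.0, 0.0, np.nan],
})

# Flag the gaps *before* filling, so reported zeros stay distinguishable.
df["gross_load_mw_imputed"] = df["gross_load_mw"].isna()

# Interpolate within each unit only, and only between known values
# (limit_area="inside"), so a trailing gap stays NaN instead of being guessed.
df = df.sort_values(["unit_id", "hour"])
df["gross_load_mw"] = df.groupby("unit_id")["gross_load_mw"].transform(
    lambda s: s.interpolate(limit_area="inside")
)
print(df)
```

Unit A's gap becomes 110.0, unit B's reported zero is untouched, and unit B's trailing gap remains NaN with its flag still set.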
I propose removing the fillna call above from the pudl/src/pudl/transform/epacems.py file.
I don't think keeping these values as NA should break any code further down the line, since the co2_mass_tons column already contains some missing values that were left alone.
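A quick check, with made-up values, of how common pandas aggregations treat NaN compared with a zero fill. sum() skips NaN by default, so totals are unchanged. count() and mean() do change, because missing hours are excluded instead of being counted as zeros. Any downstream code that depends on one behavior or the other would still need checking.

```python
import numpy as np
import pandas as pd

load = pd.Series([100.0, np.nan, 0.0, 120.0])
zero_filled = load.fillna(0.0)

print(load.sum(), zero_filled.sum())      # 220.0 220.0 (sum skips NaN by default)
print(load.count(), zero_filled.count())  # 3 4 (count excludes NaN)
print(load.mean(), zero_filled.mean())    # about 73.33 vs 55.0 (the missing hour is no longer averaged in as zero)
```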
Comments
Just to add some further support for doing this, according to the EPA's Power Sector Emissions Data Guide:
Update: removed the fillna(0) for both gross load and heat content.