Treatment of missing values during CEMS transformation

The EPA CEMS transformation process currently replaces missing values for gross_load_mw and heat_content_mmbtu with the following code: raw_df.fillna({"gross_load_mw": 0.0, "heat_content_mmbtu": 0.0})

I was wondering if there is a strong reason why these missing values are being filled?

As I’ve been exploring this data further, I have been finding that a null value and a zero value are not interchangeable. For example, during startup, heat input and operating time can both be nonzero, but gross load can be zero.
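To make the distinction concrete, here is a minimal sketch using made-up hourly records. The values and the operating_time_hours column are assumptions for illustration only; the fillna call is the one quoted above. It shows that after filling, a reported zero (the startup hour) and an unreported value can no longer be told apart.

```python
import pandas as pd

# Hypothetical hourly records for one unit; all values are invented for illustration.
raw_df = pd.DataFrame(
    {
        "operating_time_hours": [0.5, 1.0, 1.0],
        # Hour 0: zero load reported during startup. Hour 1: nothing reported.
        "gross_load_mw": [0.0, None, 120.0],
        "heat_content_mmbtu": [350.0, None, 1100.0],
    }
)

# The fill step quoted in the issue.
filled = raw_df.fillna({"gross_load_mw": 0.0, "heat_content_mmbtu": 0.0})

# Before filling, the reported zero and the missing value are distinguishable.
reported_zero_before = int((raw_df["gross_load_mw"] == 0).sum())
missing_before = int(raw_df["gross_load_mw"].isna().sum())

# After filling, both rows look like reported zeros.
reported_zero_after = int((filled["gross_load_mw"] == 0).sum())
missing_after = int(filled["gross_load_mw"].isna().sum())

print(reported_zero_before, missing_before)
print(reported_zero_after, missing_after)
```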

I’d like to be able to identify some methods to fill missing values in CEMS using estimated values, but the current datapackage makes it difficult to identify which values are missing and which are actually reported as zero, since all the missing values have been filled.
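As a rough sketch of what this could look like if the nulls were preserved: record which rows were missing before estimating anything, then fill only those rows. The helper name flag_and_estimate, the "_was_missing" suffix, and the use of linear interpolation are all assumptions made for this example, not a proposed method for CEMS data.

```python
import pandas as pd


def flag_and_estimate(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with a boolean '<column>_was_missing' flag and estimated values.

    Hypothetical helper: linear interpolation is only a stand-in estimator here.
    """
    out = df.copy()
    # Record missingness first, so estimated values stay identifiable afterwards.
    out[f"{column}_was_missing"] = out[column].isna()
    out[column] = out[column].interpolate(method="linear")
    return out


# Invented data: one missing hour and one genuinely reported zero.
hourly = pd.DataFrame({"gross_load_mw": [100.0, None, 120.0, 0.0]})
estimated = flag_and_estimate(hourly, "gross_load_mw")
print(estimated)
```

This only works when the transform step leaves the nulls in place. Once they have been replaced with 0.0, isna() has nothing left to find.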

I would propose removing the above fillna code in the pudl/src/pudl/transform/epacems.py file.

I don’t think that keeping these values as NA should break any other code further down the line, as the co2_mass_tons column also contains some missing values that were left alone.
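One caveat worth checking, sketched below with invented numbers: pandas reductions such as sum() and mean() skip NaN by default, so leaving the values as NA should not raise errors, but an average computed over unfilled data can differ from one computed over zero-filled data. Whether any downstream PUDL code relies on the filled values is not something this sketch can confirm.

```python
import pandas as pd

# Invented values for illustration only.
hourly = pd.DataFrame(
    {
        "gross_load_mw": [0.0, None, 120.0],
        "co2_mass_tons": [10.0, None, 60.0],
    }
)

# NaN is skipped by default (skipna=True), so the sum still works.
total_load = hourly["gross_load_mw"].sum()

# The mean depends on whether the missing hour is counted as a zero.
mean_unfilled = hourly["gross_load_mw"].mean()
mean_filled = hourly["gross_load_mw"].fillna(0.0).mean()

print(total_load, mean_unfilled, mean_filled)
```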

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

3 reactions
grgmiller commented, Sep 3, 2022

Just to add some additional support for doing this… according to the EPA’s Power Sector Emissions Data Guide:

A blank in the data (emissions, heat input, load) is not the same thing as a zero. A blank likely indicates that the EGU is not required to report a particular parameter based on the program(s) it is affected by. Checking the program code(s) for the EGU may help explain why certain parameters appear or do not appear in the data. Refer to Table 1 under “What data are collected?” for more information on program reporting requirements.

0 reactions
aesharpe commented, Aug 30, 2022

Update: removed the fillna(0) for both gross load and heat content.
