CEMS rename emissions/heat input columns as hourly rates
The Issue
The EPA’s Power Sector Emissions Data Guide states that for the CEMS data:
Sources must report data for all hours of operation, including start up and shutdown. This includes partial hours (i.e., an operating time less than the full clock hour). Because mass emissions, electricity generation, and heat input are hourly rates (e.g., pounds per hour), the hourly values should be multiplied by the operating time to calculate the actual emissions, electricity generation, and heat input.
This surprised me, because the column names suggest that these data are reported as absolute measurements (tons or mmbtu) rather than rates (tons/hr or mmbtu/hr). I reached out to the EPA, and they confirmed that these emissions and heat input measurements need to be multiplied by the operating_time_hours column to get the actual value (just as you have to multiply gross_load_mw by operating_time_hours to get gross generation in MWh). They added that whether you need to do this calculation depends on where the data is sourced from: if you download from AMPD, the calculation is done for you, but if you download from the FTP site or FACT, you need to do it yourself.
As far as I can tell from the pudl documentation, pudl pulls the CEMS data from the FTP site (https://catalystcoop-pudl.readthedocs.io/en/latest/data_sources/epacems.html), so the data reported is actually an hourly rate rather than a total measurement.
Suggested correction
Option 1: In pudl.extract.cems, for each of the emissions and heat input columns, rename to make it clear that these are rates. For example, instead of renaming "CO2_MASS": "co2_mass_tons", rename it as "co2_rate_tons_hr"
Option 2: Calculate the mass value from the rate and the operating time, as if the data were coming from the AMPD source
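The two options above can be sketched roughly as follows. This is only an illustration, not pudl's actual code: it uses plain dicts in place of a DataFrame, the helper names rename_as_rates and rates_to_totals are hypothetical, and the heat input column names are assumptions (only co2_mass_tons, co2_rate_tons_hr, gross_load_mw and operating_time_hours come from the text above).

```python
# Option 1 mapping: current column name -> name that makes the rate explicit.
# Only the CO2 pair comes from the issue; the heat input pair is assumed.
RATE_RENAMES = {
    "co2_mass_tons": "co2_rate_tons_hr",
    "heat_content_mmbtu": "heat_input_rate_mmbtu_hr",  # assumed names
}


def rename_as_rates(record):
    """Option 1: relabel rate-valued columns; the values are left unchanged."""
    return {RATE_RENAMES.get(key, key): value for key, value in record.items()}


def rates_to_totals(record, rate_columns=tuple(RATE_RENAMES)):
    """Option 2: multiply each hourly rate by the operating time."""
    hours = record["operating_time_hours"]
    totals = dict(record)  # copy so the input record is not modified
    for column in rate_columns:
        if totals.get(column) is not None:
            totals[column] = totals[column] * hours
    return totals


# A made-up record for a partial hour (30 minutes of operation).
record = {
    "operating_time_hours": 0.5,
    "gross_load_mw": 100.0,
    "co2_mass_tons": 80.0,
    "heat_content_mmbtu": 900.0,
}
renamed = rename_as_rates(record)
totals = rates_to_totals(record)
```

With this made-up record, Option 1 keeps the value 80.0 under the name co2_rate_tons_hr, while Option 2 keeps the name co2_mass_tons and stores 80.0 * 0.5 = 40.0 tons. For a full operating hour the two agree numerically, so the difference only shows up in partial hours such as start up and shutdown.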
Issue Analytics
- Created a year ago
- Comments: 13 (9 by maintainers)
Top GitHub Comments
Hmm interesting. Let me reach out to EPA
All the CEMS data we’re pulling (and have ever pulled) is from their FTP site. The only reason that the AMPD site is listed as the source URL in those docs is that it’s the website you actually go through to get to the FTP site, and if someone wants to poke around and get more information about the data in general, that’s the place to go.
On a side note, it looks like they’re now providing access to the same files on the FTP server through HTTPS, which is great, and should be much more reliable and easier to work with in the future. I created an issue to update the scraper.