CEMS rename emissions/heat input columns as hourly rates

The Issue

The EPA’s Power Sector Emissions Data Guide states that for the CEMS data:

Sources must report data for all hours of operation, including start up and shutdown. This includes partial hours (i.e., an operating time less than the full clock hour). Because mass emissions, electricity generation, and heat input are hourly rates (e.g., pounds per hour), the hourly values should be multiplied by the operating time to calculate the actual emissions, electricity generation, and heat input.

This surprised me, because the column names suggest that these data are reported as absolute measurements (tons or mmbtu) rather than rates (tons/hr or mmbtu/hr). I reached out to the EPA, and they confirmed that the emissions and heat input measurements need to be multiplied by the operating_time_hours column to get the actual value (just as you have to multiply gross_load_mw by operating_time_hours to get gross generation in MWh). They added that whether you need to do this calculation depends on where the data is sourced from: if you download from AMPD, the calculation is done for you, but if you download from the FTP site or FACT, you need to do it yourself.

As far as I can tell from the pudl documentation (https://catalystcoop-pudl.readthedocs.io/en/latest/data_sources/epacems.html), pudl pulls the CEMS data from the FTP site, so the reported data is actually an hourly rate rather than a total measurement.

Suggested correction

Option 1: In pudl.extract.cems, for each of the emissions and heat input columns, rename to make it clear that these are rates

For example, instead of renaming "CO2_MASS": "co2_mass_tons", rename it as "co2_rate_tons_hr"
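
A minimal sketch of what Option 1 could look like with pandas. Only the "CO2_MASS" to "co2_rate_tons_hr" pair comes from this issue; the RATE_RENAMES dictionary name and the sample values are assumptions for illustration and are not taken from the pudl codebase.

```python
import pandas as pd

# Hypothetical rename map. Only this one pair is given in the issue;
# the other emissions and heat input columns would follow the same pattern.
RATE_RENAMES = {
    "CO2_MASS": "co2_rate_tons_hr",
}

# Made-up sample values standing in for raw CEMS rows.
raw = pd.DataFrame({"CO2_MASS": [10.0, 12.5]})

# Rename so the column name says "rate" instead of implying a total mass.
renamed = raw.rename(columns=RATE_RENAMES)
print(list(renamed.columns))
```

Renaming alone leaves the values untouched, so downstream users would still need to multiply by the operating time themselves to get totals.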

Option 2: Calculate the mass value from the rate and the operating time, as if the data were coming from the AMPD source
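
Option 2 could be sketched roughly as below, assuming the data is already in a pandas DataFrame. The function name rates_to_totals, the output column names, and the sample values are hypothetical; operating_time_hours, gross_load_mw, and the rate-style CO2 column name are the ones mentioned in this issue.

```python
import pandas as pd


def rates_to_totals(df, rate_to_total, op_time_col="operating_time_hours"):
    """Return a copy of df with total columns computed as rate * operating time.

    rate_to_total maps each hourly-rate column to the name of the total
    column to create (hypothetical helper, not part of pudl).
    """
    out = df.copy()
    for rate_col, total_col in rate_to_total.items():
        out[total_col] = out[rate_col] * out[op_time_col]
    return out


# Made-up sample rows: a full hour, a partial hour, and a non-operating hour.
cems = pd.DataFrame(
    {
        "operating_time_hours": [1.0, 0.5, 0.0],
        "gross_load_mw": [100.0, 80.0, 0.0],
        "co2_rate_tons_hr": [60.0, 50.0, 0.0],
    }
)

totals = rates_to_totals(
    cems,
    {
        "co2_rate_tons_hr": "co2_mass_tons",
        "gross_load_mw": "gross_generation_mwh",
    },
)
print(totals[["co2_mass_tons", "gross_generation_mwh"]])
```

In the partial-hour row, a rate of 50 tons/hr over 0.5 hours gives 25 tons, which is where treating the rate as a total would overstate emissions.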

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 13 (9 by maintainers)

Top GitHub Comments

grgmiller commented, Jul 21, 2022 (1 reaction)

Hmm interesting. Let me reach out to EPA

zaneselvans commented, Jul 27, 2022 (0 reactions)

All the CEMS data we’re pulling (and have ever pulled) is from their FTP site. The only reason the AMPD site is listed as the source URL in those docs is that it’s the website you actually go through to get to the FTP site, and if someone wants to go poke around and get more information about the data in general, that’s the place to go.

On a side note, it looks like they’re now providing access to the same files on the FTP server through HTTPS, which is great, and should be much more reliable and easier to work with in the future. I created an issue to update the scraper.
