Map EPA unit_id to EIA generator_id

Issue Context

Within the CEMS data for each year, roughly 1 million observations are missing the co2_mass_tons value, even when the generator reports an operating time > 0 and a heat input > 0. For my research, I plan to fill in these missing values with a calculated estimate and add a “calculated” value to the co2_mass_measurement_code column. I plan to calculate each missing value by multiplying the heat_content_mmbtu column by the fuel-specific emission factor for the fuel used by that generator. To do that, I need to match each unit in CEMS with its corresponding fuel type reported in boiler_fuel_eia923.csv.
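A minimal sketch of the fill described above, assuming the fuel type has already been merged into the CEMS rows as fuel_type_code. The operating_time_hours column name and the emission factor values are assumptions for illustration only; the factors are placeholders, not published values, and real ones would need to come from an authoritative source.

```python
import pandas as pd

# Placeholder emission factors (tons CO2 per mmBtu), keyed by fuel type code.
# Illustrative numbers only, NOT published values.
EMISSION_FACTORS = {"NG": 0.05, "BIT": 0.10}


def fill_missing_co2(cems: pd.DataFrame, factors: dict) -> pd.DataFrame:
    """Fill missing co2_mass_tons as heat_content_mmbtu * fuel-specific factor."""
    out = cems.copy()
    factor = out["fuel_type_code"].map(factors)
    # Fill only where CO2 is missing, the unit actually ran, and a factor is known.
    fillable = (
        out["co2_mass_tons"].isna()
        & (out["operating_time_hours"] > 0)  # assumed column name
        & (out["heat_content_mmbtu"] > 0)
        & factor.notna()
    )
    out.loc[fillable, "co2_mass_tons"] = (out["heat_content_mmbtu"] * factor)[fillable]
    # Flag the filled rows so they can be told apart from measured values.
    out.loc[fillable, "co2_mass_measurement_code"] = "calculated"
    return out


# Toy rows: one fillable, one already measured, one with no operation.
cems = pd.DataFrame({
    "operating_time_hours": [1.0, 1.0, 0.0],
    "heat_content_mmbtu": [100.0, 200.0, 0.0],
    "co2_mass_tons": [float("nan"), 12.0, float("nan")],
    "co2_mass_measurement_code": [None, "Measured", None],
    "fuel_type_code": ["NG", "BIT", "NG"],
})
filled = fill_missing_co2(cems, EMISSION_FACTORS)
```

Rows without a known factor are left untouched rather than guessed, so any unit that fails to match a fuel type stays visibly missing.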

Question 1: What is the proper mapping between EPA plant/units and EIA plant/units?

This is my current understanding, which may not be correct:

  • EPA plant_id_eia maps to EIA plant_id_eia
  • EPA unitid maps to EIA boiler_id
  • EPA facility_id -> not sure what this maps to, if anything
  • EPA unit_id_epa -> not sure what this maps to, if anything

Basically, is it correct to match the CEMS unitid column to the EIA boiler_id column in boiler_fuel_eia923.csv, or is there some other mapping I need to complete first?
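One way to check how well this candidate mapping works is a left merge with pandas' indicator flag, which marks every CEMS unit that found no boiler. This is only a diagnostic sketch: the rows are toy data, the fuel codes are placeholders, and whether unitid really corresponds to boiler_id is exactly the open question.

```python
import pandas as pd

# Toy stand-ins for the two tables; plant 3 and all fuel codes are made up.
cems_units = pd.DataFrame({
    "plant_id_eia": [10378, 3, 3],
    "unitid": ["BLR02B", "1", "2"],
})
boiler_fuel = pd.DataFrame({
    "plant_id_eia": [10378, 3, 3],
    "boiler_id": ["2B", "1", "2"],
    "fuel_type_code": ["BIT", "NG", "NG"],
})

# Left merge keeps every CEMS unit; the _merge column records whether it matched.
merged = cems_units.merge(
    boiler_fuel,
    how="left",
    left_on=["plant_id_eia", "unitid"],
    right_on=["plant_id_eia", "boiler_id"],
    indicator=True,
)

unmatched = merged[merged["_merge"] == "left_only"]
match_rate = (merged["_merge"] == "both").mean()
```

In real data the boiler fuel table likely has several rows per boiler (for example one per month or fuel), so it would need to be reduced to one row per plant and boiler before merging to avoid duplicating CEMS rows.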

Question 2: Harmonizing unit IDs between EPA and EIA

After attempting to merge the fuel_type_code column from boiler_fuel_eia923 into my epacems data, I still cannot find a matching boiler_id key for many observations. Investigating further, I found that these IDs do not appear to have been standardized. For example, for plant_id 10378, epacems lists the unitid as BLR02B, whereas EIA lists the boiler_id as simply 2B.
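For mismatches like the BLR02B vs. 2B case, one could try a string-normalization heuristic before merging. The rule below (drop a leading "BLR", then leading zeros) is a guess generalized from that single example, not a documented convention, so any matches it produces would need to be verified by hand.

```python
import re


def normalize_unit_id(raw: str) -> str:
    """Hypothetical heuristic: strip a leading 'BLR' prefix, then leading zeros."""
    cleaned = raw.strip().upper()
    cleaned = re.sub(r"^BLR", "", cleaned)
    cleaned = cleaned.lstrip("0")
    # If stripping removed everything (e.g. "0"), fall back to the tidy original.
    return cleaned or raw.strip().upper()


epa_id = normalize_unit_id("BLR02B")
eia_id = normalize_unit_id("2B")
```

A heuristic like this can also create false matches (two distinct units collapsing to the same ID), so it is safer to apply it only to rows that failed an exact match and to check that the result is still unique within each plant.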

Based on https://github.com/catalyst-cooperative/pudl/issues/178, it seems that the ORISPL (plant_id_eia) codes have been harmonized between the two datasets, but I am wondering whether the unitid column from CEMS has also been harmonized with the boiler_id column from EIA.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 18 (13 by maintainers)

Top GitHub Comments

grgmiller commented, Feb 15, 2020

Hi all, I heard back from my contact at EPA, and she shared the following Excel file that they use for matching units across EPA and EIA data: CAMD EIA unit crosswalk 2018.xlsx. She said that it was fine to share, and that they actually plan on publishing a final version sometime soon.

My contact, Justine Huetteman, did ask that the spreadsheet be cited as: United States Environmental Protection Agency (EPA). “Power Sector Emissions Data: EPA-EIA Crosswalk.” Washington, DC: Office of Atmospheric Programs, Clean Air Markets Division.

This is how I have interpreted the column headers in the spreadsheet:

  • ORIS Code refers to the EPA’s plant id code
  • EIA ORIS refers to the EIA’s Plant ID code (what PUDL calls plant_id_eia). This column is only filled in if the EIA’s plant id differs from the EPA’s plant id
  • Unit ID refers to the EPA’s unitid
  • Generator ID refers to the EIA’s Generator Id.
  • Boiler ID refers to the EIA’s Boiler Id
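Under this reading of the headers, the crosswalk could be tidied into a unit-to-generator lookup roughly as follows. The header spellings are taken from the list above and may not match the spreadsheet exactly, the output column names are my own choices, and the rows are toy data rather than values from the file.

```python
import pandas as pd

# Toy rows using the headers as interpreted above (not real crosswalk data).
crosswalk = pd.DataFrame({
    "ORIS Code": [1, 2],
    "EIA ORIS": [float("nan"), 20],  # only filled when EIA's plant ID differs
    "Unit ID": ["1", "U2"],
    "Generator ID": ["G1", "G2"],
    "Boiler ID": ["B1", None],
})

# Use the EIA plant ID where given, otherwise fall back to the EPA plant ID.
crosswalk["plant_id_eia"] = (
    crosswalk["EIA ORIS"].fillna(crosswalk["ORIS Code"]).astype(int)
)

tidy = crosswalk.rename(columns={
    "ORIS Code": "plant_id_epa",
    "Unit ID": "unitid",
    "Generator ID": "generator_id",
    "Boiler ID": "boiler_id",
})[["plant_id_epa", "unitid", "plant_id_eia", "generator_id", "boiler_id"]]
```

The real file would be loaded with something like pd.read_excel, with IDs read as strings so that values such as "01" keep their leading zeros.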

Missing data notes:

  • It looks like all but 71 EPA unitids have been matched to EIA generator ids
  • Many of the boilerids have not been matched in this crosswalk
  • This crosswalk does not include any information about the EIA’s unit code identifier.

There are also a lot of great notes/caveats about the matching in the Notes column.

Any thoughts on whether this would be incorporated into pudl.glue, and whether there might be an easy way to fill in the boiler_id column using the existing BGA data?
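As a rough idea of what filling boiler_id from the BGA data might look like, here is a sketch that assumes a boiler-generator association table with plant_id_eia, generator_id, and boiler_id columns. Those column names and the toy rows are assumptions, and this is not how pudl.glue does or would do it.

```python
import pandas as pd

# Toy crosswalk with some boiler IDs missing, plus a toy BGA table.
crosswalk = pd.DataFrame({
    "plant_id_eia": [1, 1, 2],
    "generator_id": ["G1", "G2", "G1"],
    "boiler_id": ["B1", None, None],
})
bga = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["G1", "G2"],
    "boiler_id": ["B1", "B2"],
})

# Keep only generators tied to exactly one boiler, so the fill is unambiguous.
keys = ["plant_id_eia", "generator_id"]
boilers_per_gen = bga.groupby(keys)["boiler_id"].transform("nunique")
unambiguous = bga[boilers_per_gen == 1].drop_duplicates(keys)

filled = crosswalk.merge(unambiguous, on=keys, how="left", suffixes=("", "_bga"))
# Prefer the crosswalk's own boiler_id; use the BGA value only where it is missing.
filled["boiler_id"] = filled["boiler_id"].fillna(filled["boiler_id_bga"])
filled = filled.drop(columns="boiler_id_bga")
```

Generators associated with several boilers are deliberately skipped here, since picking one of them would need a rule that the crosswalk itself does not provide.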

zaneselvans commented, Feb 7, 2020

omg I had no idea you could do that.
