Map EPA unit_id to EIA generator_id
Issue Context
Within the CEMS data for each year, there are on the order of 1 million observations where the co2_mass_tons observation is missing, even when the generator is reporting an operating time >0 and heat input >0. For my research, I am planning to fill in these missing values based on a calculation and add a “calculated” value to the co2_mass_measurement_code column. I plan to calculate these missing values by multiplying the heat_content_mmbtu column by the fuel-specific emission factor for the fuel used by that generator. To do that, I need to match each unit in CEMS with its corresponding fuel type reported in boiler_fuel_eia923.csv.
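As a rough sketch of the calculation described above, in plain Python to keep it dependency-free: the emission factors below are placeholders rather than authoritative values, the operating_time_hours column name is an assumption, and the sample rows are made up. It assumes the fuel_type_code has already been attached to each CEMS row.

```python
import math

# Placeholder emission factors in tons of CO2 per mmBtu, keyed by fuel type code.
# These numbers are illustrative only; substitute authoritative factors.
EMISSION_FACTORS = {"NG": 0.06, "BIT": 0.10}


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_co2(rows, emission_factors):
    """Return copies of rows with missing co2_mass_tons filled where possible."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not mutated
        factor = emission_factors.get(row.get("fuel_type_code"))
        if (
            is_missing(row["co2_mass_tons"])
            and row["operating_time_hours"] > 0  # assumed column name
            and row["heat_content_mmbtu"] > 0
            and factor is not None  # skip rows with no known fuel factor
        ):
            row["co2_mass_tons"] = row["heat_content_mmbtu"] * factor
            row["co2_mass_measurement_code"] = "calculated"
        filled.append(row)
    return filled


# Made-up sample rows for illustration.
cems_rows = [
    {"unitid": "1", "operating_time_hours": 1.0, "heat_content_mmbtu": 100.0,
     "co2_mass_tons": None, "co2_mass_measurement_code": None,
     "fuel_type_code": "NG"},
    {"unitid": "2", "operating_time_hours": 1.0, "heat_content_mmbtu": 50.0,
     "co2_mass_tons": 4.9, "co2_mass_measurement_code": "Measured",
     "fuel_type_code": "BIT"},
    {"unitid": "3", "operating_time_hours": 0.0, "heat_content_mmbtu": 0.0,
     "co2_mass_tons": None, "co2_mass_measurement_code": None,
     "fuel_type_code": "NG"},
]
result = fill_missing_co2(cems_rows, EMISSION_FACTORS)
```

Only the first row gets filled here: the second already has a reported value, and the third was not operating, so both are left untouched.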
Question 1: What is the proper mapping between EPA plant/units and EIA plant/units?
This is my current understanding, which may not be correct:
- EPA plant_id_eia maps to EIA plant_id_eia
- EPA unitid maps to EIA boiler_id
- EPA facility_id -> not sure what this maps to, if anything
- EPA unit_id_epa -> not sure what this maps to, if anything
Basically, is it correct to match the CEMS unitid column to the EIA boiler_id column in boiler_fuel_eia923.csv, or is there some other mapping I need to complete first?
Question 2: Harmonizing unit IDs between EPA and EIA
After attempting to merge the fuel_type_code column from boiler_fuel_eia923 into my epacems data, I am still finding that the merge is unable to find a matching boiler_id key for many observations. When I investigated further, it appeared that these IDs have not yet been standardized. For example, for plant_id 10378, epacems lists the unitid as BLR02B, where EIA lists the boiler_id as simply 2B.
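To make the mismatch concrete, here is a minimal sketch of how one might list the CEMS units that fail an exact (plant_id, unitid) to (plant_id, boiler_id) match, along with the EIA boiler IDs available at the same plant for manual inspection. Only the plant 10378 IDs come from the example above; the second plant, the fuel codes, and the function names are made up for illustration.

```python
# (plant_id, unitid) keys as they appear in CEMS. Plant 1 is hypothetical.
cems_units = [(10378, "BLR02B"), (1, "A")]

# (plant_id, boiler_id) -> fuel_type_code as reported by EIA.
# The fuel codes here are hypothetical.
eia_boilers = {(10378, "2B"): "NG", (1, "A"): "BIT"}


def split_matches(cems_units, eia_boilers):
    """Separate CEMS keys that match an EIA boiler exactly from those that do not."""
    matched, unmatched = {}, []
    for key in cems_units:
        if key in eia_boilers:
            matched[key] = eia_boilers[key]
        else:
            unmatched.append(key)
    return matched, unmatched


def candidates_for(unmatched, eia_boilers):
    """For each unmatched CEMS key, list the EIA boiler_ids at the same plant."""
    by_plant = {}
    for plant_id, boiler_id in eia_boilers:
        by_plant.setdefault(plant_id, []).append(boiler_id)
    return {key: sorted(by_plant.get(key[0], [])) for key in unmatched}


matched, unmatched = split_matches(cems_units, eia_boilers)
candidates = candidates_for(unmatched, eia_boilers)
```

This deliberately does not try to guess a renaming rule (such as stripping a prefix), since nothing here establishes that such a rule holds across plants. It only surfaces the candidates side by side.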
It seems that based on https://github.com/catalyst-cooperative/pudl/issues/178 the ORISPL (plant_id_eia) codes have been harmonized between the two datasets, but I am wondering if the unitid column from CEMS has been harmonized with the boiler_id column from EIA?
Issue Analytics
- Created 4 years ago
- Comments: 18 (13 by maintainers)
Top GitHub Comments
Hi all, I heard back from my contact at EPA and she shared the following Excel file that they use for matching units across EPA and EIA data. She said that it was fine to share, and that they actually plan on publishing a final version sometime soon.
CAMD EIA unit crosswalk 2018.xlsx
My contact, Justine Huetteman, did ask that the spreadsheet be cited as: United States Environmental Protection Agency (EPA). “Power Sector Emissions Data: EPA-EIA Crosswalk.” Washington, DC: Office of Atmospheric Programs, Clean Air Markets Division.
This is how I have interpreted the column headers in the spreadsheet:
- ORIS Code refers to the EPA’s plant id code
- EIA ORIS refers to the EIA’s Plant ID code (what PUDL calls plant_id_eia). This column is only filled in if the EIA’s plant id differs from the EPA’s plant id
- Unit ID refers to the EPA’s unitid
- Generator ID refers to the EIA’s Generator Id
- Boiler ID refers to the EIA’s Boiler Id
Missing data notes:
unit code identifier.
There are also a lot of great notes/caveats about the matching in the Notes column.
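Going by my reading of the column headers above, a lookup from EPA (plant, unit) to the EIA IDs could be sketched roughly like this. The rows are made up for illustration (only the BLR02B to 2B pairing for plant 10378 comes from this thread), the output key names are my own choice, and the list-per-key structure is an assumption in case a unit appears in more than one crosswalk row.

```python
# Made-up crosswalk rows using the spreadsheet's column headers.
# Generator IDs and the second plant (111 / 222) are invented.
crosswalk_rows = [
    {"ORIS Code": 10378, "EIA ORIS": None, "Unit ID": "BLR02B",
     "Generator ID": "GEN1", "Boiler ID": "2B"},
    {"ORIS Code": 111, "EIA ORIS": 222, "Unit ID": "1",
     "Generator ID": "G1", "Boiler ID": None},
]


def build_lookup(rows):
    """Map (EPA plant id, EPA unitid) to a list of EIA plant/generator/boiler IDs."""
    lookup = {}
    for row in rows:
        # EIA ORIS is only filled in when it differs from the EPA plant id,
        # so fall back to ORIS Code when it is empty.
        if row["EIA ORIS"] is not None:
            plant_id_eia = row["EIA ORIS"]
        else:
            plant_id_eia = row["ORIS Code"]
        key = (row["ORIS Code"], row["Unit ID"])
        lookup.setdefault(key, []).append({
            "plant_id_eia": plant_id_eia,
            "generator_id": row["Generator ID"],
            "boiler_id": row["Boiler ID"],  # may be missing in the crosswalk
        })
    return lookup


lookup = build_lookup(crosswalk_rows)
```

With the real spreadsheet, the rows would come from reading the .xlsx file, and the caveats in the Notes column would still need to be checked by hand.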
Any thoughts on whether this would be incorporated into pudl.glue, and whether there might be an easy way to fill in the boiler_id column using the existing BGA data?
omg I had no idea you could do that.