Figure out how/when to integrate epacamd-eia crosswalk
Extension of Issue #178
Right now the EPA-EIA crosswalk file is only loaded into the pudl db if the EIA data is also getting loaded into the db. This is because the crosswalk depends on EIA for foreign key validation.
The CEMS data also relies on the crosswalk data for access to accurate plant_id_eia values. The values we previously called plant_id_eia in the CEMS data are actually EPA’s estimated ORISPL codes. The crosswalk connects these plant-level estimates to the actual EIA codes via a (plant_id_epa, unit_id_epa) to plant_id_eia map. Most of the plant IDs are identical across EPA and EIA, but a few are not.
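To make the mapping concrete, here is a minimal pandas sketch of attaching plant_id_eia to CEMS records through the crosswalk. It is not PUDL's actual code: the function name add_plant_id_eia, the sample values, and the fallback to the EPA ID when the crosswalk has no match are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical CEMS records keyed by EPA's plant and unit IDs.
cems = pd.DataFrame({
    "plant_id_epa": [3, 3, 10, 55],
    "unit_id_epa": ["1", "2", "A", "GT1"],
    "gross_load_mw": [100.0, 120.0, 50.0, 75.0],
})

# Hypothetical crosswalk rows; plant 10 maps to a different EIA ID,
# and plant 55 is missing from the crosswalk entirely.
crosswalk = pd.DataFrame({
    "plant_id_epa": [3, 3, 10],
    "unit_id_epa": ["1", "2", "A"],
    "plant_id_eia": [3, 3, 1010],
})


def add_plant_id_eia(cems: pd.DataFrame, crosswalk: pd.DataFrame) -> pd.DataFrame:
    """Attach plant_id_eia to CEMS rows via (plant_id_epa, unit_id_epa)."""
    keys = ["plant_id_epa", "unit_id_epa"]
    # Keep one EIA plant per EPA plant/unit pair so the merge cannot
    # duplicate CEMS rows (a simplification for this sketch).
    xw = crosswalk[keys + ["plant_id_eia"]].drop_duplicates(subset=keys)
    out = cems.merge(xw, on=keys, how="left", validate="many_to_one")
    # Assumption: where the crosswalk has no entry, fall back to the EPA ID,
    # since most plant IDs are identical across the two agencies.
    out["plant_id_eia"] = (
        out["plant_id_eia"].fillna(out["plant_id_epa"]).astype("int64")
    )
    return out


result = add_plant_id_eia(cems, crosswalk)
print(result[["plant_id_epa", "unit_id_epa", "plant_id_eia"]])
```

The validate="many_to_one" argument makes pandas raise if the deduplicated crosswalk still contains repeated key pairs, which guards against silently inflating the CEMS row count.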
We currently rely on the plant_id_eia field in CEMS to fix some of the date entries. The fix_up_dates() function in the epacems transform module uses the plant_id_eia field to map to another dataframe with plant_id_eia and timezone fields.
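The kind of lookup described here can be sketched as below. This is an illustration only, not the body of fix_up_dates(): the function name local_to_utc, the column operating_datetime_local, and the sample timezones are assumptions. It shows why a wrong plant_id_eia matters: the ID selects the timezone, and the timezone determines the resulting UTC timestamp.

```python
import pandas as pd

# Hypothetical CEMS rows with naive local timestamps.
cems_hours = pd.DataFrame({
    "plant_id_eia": [3, 1010],
    "operating_datetime_local": pd.to_datetime(
        ["2020-01-01 00:00", "2020-01-01 00:00"]
    ),
})

# Hypothetical lookup table with one timezone per EIA plant.
plant_timezones = pd.DataFrame({
    "plant_id_eia": [3, 1010],
    "timezone": ["America/Chicago", "America/Denver"],
})


def local_to_utc(cems: pd.DataFrame, plant_timezones: pd.DataFrame) -> pd.DataFrame:
    """Look up each plant's timezone and convert local timestamps to UTC."""
    out = cems.merge(
        plant_timezones, on="plant_id_eia", how="left", validate="many_to_one"
    )
    # Convert one timezone group at a time; rows whose plant has no
    # timezone are skipped by groupby and end up as NaT after alignment.
    pieces = [
        grp["operating_datetime_local"].dt.tz_localize(tz).dt.tz_convert("UTC")
        for tz, grp in out.groupby("timezone")
    ]
    out["operating_datetime_utc"] = pd.concat(pieces)
    return out


converted = local_to_utc(cems_hours, plant_timezones)
print(converted[["plant_id_eia", "timezone", "operating_datetime_utc"]])
```

Note that tz_localize can raise on ambiguous or nonexistent local times around daylight saving transitions, so a real implementation would need to decide how to handle those hours.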
If we want to use this mapping function accurately, we should merge the crosswalk into the CEMS data first. Merging the crosswalk with CEMS in the transform step is all well and good, except that it would require users who just want to work with CEMS data to also download the EIA data (because CEMS needs the crosswalk, which needs EIA).
How do folks feel about this?
Issue Analytics
- Created a year ago
- Comments: 12 (11 by maintainers)
Top GitHub Comments
I am tempted to drop plant_id_epa and just call plant_id_eia the “correct” ORISPL code.
I’ve gone ahead and integrated these changes into this PR: #1692