
Figure out how/when to integrate epacamd-eia crosswalk


Extension of Issue #178

Right now the EPA-EIA crosswalk file is only loaded into the PUDL DB if the EIA data is also being loaded, because the crosswalk depends on the EIA tables for foreign key validation.
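A minimal sketch of why the crosswalk can't be loaded on its own. When foreign keys are enforced, a crosswalk row that points at a plant_id_eia missing from the EIA plants table is rejected. The table names `plants_eia` and `epa_eia_crosswalk`, the simplified columns, and the helper `load_crosswalk_row` are hypothetical stand-ins, not PUDL's actual schema or loader. SQLite is used here only for illustration.

```python
import sqlite3

# Hypothetical, simplified schema; PUDL's real tables and columns differ.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.execute("CREATE TABLE plants_eia (plant_id_eia INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE epa_eia_crosswalk (
        plant_id_epa INTEGER,
        unit_id_epa TEXT,
        plant_id_eia INTEGER REFERENCES plants_eia (plant_id_eia)
    )
    """
)


def load_crosswalk_row(conn, row):
    """Try to insert one crosswalk row; return True if it was accepted."""
    try:
        conn.execute("INSERT INTO epa_eia_crosswalk VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        # The referenced plant_id_eia is not in plants_eia (e.g. EIA not loaded).
        return False


# With no EIA plants loaded, the crosswalk row fails FK validation.
print(load_crosswalk_row(conn, (3, "1", 3)))  # False
# Once the EIA plant exists, the same row is accepted.
conn.execute("INSERT INTO plants_eia VALUES (3)")
print(load_crosswalk_row(conn, (3, "1", 3)))  # True
```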

The CEMS data also relies on the crosswalk for accurate plant_id_eia values. The values we previously called plant_id_eia in the CEMS data are actually EPA's estimated ORISPL codes. The crosswalk connects these plant-level estimates to the actual EIA codes by mapping each (plant_id_epa, unit_id_epa) pair to a plant_id_eia. Most plant IDs are identical across EPA and EIA, but a few are not.
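A minimal pandas sketch of applying that mapping to CEMS records. The toy values are made up. The `generator_id` column is a hypothetical illustration of a crosswalk that lists more than one row per unit, which is why the sketch first reduces it to one row per (plant_id_epa, unit_id_epa) pair before joining.

```python
import pandas as pd

# Toy data, made up for illustration; real PUDL tables have many more columns.
cems = pd.DataFrame({
    "plant_id_epa": [3, 3, 10],
    "unit_id_epa": ["1", "1", "A"],
    "gross_load_mw": [100.0, 110.0, 50.0],
})
crosswalk = pd.DataFrame({
    "plant_id_epa": [3, 3, 10],
    "unit_id_epa": ["1", "1", "A"],
    "generator_id": ["1", "2", "GT1"],  # hypothetical: one row per generator
    "plant_id_eia": [3, 3, 55],  # EPA plant 10 is EIA plant 55 in this toy data
})

# Reduce the crosswalk to one row per (plant_id_epa, unit_id_epa) pair.
unit_to_plant = crosswalk[
    ["plant_id_epa", "unit_id_epa", "plant_id_eia"]
].drop_duplicates()

# Left join keeps CEMS rows with no crosswalk match (their plant_id_eia is NaN).
cems_eia = cems.merge(
    unit_to_plant,
    on=["plant_id_epa", "unit_id_epa"],
    how="left",
    validate="many_to_one",  # raises if a unit maps to more than one EIA plant
)
print(cems_eia[["plant_id_epa", "unit_id_epa", "plant_id_eia"]])
```

The left join is a deliberate choice in this sketch: unmatched CEMS rows stay visible as missing plant_id_eia values instead of being silently dropped.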

We currently rely on the plant_id_eia field in CEMS to fix some of the date entries. The fix_up_dates() function in the epacems transform module uses plant_id_eia to join against another dataframe that has plant_id_eia and timezone fields.
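A rough sketch of that kind of timezone lookup. This is not the actual fix_up_dates() implementation. The function `local_to_utc`, the column names `operating_datetime_local` and `operating_datetime_utc`, and the use of IANA timezone names are all assumptions. It also does not handle hours that are ambiguous or nonexistent around DST transitions, which would need the `ambiguous`/`nonexistent` arguments of `tz_localize`.

```python
import pandas as pd

# Hypothetical lookup table: one IANA timezone name per EIA plant.
plant_tz = pd.DataFrame({
    "plant_id_eia": [3, 55],
    "timezone": ["America/Chicago", "America/Denver"],
})
cems = pd.DataFrame({
    "plant_id_eia": [3, 55],
    "operating_datetime_local": pd.to_datetime(
        ["2020-01-01 00:00", "2020-01-01 00:00"]
    ),
})


def local_to_utc(cems, plant_tz):
    """Attach each plant's timezone and convert naive local times to UTC."""
    out = cems.merge(plant_tz, on="plant_id_eia", how="left", validate="many_to_one")
    # Rows whose plant has no timezone stay NaT (groupby skips NaN keys).
    utc = pd.Series(pd.NaT, index=out.index, dtype="datetime64[ns, UTC]")
    for tz, idx in out.groupby("timezone").groups.items():
        utc.loc[idx] = (
            out.loc[idx, "operating_datetime_local"]
            .dt.tz_localize(tz)
            .dt.tz_convert("UTC")
        )
    out["operating_datetime_utc"] = utc
    return out


print(local_to_utc(cems, plant_tz))
```

Because the join key is plant_id_eia, an EPA ID that differs from the EIA ID would pick up the wrong timezone (or none). That is why the crosswalk has to be applied first.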

If we want this mapping to be accurate, we should merge the crosswalk into the CEMS data first. Doing that merge in the transform step is straightforward, except that it would require users who only want to work with CEMS data to also download the EIA data (because CEMS would need the crosswalk, which needs EIA).

How do folks feel about this?

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 12 (11 by maintainers)

Top GitHub Comments

1 reaction
aesharpe commented, Jul 15, 2022

So if we were going to use the location associated with plant_id_eia for timezone determination, then in the CEMS ETL we would need to join the plant_id_eia column to CEMS first, using the crosswalk mapping to plant_id_epa and then look up each plant’s timezone based on the plant_id_eia rather than plant_id_epa? And then presumably drop the plant_id_eia column before output? Or do we feel like we need to retain that column for easier joining in the future with EIA data?

I am tempted to drop plant_id_epa and just call plant_id_eia the “correct” ORISPL code.
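A minimal sketch of the flow aesharpe describes: join plant_id_eia onto CEMS through the crosswalk, look up the timezone by plant_id_eia rather than plant_id_epa, then optionally drop one of the two ID columns before output. `attach_eia_ids` and its `drop_id` parameter are hypothetical. The fallback to plant_id_epa for unmatched units is an assumption based on most EPA and EIA plant IDs being identical, not something the issue settled on.

```python
import pandas as pd


def attach_eia_ids(cems, crosswalk, plant_tz, drop_id=None):
    """Hypothetical transform step: EPA IDs to EIA IDs to timezone.

    drop_id: None, "plant_id_epa", or "plant_id_eia" (column to drop at the end).
    """
    xwalk = crosswalk[["plant_id_epa", "unit_id_epa", "plant_id_eia"]].drop_duplicates()
    out = cems.merge(
        xwalk, on=["plant_id_epa", "unit_id_epa"], how="left", validate="many_to_one"
    )
    # Assumption: with no crosswalk match, fall back to the EPA plant ID.
    out["plant_id_eia"] = out["plant_id_eia"].fillna(out["plant_id_epa"]).astype("int64")
    # Look up the timezone by the EIA plant ID rather than the EPA one.
    out["timezone"] = out["plant_id_eia"].map(
        plant_tz.set_index("plant_id_eia")["timezone"]
    )
    if drop_id is not None:
        out = out.drop(columns=drop_id)
    return out


# Toy inputs, made up for illustration.
cems = pd.DataFrame({"plant_id_epa": [3, 10, 99], "unit_id_epa": ["1", "A", "X"]})
crosswalk = pd.DataFrame({
    "plant_id_epa": [3, 10], "unit_id_epa": ["1", "A"], "plant_id_eia": [3, 55],
})
plant_tz = pd.DataFrame({
    "plant_id_eia": [3, 55, 99],
    "timezone": ["America/Chicago", "America/Denver", "America/New_York"],
})
print(attach_eia_ids(cems, crosswalk, plant_tz, drop_id="plant_id_epa"))
```

Passing `drop_id="plant_id_epa"` corresponds to treating plant_id_eia as the "correct" ORISPL code. Leaving `drop_id=None` keeps both columns for easier joins with EIA data later.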

0 reactions
aesharpe commented, Jul 18, 2022

I’ve gone ahead and integrated these changes into this PR: #1692

