EPA EmPOWER Project
This project, which was selected as part of the EPA’s 2nd Annual EmPOWER Air Data Challenge, aims to develop a new dataset of hourly average emissions factors for the United States, to complement and be published alongside the EPA’s eGRID database.
The two objectives of this project are:
- Develop an open-source, Python-based workflow to re-create the eGRID2018 database, using the existing eGRID2018 methodology.
- Develop an “eGRID hourly” dataset using 2019 data. This dataset builds upon the eGRID2018 methodology with new datasets (EIA-930) and methodologies to calculate hourly average emissions factors for each grid region in the U.S.
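To make the second objective concrete, here is a minimal sketch of one way an hourly average emissions factor could be computed: total emissions divided by total net generation for each region and hour. The record layout, the field names (`region`, `hour`, `co2_lb`, `net_generation_mwh`), and the `hourly_emission_factors` helper are all hypothetical and are not part of pudl or the eGRID methodology.

```python
from collections import defaultdict


def hourly_emission_factors(records):
    """Return {(region, hour): lb CO2 per MWh} from hourly plant records.

    Assumption: each record is a dict with hypothetical keys
    'region', 'hour', 'co2_lb', and 'net_generation_mwh'.
    """
    # Running totals per (region, hour): [emissions_lb, net_generation_mwh]
    totals = defaultdict(lambda: [0.0, 0.0])
    for rec in records:
        key = (rec["region"], rec["hour"])
        totals[key][0] += rec["co2_lb"]
        totals[key][1] += rec["net_generation_mwh"]
    # Skip hours with no net generation to avoid dividing by zero.
    return {key: lb / mwh for key, (lb, mwh) in totals.items() if mwh > 0}


# Made-up illustrative records, not real data.
records = [
    {"region": "REGION_A", "hour": "2019-01-01T00", "co2_lb": 1000.0, "net_generation_mwh": 1.0},
    {"region": "REGION_A", "hour": "2019-01-01T00", "co2_lb": 0.0, "net_generation_mwh": 1.0},
    {"region": "REGION_A", "hour": "2019-01-01T01", "co2_lb": 1500.0, "net_generation_mwh": 2.0},
]
factors = hourly_emission_factors(records)
```

The factor here is generation-weighted: emissions and generation are summed first and divided once, rather than averaging per-plant rates.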
For both of these objectives, the basic steps are:
1. Download the data, including 2019 data, from sources including EIA-860, EIA-861, EIA-923, EIA-930, and EPA CEMS
2. Crosslink/match up data across these datasets (addressing https://github.com/catalyst-cooperative/pudl/issues/178 and https://github.com/catalyst-cooperative/pudl/issues/535 and https://github.com/catalyst-cooperative/pudl/issues/338)
3. Clean and adjust the data (including calculating the net-to-gross generation ratio, https://github.com/catalyst-cooperative/pudl/issues/245)
4. Aggregate data to the plant level
5. Calculate emissions factors
6. Roll this data up to different geographic/grid regions and output Excel tables with final data
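The clean, aggregate, calculate, and roll-up steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the unit-level record layout, every field name, and the three helper functions are hypothetical, and the net-to-gross ratio is assumed to be annual net generation divided by annual gross generation, which may differ from the approach the project ultimately adopts.

```python
from collections import defaultdict


def net_to_gross_ratio(annual_net_mwh, annual_gross_mwh):
    """Assumed definition: annual net generation / annual gross generation."""
    return annual_net_mwh / annual_gross_mwh


def aggregate_to_plants(units):
    """Sum unit-level records to the plant level.

    Assumption: each unit is a dict with hypothetical keys 'plant_id',
    'region', 'co2_lb', 'gross_generation_mwh', and 'net_to_gross'.
    """
    plants = defaultdict(
        lambda: {"region": None, "co2_lb": 0.0, "net_generation_mwh": 0.0}
    )
    for unit in units:
        plant = plants[unit["plant_id"]]
        plant["region"] = unit["region"]
        plant["co2_lb"] += unit["co2_lb"]
        # Convert gross generation to net generation with the unit's ratio.
        plant["net_generation_mwh"] += (
            unit["gross_generation_mwh"] * unit["net_to_gross"]
        )
    return dict(plants)


def regional_emission_factors(plants):
    """Roll plant totals up to regions and return lb CO2 per MWh."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [co2_lb, net_generation_mwh]
    for plant in plants.values():
        totals[plant["region"]][0] += plant["co2_lb"]
        totals[plant["region"]][1] += plant["net_generation_mwh"]
    return {r: lb / mwh for r, (lb, mwh) in totals.items() if mwh > 0}


# Made-up illustrative units, not real data.
units = [
    {"plant_id": 1, "region": "REGION_A", "co2_lb": 800.0,
     "gross_generation_mwh": 1.0, "net_to_gross": 0.5},
    {"plant_id": 1, "region": "REGION_A", "co2_lb": 200.0,
     "gross_generation_mwh": 1.0, "net_to_gross": 0.5},
    {"plant_id": 2, "region": "REGION_A", "co2_lb": 0.0,
     "gross_generation_mwh": 1.0, "net_to_gross": 1.0},
]
plants = aggregate_to_plants(units)
region_factors = regional_emission_factors(plants)
```

Keeping each step in its own function mirrors the step list, so a crosswalk or cleaning change affects only one stage of the pipeline.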
Currently, I think that this work will involve creating or editing the following modules in pudl (although I am looking forward to input on whether this makes sense):
- May need to address some issues in the existing ETL process (such as https://github.com/catalyst-cooperative/pudl/issues/595 and https://github.com/catalyst-cooperative/pudl/issues/604)
- Need a way to load and clean EIA-930 data per https://github.com/catalyst-cooperative/pudl/issues/466 and https://github.com/catalyst-cooperative/pudl/issues/600 @truggles
- analysis.egrid: new module that will contain all of the functions to perform steps 2-5 above
- package_data/glue/: will contain new crosswalk tables provided by the EPA
- package_data/epa/egrid/: will contain static tables like emission factors
- output.egrid: new module for compiling all of the output data and building an Excel spreadsheet that will represent the final product (step 6)
Workflow: I will be adding code in a fork located at grgmiller/pudl, and will regularly sync this fork with the sprint20 branch to keep it up to date. I will make periodic pull requests back to the main project.
As I work on each aspect of this project, I will create specific issues to track progress.
More details about the project can be found in the EmPOWER Proposal - eGRID Hourly Emissions Factors.pdf
Issue Analytics
- Created 3 years ago
- Comments:9 (2 by maintainers)
Top GitHub Comments
In past years the final EIA 860/923 data has become available in the fall, usually some time in September. As soon as it’s all released we’ll get on integrating it, and unless something big has changed it should take a week or two to integrate. Not sure how long it would be until that integrated data shows up in a packaged release, but we would make pushing out a new one a priority as soon as the 2019 data is integrated. We wait until the “final” data is released so that we don’t have to work around irregularities in the early release, and then go back in to remove those workarounds and deal with different formatting in the final release.
I think that we may be able to close this issue with the release of https://github.com/singularity-energy/open-grid-emissions.
There’s still ongoing work to do in improving the dataset, but it may make more sense to track as part of issues in this new repo.
What do you think, @cmgosnell @zaneselvans?