Estimate monthly plant-level fuel prices w/o using the EIA API
Instead of using the EIA API to pull monthly average fuel costs by state and fuel when individual fuel deliveries have their costs redacted in the fuel_receipts_costs_eia923 table, calculate them ourselves.
Motivation
This change will address several issues:
- The EIA API is missing a fair amount of the data anyway. Sometimes whole state-months are missing. It also only contains data for coarse fuel categories (coal, petroleum, natural gas) rather than the specific fuel types.
- Relying on the API means asking users to register for an API key and manage environment variables. This is a barrier for many of our less technical users.
- Whenever something goes wrong with the API, our CI tests fail, and we can’t work with this data locally. Over time this has been happening more frequently. HTML gets returned instead of JSON, or the network is down.
- EIA is discontinuing the v1 API in November, 2022, so our current setup will stop working anyway.
- There’s a lot of information in the fuel_receipts_costs_eia923 table, and in the related data about the plants, mines, and suppliers involved. It should be possible to do a fairly good estimation of the fuel prices from scratch given all that context.
Approach
- Estimate fuel prices using a variety of aggregations and use them to fill missing values.
- Start with the most granular / accurate and progressively apply less specific estimates until everything is filled in.
- Tag each record indicating which estimation was used to fill it in.
- Pre-calculate all of the aggregations so that we can look at how they compare with actual values first.
- Add each of these aggregations to the original FRC dataframe for plotting.
- We should also include the EIA API values for comparison / constraint based on the redacted values.
- Looking at the EIA API, only PEL, PC, COW, and NG really have values for $/MMBTU at census region and state level.
- Seems like the very granular fuel types only have prices for US Total, at least at the monthly level.
- Use median values of the fuel prices in $/MMBTU
- Maybe calculate a weighted median? Want typical MMBTU, not typical delivery.
- @gschivley and Neha suggested using both spatial and temporal interpolation – averaging prices from the adjacent states, and filling in gaps in the monthly time series when possible.
- We could also use a low-effort, but powerful estimator like XGBoost or a random forest to try and incorporate much more information, without designing something bespoke from scratch.
- We should be able to benchmark these calculations against the data from the API or the specific information reported in the FRC table by doing some random knockouts to see how well we can recreate the reported values.
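The random knockout benchmark could look roughly like the sketch below. It is only an illustration under assumptions: the column name fuel_cost_per_mmbtu, the helper name knockout_errors, and the use of a single group median as the estimator are all placeholders, and the toy data is made up.

```python
import numpy as np
import pandas as pd


def knockout_errors(frc, group_cols, price_col="fuel_cost_per_mmbtu", frac=0.2, seed=0):
    """Hide a random share of reported prices, re-estimate them, return relative errors."""
    reported = frc[frc[price_col].notna()]
    # Pick the records to knock out (assumes a unique index).
    hidden_idx = reported.sample(frac=frac, random_state=seed).index
    masked = frc.copy()
    masked.loc[hidden_idx, price_col] = np.nan
    # Estimate every row from the median of the prices that remain visible in its group.
    est = masked.groupby(group_cols)[price_col].transform("median")
    truth = frc.loc[hidden_idx, price_col]
    # Relative error of the estimate; rows with no usable estimate are dropped.
    return ((est.loc[hidden_idx] - truth) / truth).dropna()


# Toy data: ten deliveries in one state-month, all at the same price.
toy = pd.DataFrame(
    {
        "state": ["CO"] * 10,
        "report_date": pd.to_datetime(["2020-01-01"] * 10),
        "fuel_cost_per_mmbtu": [2.0] * 10,
    }
)
errors = knockout_errors(toy, ["state", "report_date"])
```

The same function can be run once per candidate aggregation to compare their error distributions on records whose true values are known.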
Choosing Aggregations
- How do we decide how to prioritize aggregations?
- Coal prices don’t vary much month to month, so aggregating annually would have little impact.
- Gas & Petroleum prices can vary dramatically month to month, so aggregating across time is bad.
- Petroleum fuel prices are highly correlated nationwide, so aggregating geographically has little impact.
Intuitive Aggregation Priorities
- Most precise: ["state", "energy_source_code", "report_date"]
- Annual aggregation (coal): ["state", "energy_source_code", "report_year"]
- Regional aggregation (petroleum): ["census_region", "energy_source_code", "report_date"]
- Fuel type aggregation: ["state", "fuel_group_code", "report_date"]
- Both regional and fuel type aggregation: ["census_region", "fuel_group_code", "report_date"]
- Annual, regional, and fuel type aggregations: ["census_region", "fuel_group_code", "report_year"]
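The priority list above could be applied roughly as follows. This is a minimal sketch, not the eventual PUDL implementation: the price column name fuel_cost_per_mmbtu, the labels used as keys in AGGS, and the toy census_region value are assumptions made for illustration. Each pass fills only the rows that are still missing and tags only those rows, so a later, coarser aggregation cannot overwrite an earlier label.

```python
import numpy as np
import pandas as pd

# Aggregation levels from the list above, most specific first. The keys are made-up labels.
AGGS = {
    "state_esc_month": ["state", "energy_source_code", "report_date"],
    "state_esc_year": ["state", "energy_source_code", "report_year"],
    "region_esc_month": ["census_region", "energy_source_code", "report_date"],
    "state_fgc_month": ["state", "fuel_group_code", "report_date"],
    "region_fgc_month": ["census_region", "fuel_group_code", "report_date"],
    "region_fgc_year": ["census_region", "fuel_group_code", "report_year"],
}


def fill_fuel_price(frc, price_col="fuel_cost_per_mmbtu"):
    """Fill missing prices with progressively coarser group medians and tag each fill."""
    out = frc.copy()
    out["filled_by"] = pd.Series(pd.NA, index=out.index, dtype="object")
    out.loc[out[price_col].notna(), "filled_by"] = "original"
    for label, cols in AGGS.items():
        # Median of the originally reported prices in each group, broadcast to all rows.
        est = frc.groupby(cols)[price_col].transform("median")
        # Only touch rows that are still missing and have an estimate at this level.
        mask = out[price_col].isna() & est.notna()
        out.loc[mask, price_col] = est[mask]
        out.loc[mask, "filled_by"] = label
    return out


# Toy data: two redacted coal deliveries in Colorado.
frc = pd.DataFrame(
    {
        "state": ["CO", "CO", "CO", "WY"],
        "census_region": ["MTN"] * 4,
        "energy_source_code": ["BIT", "BIT", "SUB", "SUB"],
        "fuel_group_code": ["coal"] * 4,
        "report_date": pd.to_datetime(["2020-01-01"] * 3 + ["2020-02-01"]),
        "fuel_cost_per_mmbtu": [2.0, np.nan, np.nan, 1.5],
    }
)
frc["report_year"] = frc["report_date"].dt.year
filled = fill_fuel_price(frc)
```

In the toy data the redacted BIT delivery is filled at the most precise level, while the redacted SUB delivery has no reported SUB price in its state or region for that month and falls through to the state-level fuel group aggregation.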
Questions:
- Should we use an MMBTU-weighted median rather than a delivery-weighted median?
- How should we identify outlier values in the fuel prices which should be replaced? Some are totally whacked.
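To make the first question concrete, here is one possible weighted median, sketched with NumPy. The function name and the choice of the lower weighted median (the first value at which cumulative weight reaches half of the total) are assumptions, not something settled in this issue.

```python
import numpy as np


def weighted_median(values, weights):
    """Lower weighted median: first value where cumulative weight reaches half the total."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Drop missing values and non-positive weights.
    keep = ~np.isnan(values) & ~np.isnan(weights) & (weights > 0)
    values, weights = values[keep], weights[keep]
    if values.size == 0:
        return float("nan")
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(values[idx])


# Two small cheap deliveries and one large expensive one (made-up numbers).
prices = [2.0, 3.0, 10.0]  # $/MMBTU
mmbtu = [100, 100, 1000]
by_mmbtu = weighted_median(prices, mmbtu)
by_delivery = weighted_median(prices, [1, 1, 1])
```

Weighting by MMBTU returns the price of the large delivery, which is the price of the typical MMBTU, while weighting each delivery equally returns the middle delivery's price.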
Other Potential Refinements
- Automatically fill using aggregations in order of increasing dispersion of the error distribution (e.g. IQR) rather than hard-coding the order based on intuition and eyeballing it.
- Calculate the dispersion of the error distribution on an annual basis, rather than across the entire timeline, in case the temporal, fuel type & spatial correlations change over time.
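One way the ordering could be derived from the data rather than hard-coded is sketched below. The function name, the aggregation labels, and the column name fuel_cost_per_mmbtu are assumptions. The errors here are in-sample (each record contributes to its own group median), so they will look better than errors measured on knocked-out records.

```python
import pandas as pd


def rank_aggregations_by_iqr(frc, aggs, price_col="fuel_cost_per_mmbtu"):
    """Return the IQR of the relative error for each aggregation, tightest first."""
    reported = frc[frc[price_col].notna()]
    iqr = {}
    for label, cols in aggs.items():
        est = reported.groupby(cols)[price_col].transform("median")
        err = (est - reported[price_col]) / reported[price_col]
        iqr[label] = err.quantile(0.75) - err.quantile(0.25)
    return pd.Series(iqr).sort_values()


# Toy data: gas prices differ a lot between two states in the same region.
toy = pd.DataFrame(
    {
        "state": ["A", "A", "B", "B"],
        "census_region": ["R"] * 4,
        "energy_source_code": ["NG"] * 4,
        "report_date": pd.to_datetime(["2020-01-01"] * 4),
        "fuel_cost_per_mmbtu": [2.0, 2.2, 4.0, 4.4],
    }
)
aggs = {
    "region_esc_month": ["census_region", "energy_source_code", "report_date"],
    "state_esc_month": ["state", "energy_source_code", "report_date"],
}
ranking = rank_aggregations_by_iqr(toy, aggs)
```

To calculate the dispersion on an annual basis, the same function could be applied separately to each year's records.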
Remaining tasks:
- Always merge plant_state into the fuel_receipts_costs_eia923 output table.
- Add the census regions to state mappings into the metadata enums / constants.
- Replace the existing roll & fill method in the fuel_receipts_costs_eia923 output routine.
- Update tests to work with the new version of frc_eia923
- Remove API_KEY_EIA infrastructure from everywhere in the code, so we aren’t unknowingly relying on it.
- Make filling in missing fuel prices the default behavior
- Fix the filled_by labeling, which is now showing all filled values having national_fgc_year, which is the last aggregation.
- Remove fuel_group_code from the fuel_receipts_costs_eia923 table and add it to the energy_sources_eia coding table, and add it back into the output function.
- Understand why these changes are apparently affecting output row counts
- Pull the fuel price filling out into its own separate function
- Understand why merge_date() is removing ~10k frc_eia923 records.
- Implement weighted median function to use in filling & identifying outliers
- Add weighted median unit tests
- Identify outlying fuel prices using modified z-score with MMBTU weighted median
- Have @cmgosnell look for weirdness in the results of a new MCOE calculation in an RMI context.
- Update release notes
- After merging into main, remove API_KEY_EIA from the GitHub secrets.
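For the outlier task, a modified z-score built on an MMBTU-weighted median might look like the sketch below. The function names are hypothetical, the inputs are assumed to contain no missing values, and the 0.6745 scaling and 3.5 cutoff are the conventional choices for modified z-scores rather than values taken from this issue.

```python
import numpy as np


def _weighted_median(values, weights):
    """Lower weighted median of arrays with no NaNs and positive weights."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]


def modified_zscore(prices, mmbtu):
    """Modified z-score using the MMBTU-weighted median and weighted MAD."""
    prices = np.asarray(prices, dtype=float)
    mmbtu = np.asarray(mmbtu, dtype=float)
    center = _weighted_median(prices, mmbtu)
    # Weighted median absolute deviation; a zero MAD would give infinite scores.
    mad = _weighted_median(np.abs(prices - center), mmbtu)
    return 0.6745 * (prices - center) / mad


# Made-up deliveries: four plausible prices and one that is clearly off.
prices = [2.0, 2.1, 1.9, 2.2, 50.0]  # $/MMBTU
mmbtu = [100, 100, 100, 100, 10]
z = modified_zscore(prices, mmbtu)
is_outlier = np.abs(z) > 3.5
```

Only the 50 $/MMBTU delivery is flagged here; the flagged prices could then be set to missing and refilled like any redacted value.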
Issue Analytics
- Created 2 years ago
- Comments: 49 (48 by maintainers)
Top GitHub Comments
@TrentonBush The earlier scatter plots are comparing all the reported data, so the ones where there actually was data in the FRC table, and they’re only being aggregated by [state, month, fuel_group].
The more recent scatter plots are only looking at data points that were not present in the FRC table, and comparing the values which were filled in by our new method (breaking it out into all the different kinds of aggregation used) vs. the API values. So it’s not surprising that the correlation is worse in general.
Hey sorry, just tuning in! I did a spatial interpolation for fuel prices to average down to the county level for new build estimates in this paper. Is it mostly the non ISO regions that are short of data?