datapkg_to_sqlite fails to load all of EPA CEMS
After doing a full ETL of all years and states in CEMS, the datapkg_to_sqlite script doesn’t seem to load all of that data into the SQLite database. Rather, it only loads the last year of data. The process also terminates quickly, so it’s probably not even attempting to load the rest. I suspect it’s an issue with the iteration and/or partitioning…
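One quick way to confirm the symptom is to count rows per year in the loaded table. The following is only a sketch using Python's standard sqlite3 module. The table name `hourly_emissions`, the column `operating_datetime`, and the helper `rows_per_year` are hypothetical stand-ins for illustration, not the actual PUDL schema, and an in-memory database takes the place of the real SQLite file.

```python
import sqlite3


def rows_per_year(conn, table, date_column):
    """Return (year, row_count) pairs for a table with an ISO-8601 datetime column.

    The table and column names are interpolated directly into the SQL,
    so only pass trusted identifiers.
    """
    query = (
        f"SELECT strftime('%Y', {date_column}) AS year, COUNT(*) "
        f"FROM {table} GROUP BY year ORDER BY year"
    )
    return conn.execute(query).fetchall()


# Hypothetical stand-in for the real database: a tiny in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hourly_emissions (operating_datetime TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO hourly_emissions VALUES (?, ?)",
    [
        ("2017-01-01 00:00:00", "ID"),
        ("2018-01-01 00:00:00", "ID"),
        ("2018-01-01 01:00:00", "ID"),
    ],
)

counts = rows_per_year(conn, "hourly_emissions", "operating_datetime")
for year, n_rows in counts:
    print(year, n_rows)
```

If only a single year shows up after a full ETL, the load step probably never saw the other partitions.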
Issue Analytics
- State:
- Created 4 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Hey @roll! This was totally just our mistake, so don’t worry about it. We are making a bunch of data-source-specific data packages and then squishing them together into one package. I set up a process for determining how to generate a new data package without duplicating elements of the metadata… but I messed up and it was only grabbing one of the CEMS resources. It was a very simple fix once we figured out what was happening.
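For anyone hitting something similar, here is a rough sketch of the general idea of combining resources from several data package descriptors without silently dropping any. This is not the actual PUDL code or fix. The function `merge_resources`, the package names, and the resource names are all made up for illustration, and the descriptors are plain dicts rather than objects from the datapackage library.

```python
def merge_resources(descriptors):
    """Combine the resource lists of several data package descriptors.

    Resources are keyed by name, so every distinctly named resource is kept
    and a repeated name is only included once (the first occurrence wins).
    """
    merged = {}
    for descriptor in descriptors:
        for resource in descriptor.get("resources", []):
            merged.setdefault(resource["name"], resource)
    return list(merged.values())


# Hypothetical descriptors: one single-resource package and one package
# whose resources are partitions belonging to the same group.
eia = {"name": "eia-example", "resources": [{"name": "plants_eia"}]}
cems = {
    "name": "epacems-example",
    "resources": [
        {"name": "hourly_emissions_epacems_2017_id", "group": "hourly_emissions_epacems"},
        {"name": "hourly_emissions_epacems_2018_id", "group": "hourly_emissions_epacems"},
    ],
}

combined = merge_resources([eia, cems])
names = [resource["name"] for resource in combined]
print(names)
```

Keying on the resource name rather than on the group means partitions that share a group are all retained. Deduplicating on the group would keep only one of them, which would look a lot like the symptom described in this issue.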
No, still getting the error with the most recent versions of the datapackage libraries. A very simple version with a couple of resources in a group seems to work as expected, but the simplest PUDL output that tests the behavior doesn’t work. I’m trying to simplify that resource group output one step at a time until I get to a minimal example to share with you.