question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Use pudl-metadata.json file to make Datasette browsing of PUDL data more powerful

See original GitHub issue

Is your feature request related to a problem? Please describe. When I use the Datasette browser, the facets it suggests are not always the most appropriate for the data set. It would be much better to pre-load suggested facets by defining them in a metadata.json file that Datasette can load. Example syntax is given in the Datasette documentation:

{
    "databases": {
        "sf-trees": {
            "tables": {
                "Street_Tree_List": {
                    "facets": ["qLegalStatus"]
                }
            }
        }
    }
}

Even better, many columns are defined by well-known units. When such units are specified, pint can format nicely:

{
    "custom_units": [
        "USD = []",
        "MMBTU = 1e6 * BTU",
        "fraction = [] = frac",
        "percent = 1e-2 frac = pct",
        "ppm = 1e-6 fraction"
    ],
}
...
    "databases": {
        "pudl": {
            "tables": {
                "boiler_fuel_eia923": {
                    "units": {
                        "sulfur_content_pct": "pct",
                        "ash_content_pct": "pct"
                    },
                    "size": 10
                }
            }
        }
    }
}

Describe the solution you’d like I have started work on a pudl-metadata.json file, but whatever I’m doing should be properly rooted in how the PUDL team wants to manage the metadata. So I’m asking whether to create a fresh file that just covers metadata for Datasette output, or whether this should be merged in with higher-level metadata. I have seen from other issues that metadata use for output is “out of scope” for all current metadata issues. Once such a metadata file is established, the community can work on making the queries, facets, outputs, etc., whatever the community needs.

Describe alternatives you’ve considered None. I want to make Datasette sing.

Additional context This is the docker-compose yaml command I’m using to coordinate Datasette as a browser for Jupyter Notebooks running separately (I’ve learned its a bad idea to try to integrate Datasette browsing too much in the execution flow of my notebooks):

version: "3"
services:
    notebook:
        image: notebook
        volumes:
            - ./notebook:/home/jovyan/work
        ports:
            - 8888:8888

    pudl:
        image: datasette
	depends_on:
            - datasette
	volumes:
            - ./notebook/database:/work
	ports:
            - 8002:8002
	command: serve /work/pudl.sqlite --cors --setting base_url /pudl-datasette/ -p 8002 -h 0.0.0.0 --metadata /work/pudl-metadata.json

Issue Analytics

  • State:open
  • Created 2 years ago
  • Comments:6 (3 by maintainers)

github_iconTop GitHub Comments

1reaction
zaneselvanscommented, Sep 7, 2021

Hey @MichaelTiemannOSC, no not that module (oh god, not that module!) – we’re working on a branch that’s associated with PR #806. Check out the metadata subpackage on that branch.

0reactions
MichaelTiemannOSCcommented, Feb 27, 2022

I looked back at my notes and found the need to explicitly request facets for fuel_type_code in boiler_fuel_eia923. I think this was needed because there are surprisingly many fuel_type_codes (coal by any other name is coal to me), and Datasette modestly decides against selecting that as a facet-able dimension.

Here’s where I got to before shifting attention to other data and analytic topics (pudl-metadata.json):

{
    "title": "PUDL 0.4.0 Dataset",
    "description": "Public Utilities Data from EIA, FERC, and US CENSUS databases",
    "license": "MIT License",
    "license_url": "https://github.com/catalyst-cooperative/pudl/blob/main/LICENSE.txt",
    "source": "The Public Utility Data Liberation Project (PUDL)",
    "source_url": "https://github.com/catalyst-cooperative/pudl",
    "custom_units": [
        "USD = []",
        "Mcf = 1e6 * feet**3 = _ = mcf",
        "gramsU = 1 * gram",
        "kgU = 1000 * gramsU",
        "mwhth = MWh",
        "mwdth = 24 * mwhth",
        "MMBTU = 1e6 * BTU = _ = mmbtu",
        "fraction = [] = frac",
        "percent = 1e-2 frac = pct",
        "ppm = 1e-6 fraction"
    ],
    "databases": {
        "pudl": {
            "queries": {
            },
            "tables": {
                "assn_gen_eia_unit_epa": {
                    "description_html": "Custom <em>table</em> description",
                    "license": "MIT License",
                    "license_url": "https://github.com/catalyst-cooperative/pudl/blob/main/LICENSE.txt",
                    "size": 10
                },
                "assn_plant_id_eia_epa": {
                    "size": 10
                },
                "boiler_fuel_eia923": {
                    "facets": ["fuel_type_code"],
                    "units": {
                        "sulfur_content_pct": "pct",
                        "ash_content_pct": "pct"
                    },
                    "size": 10
                },
                "boiler_generator_assn_eia860": {
                    "size": 10
                },
                "boilers_entity_eia": {
                    "size": 10
                },
                "coalmine_eia923": {
                    "size": 10
                },
                "energy_source_eia923": {
                    "size": 10
                },
                "ferc_accounts": {
                    "size": 10
                },
                "ferc_depreciation_lines": {
                    "size": 10
                },
                "fuel_ferc1": {
                    "units": {
                        "fuel_cost_per_mmbtu": "USD/MMBTU"
                    },
                    "size": 10
                },
                "fuel_receipts_costs_eia923": {
                    "units": {
                        "fuel_cost_per_mmbtu": "USD/MMBTU"
                    },
                    "size": 10
                },
                "fuel_type_aer_eia923": {
                    "size": 10
                },
                "fuel_type_eia923": {
                    "size": 10
                },
                "generation_eia923": {
                    "units": {
                        "net_generation_mwh": "MWh"
                    },
                    "size": 10
                },
                "generation_fuel_eia923": {
                    "units": {
                        "fuel_consumed_mmbtu": "MMBTU",
                        "net_generation_mwh": "MWh"
                    },                   
                    "size": 10
                },
                "generators_eia860": {
                    "units": {
                        "capacity_mw": "MW",
                        "summer_capacity_mw": "MW",
                        "winter_capacity_mw": "MW",
                        "net_generation_mwh": "MWh",
                        "planned_net_summer_capacity_uprate_mw": "MW",
                        "planned_net_winter_capacity_uprate_mw": "MW",
                        "planned_net_summer_capacity_derate_mw": "MW",
                        "planned_net_winter_capacity_derate_mw": "MW",
                        "planned_new_capacity_mw": "MW",
                        "minimum_load_mw": "MW",
                        "summer_estimated_capability_mw": "MW",
                        "winter_estimated_capability_mw": "MW"
                    },                   
                    "size": 10
                },
                "generators_entity_eia": {
                    "size": 10
                },
                "ownership_eia860": {
                    "size": 10
                },
                "plant_in_service_ferc1": {
                    "size": 10
                },
                "plant_unit_epa": {
                    "size": 10
                },
                "plants_eia": {
                    "size": 10
                },
                "plants_eia860": {
                    "size": 10
                },
                "plants_entity_eia": {
                    "size": 10
                },
                "plants_ferc1": {
                    "size": 10
                },
                "plants_hydro_ferc1": {
                    "units": {
                        "capacity_mw": "MW",
                        "peak_demand_mw": "MW",
                        "net_capacity_favorable_conditions_mw": "MW",
                        "net_capacity_adverse_conditions_mw": "MW",
                        "net_generation_mwh": "MWh",
                        "capex_land": "USD",
                        "capex_structures": "USD",
                        "capex_facilities": "USD",
                        "capex_equipment": "USD",
                        "capex_roads": "USD",
                        "asset_retirement_cost": "USD",
                        "capex_total": "USD",
                        "capex_per_mw": "USD/MW",
                        "opex_operations": "USD",
                        "opex_water_for_power": "USD",
                        "opex_hydraulic": "USD",
                        "opex_electric": "USD",
                        "opex_generation_misc": "USD",
                        "opex_rents": "USD",
                        "opex_engineering": "USD",
                        "opex_structures": "USD",
                        "opex_dams": "USD",
                        "opex_plant": "USD",
                        "opex_misc_plant": "USD",
                        "opex_total": "USD",
                        "opex_per_mwh": "USD/MWh"
                    },               
                    "size": 10
                },
                "plants_pudl": {
                    "size": 10
                },
                "plants_pumped_storage_ferc1": {
                    "units": {
                        "capacity_mw": "MW",
                        "peak_demand_mw": "MW",
                        "plant_capability_mw": "MW",
                        "net_generation_mwh": "MWh",
                        "energy_used_for_pumping_mwh": "MWh",
                        "net_load_mwh": "MWh",
                        "capex_land": "USD",
                        "capex_structures": "USD",
                        "capex_facilities": "USD",
                        "capex_wheels_turbines_generators": "USD",
                        "capex_equipment_electric": "USD",
                        "capex_equipment_misc": "USD",
                        "capex_roads": "USD",
                        "asset_retirement_cost": "USD",
                        "capex_total": "USD",
                        "capex_per_mw": "USD/MW",
                        "opex_operations": "USD",
                        "opex_water_for_power": "USD",
                        "opex_pumped_storage": "USD",
                        "opex_electric": "USD",
                        "opex_generation_misc": "USD",
                        "opex_rents": "USD",
                        "opex_engineering": "USD",
                        "opex_structures": "USD",
                        "opex_dams": "USD",
                        "opex_plant": "USD",
                        "opex_misc_plant": "USD",
                        "opex_production_before_pumping": "USD",
                        "opex_pumping": "USD",
                        "opex_total": "USD",
                        "opex_per_mwh": "USD/MWh"
                    },               
                    "size": 10
                },
                "plants_small_ferc1": {
                    "units": {
                        "capacity_mw": "MW",
                        "peak_demand_mw": "MW",
                        "net_generation_mwh": "MWh",
                        "total_cost_of_plant": "USD",
                        "capex_per_mw": "USD/MW",
                        "opex_total": "USD",
                        "opex_fuel": "USD",
                        "opex_maintenance": "USD",
                        "fuel_cost_per_mmbtu": "USD/MMBTU"
                    },               
                    "size": 10
                },
                "plants_steam_ferc1": {
                    "units": {
                        "capacity_mw": "MW",
                        "peak_demand_mw": "MW",
                        "plant_capability_mw": "MW",
                        "water_limited_capacity_mw": "MW",
                        "not_water_limited_capacity_mw": "MW",
                        "net_generation_mwh": "MWh",
                        "capex_land": "USD",
                        "capex_structures": "USD",
                        "capex_equipment": "USD",
                        "capex_total": "USD",
                        "capex_per_mw": "USD/MW",
                        "opex_operations": "USD",
                        "opex_fuel": "USD",
                        "opex_coolants": "USD",
                        "opex_steam": "USD",
                        "opex_steam_other": "USD",
                        "opex_transfer": "USD",
                        "opex_electric": "USD",
                        "opex_misc_power": "USD",
                        "opex_rents": "USD",
                        "opex_allowances": "USD",
                        "opex_engineering": "USD",
                        "opex_structures": "USD",
                        "opex_boiler": "USD",
                        "opex_plants": "USD",
                        "opex_misc_steam": "USD",
                        "opex_production_tota;": "USD",
                        "opex_per_mwh": "USD/MWh",
                        "asset_retirement_cost": "USD"
                    },               
                    "size": 10
                },
                "prime_movers_eia923": {
                    "size": 10
                },
                "purchased_power_ferc1": {
                    "units": {
                        "billing_demand_mw": "MW",
                        "non_coincident_peak_demand_mw": "MW",
                        "coincident_peak_demand_mw": "MW",
                        "purchased_mwh": "MWh",
                        "received_mwh": "MWh",
                        "delivered_mwh": "MWh",
                        "demand_charges": "USD",
                        "energy_charges": "USD",
                        "other_charges": "USD",
                        "total_settlement": "USD"
                    },
                    "size": 10
                },
                "transport_modes_eia923": {
                    "size": 10
                },
                "utilities_eia": {
                    "size": 10
                },
                "utilities_eia860": {
                    "size": 10
                },
                "utilities_entity_eia": {
                    "size": 10
                },
                "utilities_ferc1": {
                    "size": 10
                },
                "utilities_pudl": {
                    "size": 10
                },
                "utility_plant_assn": {
                    "size": 10
                }
            }
        }
    }
}
Read more comments on GitHub >

github_iconTop Results From Across the Web

PUDL 0.6.1.dev1266+gd73143f4.d20221207 documentation
You can browse and query the data, make simple charts and maps, and download portions of the data as CSV files or JSON...
Read more >
Publishing a PUDL API with Datasette - Catalyst Cooperative
Catalyst is experimenting with publishing our open energy data using Datasette, which provides a web interface as well as a PUDL API.
Read more >
The Public Utility Data Liberation (PUDL) Project adds Clean ...
The PUDL project, which is coordinated by Catalyst Cooperative, focuses on creating an energy utility data product that can serve a wide range ......
Read more >
Frictionless Public Utility Data - A Pilot Study
PUDL was inspired to make this data more accessible because the current US utility data ecosystem fragmented, and commercial products are ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found