Use pudl-metadata.json file to make Datasette browsing of PUDL data more powerful
Is your feature request related to a problem? Please describe.
When I use the Datasette browser, the facets it suggests are not always the most appropriate for the data set. It would be much better to pre-load suggested facets by defining them in a metadata.json file that Datasette can load. Example syntax is given in the Datasette documentation:
{
    "databases": {
        "sf-trees": {
            "tables": {
                "Street_Tree_List": {
                    "facets": ["qLegalStatus"]
                }
            }
        }
    }
}
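The same structure can be generated for PUDL tables rather than written by hand. The following is a minimal sketch, not the PUDL team's method: `build_facet_metadata` is a hypothetical helper, the database name `pudl` is taken from the Datasette command used elsewhere in this issue, and the table and column (`boiler_fuel_eia923`, `fuel_type_code`) are the ones mentioned in the comments.

```python
import json


def build_facet_metadata(database, facets_by_table):
    """Return a Datasette-style metadata dict that pre-selects facets per table.

    `facets_by_table` maps a table name to the list of columns to facet on.
    This helper is hypothetical; it only mirrors the JSON layout shown above.
    """
    tables = {
        table: {"facets": list(columns)}
        for table, columns in facets_by_table.items()
    }
    return {"databases": {database: {"tables": tables}}}


# Table and column names come from this issue thread, not from a PUDL schema check.
metadata = build_facet_metadata(
    "pudl",
    {"boiler_fuel_eia923": ["fuel_type_code"]},
)

# Serialize with json.dumps so the output is always valid JSON.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Writing `metadata_json` to a file and passing that file to `--metadata` would follow the same pattern as the hand-written example.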
Even better, many columns are measured in well-known units. When such units are specified, pint can format the values nicely:
{
    "custom_units": [
        "USD = []",
        "MMBTU = 1e6 * BTU",
        "fraction = [] = frac",
        "percent = 1e-2 frac = pct",
        "ppm = 1e-6 fraction"
    ],
    ...
    "databases": {
        "pudl": {
            "tables": {
                "boiler_fuel_eia923": {
                    "units": {
                        "sulfur_content_pct": "pct",
                        "ash_content_pct": "pct"
                    },
                    "size": 10
                }
            }
        }
    }
}
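A metadata file like this is easy to break: a column can point at a unit name that no definition introduces. The sketch below is one hedged way to catch that before starting Datasette. It uses only the standard library and assumes the pint definition layout `name = relation = symbol = alias ...`, with `_` meaning "no symbol". The helper names are hypothetical, and because the check knows nothing about pint's built-in units, it would also flag a legitimate built-in name such as `kg`.

```python
def defined_unit_names(custom_units):
    """Collect every name, symbol, and alias introduced by the custom definitions."""
    names = set()
    for definition in custom_units:
        parts = [part.strip() for part in definition.split("=")]
        # parts[0] is the canonical name, parts[1] the relation,
        # and anything after that is a symbol or alias.
        names.add(parts[0])
        names.update(p for p in parts[2:] if p and p != "_")
    return names


def undefined_units(metadata):
    """Return (database, table, column, unit) for units not defined in custom_units."""
    known = defined_unit_names(metadata.get("custom_units", []))
    missing = []
    for db_name, db in metadata.get("databases", {}).items():
        for table_name, table in db.get("tables", {}).items():
            for column, unit in table.get("units", {}).items():
                if unit not in known:
                    missing.append((db_name, table_name, column, unit))
    return missing


# The metadata from the example above, written as a Python dict.
metadata = {
    "custom_units": [
        "USD = []",
        "MMBTU = 1e6 * BTU",
        "fraction = [] = frac",
        "percent = 1e-2 frac = pct",
        "ppm = 1e-6 fraction",
    ],
    "databases": {
        "pudl": {
            "tables": {
                "boiler_fuel_eia923": {
                    "units": {
                        "sulfur_content_pct": "pct",
                        "ash_content_pct": "pct",
                    },
                    "size": 10,
                }
            }
        }
    },
}

print(undefined_units(metadata))
```

An empty list means every unit used in a `units` mapping is introduced by some `custom_units` entry.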
Describe the solution you’d like
I have started work on a pudl-metadata.json file, but whatever I do should be properly rooted in how the PUDL team wants to manage metadata. So I’m asking whether to create a fresh file that covers only metadata for Datasette output, or whether it should be merged with higher-level metadata. I have seen from other issues that using metadata for output is “out of scope” for all current metadata issues. Once such a metadata file is established, the community can work on the queries, facets, outputs, and whatever else it needs.
Describe alternatives you’ve considered
None. I want to make Datasette sing.
Additional context
This is the docker-compose YAML file I’m using to run Datasette as a browser alongside Jupyter notebooks that run separately (I’ve learned it’s a bad idea to integrate Datasette browsing too tightly into the execution flow of my notebooks):
version: "3"
services:
  notebook:
    image: notebook
    volumes:
      - ./notebook:/home/jovyan/work
    ports:
      - 8888:8888
  pudl:
    image: datasette
    # Note: depends_on must name a service defined in this file;
    # no "datasette" service appears in the snippet as posted.
    depends_on:
      - datasette
    volumes:
      - ./notebook/database:/work
    ports:
      - 8002:8002
    command: serve /work/pudl.sqlite --cors --setting base_url /pudl-datasette/ -p 8002 -h 0.0.0.0 --metadata /work/pudl-metadata.json
Issue Analytics
- Created 2 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Hey @MichaelTiemannOSC, no, not that module (oh god, not that module!) – we’re working on a branch that’s associated with PR #806. Check out the metadata subpackage on that branch.
I looked back at my notes and found the need to explicitly request facets for fuel_type_code in boiler_fuel_eia923. I think this was needed because there are surprisingly many fuel_type_codes (coal by any other name is coal to me), and Datasette modestly decides against selecting that as a facet-able dimension.
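One way to test that hunch is to count the distinct values in the column directly. The sketch below assumes, based on my reading of the Datasette documentation, that a column is only suggested as a facet when it has more than one distinct value, no more than the default facet size (30 unless configured otherwise), and fewer distinct values than rows. The helper names are hypothetical, and the in-memory table holds synthetic placeholder codes, not real PUDL data; against a real database you would connect to pudl.sqlite instead.

```python
import sqlite3

# Assumption: 30 is Datasette's documented default facet size;
# check the settings of the Datasette version you actually run.
ASSUMED_FACET_LIMIT = 30


def facet_cardinality(conn, table, column):
    """Return (distinct non-null values, total rows) for one column."""
    query = f'SELECT COUNT(DISTINCT "{column}"), COUNT(*) FROM "{table}"'
    return conn.execute(query).fetchone()


def likely_suggested(distinct, rows, limit=ASSUMED_FACET_LIMIT):
    """Approximate the suggestion criteria described in the lead-in."""
    return 1 < distinct <= limit and distinct < rows


# Synthetic stand-in for boiler_fuel_eia923: 40 made-up codes, two rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE boiler_fuel_eia923 (fuel_type_code TEXT)")
codes = [f"CODE{i}" for i in range(40)]
conn.executemany(
    "INSERT INTO boiler_fuel_eia923 VALUES (?)",
    [(code,) for code in codes * 2],
)

distinct, rows = facet_cardinality(conn, "boiler_fuel_eia923", "fuel_type_code")
print(distinct, rows, likely_suggested(distinct, rows))
```

If the real column turns out to exceed the limit, listing it under "facets" in the metadata file, as discussed above, is the way to request it explicitly.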
Here’s where I got to before shifting attention to other data and analytic topics (pudl-metadata.json):