Local Docker ETL with local inputs/outputs
Given a Docker container with our CI environment (#1605):
- Add local volumes to the container to point at `PUDL_IN` and `PUDL_OUT`. Maybe with `docker compose`?
- Get the container to run the equivalent of `tox -e ci` while reading & writing data on the local volume.
- Capture the logs and other outputs from the ETL for later review.
- Run the equivalent of `tox -e nuke`: all CI, full ETL, and data validation, reading & writing data on the local volume.
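The first three tasks above can be sketched as a small Python wrapper around `docker run`. This is only an illustration under stated assumptions, not the project's actual setup: the image name `pudl-ci:latest`, the container-side paths `/pudl/in` and `/pudl/out`, and the idea that `PUDL_IN` and `PUDL_OUT` are passed as environment variables are all hypothetical.

```python
import subprocess
from pathlib import Path


def build_docker_run(image, pudl_in, pudl_out, command):
    """Return an argv list for `docker run` with two bind mounts."""
    # /pudl/in and /pudl/out are hypothetical container-side mount points.
    return [
        "docker", "run", "--rm",
        "--volume", f"{Path(pudl_in).resolve()}:/pudl/in",
        "--volume", f"{Path(pudl_out).resolve()}:/pudl/out",
        # Assumption: the ETL finds its directories through these variables.
        "--env", "PUDL_IN=/pudl/in",
        "--env", "PUDL_OUT=/pudl/out",
        image,
        *command,
    ]


def run_and_capture(argv, log_path):
    """Run argv, write combined stdout/stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # "pudl-ci:latest" is a placeholder image name.
    argv = build_docker_run(
        "pudl-ci:latest", "./pudl_in", "./pudl_out", ["tox", "-e", "ci"]
    )
    print(" ".join(argv))
```

Swapping the command for `["tox", "-e", "nuke"]` would cover the last task, and `run_and_capture(argv, "etl.log")` keeps the combined output on the host for later review.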
Top GitHub Comments
A good overview of Docker logging. They outline several available logging strategies. It seems like using a Docker logging driver is probably the right option for us, and there is a dedicated Google Cloud Logging driver.
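As a rough sketch of what picking a logging driver looks like on the command line, the helper below adds `--log-driver` and `--log-opt` flags to a `docker run` argv list. The helper name `with_log_driver`, the image name, and the project ID are made up for illustration; `gcplogs` is the name Docker uses for its Google Cloud Logging driver, and `gcp-project` is one of its options.

```python
def with_log_driver(argv, driver, opts=None):
    """Insert --log-driver/--log-opt flags right after `docker run`."""
    if argv[:2] != ["docker", "run"]:
        raise ValueError("expected an argv starting with 'docker run'")
    flags = ["--log-driver", driver]
    for key, value in (opts or {}).items():
        flags += ["--log-opt", f"{key}={value}"]
    # Flags must come before the image name, so splice them in early.
    return argv[:2] + flags + argv[2:]


if __name__ == "__main__":
    base = ["docker", "run", "--rm", "pudl-ci:latest", "tox", "-e", "ci"]
    # "my-gcp-project" is a placeholder project ID.
    print(" ".join(with_log_driver(base, "gcplogs", {"gcp-project": "my-gcp-project"})))
```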
In theory the `stdout` and `stderr` from the container are sent to the logs, but maybe this only works while the container is running? I don't seem to be able to get anything out of the logs locally, either while the tests are running or after they've completed.
I wonder if this might be affected by the fact that tox/pytest are sitting between the process and the logging? Maybe I should try running `pudl_etl` directly (and also have it generate some real outputs).
Okay, the problem with not being able to write to `PUDL_OUT` was that there were no pre-populated `sqlite` or `parquet` output folders in there, because the `pudl_setup` script is run during the build of the container, inside the container. Duh.
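One possible workaround, sketched here rather than taken from the project, is to create the expected folders in the local output directory before mounting it. The function name `ensure_output_dirs` is hypothetical, and the folder list is limited to the two names mentioned above; the real setup script may create more.

```python
from pathlib import Path


def ensure_output_dirs(pudl_out, names=("sqlite", "parquet")):
    """Create the expected output folders under pudl_out if they are missing."""
    created = []
    for name in names:
        path = Path(pudl_out) / name
        # exist_ok=True makes this safe to call before every run.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # "./pudl_out" is a placeholder for the local directory to be mounted.
    for folder in ensure_output_dirs("./pudl_out"):
        print(folder)
```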