Local Docker ETL with local inputs/outputs

Given a Docker container with our CI environment (#1605):

  • Add local volumes to the container to point at PUDL_IN and PUDL_OUT. Maybe with docker compose?
  • Get the container to run the equivalent of tox -e ci while reading & writing data on the local volume.
  • Capture the logs and other outputs from the ETL for later review.
  • Run equivalent of tox -e nuke: all CI, full ETL, and data validation, reading & writing data on the local volume.
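A rough sketch of the first two bullets, assuming plain `docker run` with bind mounts rather than docker compose. The image tag `pudl-ci` and the container-side paths `/pudl/inputs` and `/pudl/outputs` are made up for illustration and are not taken from the issue. The function only builds the argument list; it does not start anything.

```python
from pathlib import Path


def build_docker_run(image, pudl_in, pudl_out, command, name="pudl_etl"):
    """Build a `docker run` argv that bind-mounts local input/output dirs.

    The container-side paths below are hypothetical; use whatever the
    real image expects.
    """
    in_host = Path(pudl_in).expanduser().resolve()
    out_host = Path(pudl_out).expanduser().resolve()
    in_ctr, out_ctr = "/pudl/inputs", "/pudl/outputs"  # assumed paths
    return [
        "docker", "run",
        "--name", name,
        # Bind mounts: host directory on the left, container path on the right.
        "-v", f"{in_host}:{in_ctr}",
        "-v", f"{out_host}:{out_ctr}",
        # Tell the process inside the container where the mounts live.
        "-e", f"PUDL_IN={in_ctr}",
        "-e", f"PUDL_OUT={out_ctr}",
        image,
        *command,
    ]


if __name__ == "__main__":
    argv = build_docker_run(
        "pudl-ci", "~/pudl_in", "~/pudl_out", ["tox", "-e", "ci"]
    )
    print(" ".join(argv))
```

Passing `["tox", "-e", "nuke"]` as the command would cover the last bullet with the same mounts. The resulting list can be handed to `subprocess.run` once the image exists locally.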

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

zaneselvans commented, May 3, 2022

A good overview of Docker logging. They outline several available logging strategies. It seems like using a Docker logging driver is probably the right option for us, and there is a dedicated Google Cloud Logging driver.

In theory the stdout and stderr from the container are sent to the logs, but maybe this only works while the container is running? I don’t seem to be able to get anything out of the logs locally by doing e.g.

docker logs pudl_etl

either when the tests are running or after they’ve completed.

I wonder if this might be affected by the fact that tox/pytest are sitting between the process and the logging? Maybe I should try running pudl_etl directly (and also have it generate some real outputs).
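One thing worth knowing here: pytest captures the stdout and stderr of tests by default and only shows them for failures, unless it is run with `-s` (`--capture=no`). Whether that explains the empty `docker logs` output is not confirmed in this thread. As a fallback that does not depend on the logging driver, here is a minimal sketch that runs any command and copies its combined stdout and stderr into a log file for later review. The helper name `run_and_log` and the log path are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(argv, log_path):
    """Run a command, echo its output, and save a copy to log_path.

    stderr is merged into stdout so the log keeps the original ordering.
    Returns the command's exit code.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible on the console
            log.write(line)         # and kept on disk
        return proc.wait()


if __name__ == "__main__":
    # Stand-in command; swap in the real docker or pudl_etl invocation.
    code = run_and_log(
        [sys.executable, "-c", "print('hello from the ETL')"],
        "logs/etl.log",
    )
    print("exit code:", code)
```

Writing the log under the directory mounted as PUDL_OUT would keep it next to the ETL outputs on the local volume.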

zaneselvans commented, May 3, 2022

Okay, the reason I couldn't write to PUDL_OUT was that there were no pre-populated sqlite or parquet output folders in there. The pudl_setup script is run during the build of the container, inside the container, so the folders it creates are not present in the local directory that gets mounted as PUDL_OUT. Duh.
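One way around this, sketched under assumptions: create the expected subfolders in the local output directory before mounting it. The folder names `sqlite` and `parquet` come from the comment above; the real pudl_setup script may create others, and the helper name `prepare_output_dirs` is made up for illustration.

```python
from pathlib import Path


def prepare_output_dirs(pudl_out, subdirs=("sqlite", "parquet")):
    """Create the output subfolders on the host if they are missing.

    Returns the list of directories that now exist under pudl_out.
    """
    root = Path(pudl_out).expanduser()
    ready = []
    for name in subdirs:
        path = root / name
        # exist_ok makes this safe to call before every run.
        path.mkdir(parents=True, exist_ok=True)
        ready.append(path)
    return ready


if __name__ == "__main__":
    for path in prepare_output_dirs("~/pudl_out"):
        print("ready:", path)
```

Running the setup script inside the container at start-up, after the volume is mounted, would be an alternative to pre-creating the folders on the host.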
