3rd-party packages firing "ModuleNotFoundError" on official airflow 2.0.2 images
Apache Airflow version: 2.0.2
Kubernetes version (if you are using kubernetes) (use kubectl version):
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release): official airflow images apache/airflow:2.0.2-python3.8 and apache/airflow:2.0.2-python3.6
- Kernel (e.g. uname -a):
- Install tools: docker
- Others:
What happened:
With the official 2.0.2 Airflow image, 3rd-party packages are not recognized by the DAG parser, which raises ModuleNotFoundError for DAGs and plugins.
As a test, I created a new docker image with one 3rd-party package (surveymonty). A simple DAG with no tasks imports it. The following error is observed:
Broken DAG: [/usr/src/airflow/dags/simpledag.py] Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/src/airflow/dags/simpledag.py", line 2, in <module>
import surveymonty
ModuleNotFoundError: No module named 'surveymonty'
I could confirm the package is installed by running pip freeze and python simpledag.py, with no errors in either case.
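One way to narrow this down is to check, from inside the container and as the same user that runs the scheduler, which interpreter and user site-packages directory are in play and whether the module can be located. The following is a minimal sketch; the helper name describe_import is hypothetical and not part of Airflow, and it only handles top-level module names.

```python
import importlib.util
import os
import site
import sys


def describe_import(module_name):
    """Report who is running Python and whether a top-level module is importable."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        # Effective user id (None on platforms without geteuid).
        "euid": os.geteuid() if hasattr(os, "geteuid") else None,
        "interpreter": sys.executable,
        # Per-user site-packages directory for the current user.
        "user_site": site.getusersitepackages(),
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_import("surveymonty").items():
        print(f"{key}: {value}")
```

Running it once as root and once as airflow (for example with docker exec and its --user option) would show whether the package is visible to only one of the two users.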
What you expected to happen: 3rd-party packages to be recognized by the parser.
Using a python image and manually installing airflow 2.0.2 and the requirements, the issue is not present.
How to reproduce it: Create the following Dockerfile:
FROM apache/airflow:2.0.2-python3.8
USER root
RUN pip3 install surveymonty==0.2.5
WORKDIR /usr/src/airflow
ENV AIRFLOW_HOME /usr/src/airflow
ENV AIRFLOW_GPL_UNIDECODE true
RUN chown airflow:airflow .
USER airflow
COPY ./simpledag.py dags/simpledag.py
# Airflow webserver
EXPOSE 8080
ENTRYPOINT []
simpledag.py:
from airflow import DAG
import surveymonty
dag = DAG('test', schedule_interval="* * * * *")
docker-compose.yml:
version: '3.8'
services:
  db:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    build: .
    restart: always
    depends_on:
      - db
    environment:
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__CORE__DAGS_FOLDER=/usr/src/airflow/dags
      - AIRFLOW__CORE__PLUGINS_FOLDER=/usr/src/airflow/plugins
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@db:5432/airflow
    ports:
      - "8080:8080"
    # Airflow 2.x uses `airflow db init`; the 1.10 `airflow initdb` command was removed.
    command: bash -c "airflow db init && airflow webserver"
  scheduler:
    build: .
    restart: always
    depends_on:
      - db
    environment:
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__CORE__DAGS_FOLDER=/usr/src/airflow/dags
      - AIRFLOW__CORE__PLUGINS_FOLDER=/usr/src/airflow/plugins
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@db:5432/airflow
    command: airflow scheduler
In my production case, every 3rd-party library imported in plugins and DAGs raises ModuleNotFoundError, and the latest Google provider has multiple failures as well.
Installing the PyPI dependencies as the airflow user solved the issue! Thank you so much!
FYI: This works: