
use PythonVirtualenvOperator with a prebuilt env

See original GitHub issue

Description

Instead of passing in the requirements and relying on Airflow to build the env, in some cases it would be more straightforward and desirable to just make Airflow use a prebuilt env.

This could be done with PythonVirtualenvOperator with a param like env_path.

Use case / motivation

virtualenv_task = PythonVirtualenvOperator(
    task_id="virtualenv_python",
    python_callable=callable_virtualenv,
    env_path='./color-env', # the path to prebuilt env
    # requirements=["colorama==0.4.0"], # replaces this
    system_site_packages=False,
    dag=dag,
)
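
Worth noting: newer Airflow releases (2.4 and later) ship ExternalPythonOperator, which covers this use case by running the callable under a pre-existing Python binary instead of building a venv on the fly. A minimal sketch, reusing the env path and callable from the example above (both carried over from this issue, not verified against any particular deployment):

from airflow.operators.python import ExternalPythonOperator

def callable_virtualenv():
    # Executes inside the prebuilt env, so its packages are importable here
    from colorama import Fore
    print(Fore.GREEN + "running in the prebuilt env")

virtualenv_task = ExternalPythonOperator(
    task_id="virtualenv_python",
    python="./color-env/bin/python",  # Python binary of the prebuilt env
    python_callable=callable_virtualenv,
    dag=dag,
)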

Are you willing to submit a PR?

Perhaps


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 10
  • Comments: 15 (8 by maintainers)

Top GitHub Comments

1 reaction
gaoyibin0001 commented, Feb 21, 2022

@ManikandanUV We are doing it the following way for now:

import os

# vars_dict, abs_path_code, files_to_parse and db_conn are defined
# elsewhere in our DAG file.
env = vars_dict.get("conda_env", None)
path_to_python = f"/home/username/.conda{'/envs/' + env if env is not None else ''}/bin/python"

parse_files = BashOperator(
    task_id="parse-files",
    bash_command=f"{path_to_python} {abs_path_code}/my_repo/parse.py {files_to_parse}",
    env={
        "PATH": os.environ["PATH"],
        "DB_CONN": db_conn,
    },
)

We have an environment variable containing the conda env name, which we use to build the full path to the Python executable. Then, using a BashOperator, we can reuse the same environment across different tasks.

Additionally, we run an update of the environment if the requirements changed (note that we use poetry as our package manager):

update_repo = BashOperator(
    task_id=f"update-repo-{folder}",
    bash_command=f"cd {abs_path_code}/{folder}; "
                 "git checkout master; git stash; git stash drop; git pull",
)
install_dependencies = BashOperator(
    task_id=f"install-dependencies-{folder}",
    bash_command=f"cd {abs_path_code}/{folder}; conda activate {env_name}; poetry install",
)
update_repo >> install_dependencies

You may also use "conda run -n env_name python xxx.py" instead of activating the environment.
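
A minimal sketch of that "conda run" suggestion, assuming a conda env named env_name exists on every worker; the script path here is hypothetical:

from airflow.operators.bash import BashOperator

# "conda run" executes the command inside the named env without needing
# "conda activate", which often fails in non-interactive shells.
run_in_env = BashOperator(
    task_id="run-in-conda-env",
    bash_command="conda run -n env_name python /opt/code/my_repo/parse.py",
)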

1 reaction
potiuk commented, Apr 8, 2021

Just one comment: this is fine if you can make sure all your distributed venvs are present on all the workers (which might be tricky if you want to update them), and you have to somehow link the “task” definition (which expects a certain venv with certain requirement versions) with the “deployment” (i.e. the worker definition). Any kind of “upgrade” to such an env might be tricky. The “local” installation pattern had the advantage that you always got the requirements in the versions you described in the task definition (via the requirements specification).

I think a better solution would be to add a caching mechanism for the venv and modify PythonVirtualenvOperator to use it. However, this might be tricky to get right when multiple tasks of the same type run on the same worker in a Celery deployment.
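
Following up on that caching idea: later Airflow releases added a venv_cache_path parameter to PythonVirtualenvOperator (in 2.6, if I recall correctly) that reuses a venv built from the same requirements across runs. A rough sketch, assuming callable_virtualenv is defined as in the issue example and with the cache directory as an assumption:

from airflow.operators.python import PythonVirtualenvOperator

cached_task = PythonVirtualenvOperator(
    task_id="cached_virtualenv_python",
    python_callable=callable_virtualenv,
    requirements=["colorama==0.4.0"],
    # Reuse the built venv across runs instead of rebuilding it each time;
    # the cache directory here is an assumption for this sketch.
    venv_cache_path="/tmp/airflow-venv-cache",
    dag=dag,
)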

Read more comments on GitHub >

Top Results From Across the Web

PythonVirtualenvOperator - Astronomer Registry
Allows one to run a function in a virtualenv that is created and destroyed automatically (with certain caveats). The function must be defined...

python - How to use PythonVirtualenvOperator in airflow?
and this airflow is running in a virtual environment (pipenv). The download function is: def download(**kwargs): folder_id = 'xxxxxx-xxxx-xxxx- ...

How to use Virtualenv to prepare a separate environment for ...
Use PythonVirtualenvOperator. Now, I can configure the Airflow operator. I pass the required libraries as the requirements parameter.

Airflow Docker - ExternalPythonOperator - Python VENV ...
So using virtualenv that's either pre-build in ExternalPythonOperator (in the image of yours) or dynamically created using PythonVirtualenvOperator) is the only ...

Understanding Python Operator in Airflow Simplified 101 - Learn
This article will guide you through how to install Apache Airflow in the Python environment to understand different Python Operators used in ...
