Use of Python system executable prevents rules from accessing Python modules in Snakemake conda environments
Description of the problem
Discussion in a PR about how to reference Python executables originally included a plan to standardize on python3 instead of python. Continued discussion led to an alternate solution where the system python would be used when it was defined; otherwise python3 would be used. After additional testing, it turns out this solution prevents proper behavior of Snakemake’s conda environments with the --use-conda flag.
When running ncov builds with --use-conda and the system executable, Snakemake activates the rule’s conda environment but the Python modules and executables installed in that environment are not available to the rule. This is because the python executable used in the shell command does not belong to the conda environment but to whatever environment the user executes Snakemake from.
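The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not code from the ncov workflow: it assumes a POSIX system, and the helper names (resolve_interpreter, make_fake_env) and the stub environment layout are hypothetical. It shows that a bare python3 is resolved through PATH, so it follows an activated environment, while sys.executable stays pinned to the interpreter that launched the current process.

```python
import os
import shutil
import stat
import sys
import tempfile


def resolve_interpreter(name, path):
    """Return the executable a shell would run for `name`, given a PATH string."""
    return shutil.which(name, path=path)


def make_fake_env(root):
    """Create a hypothetical environment layout: <root>/bin/python3.

    The file is an executable stub, not a real interpreter; it only has to
    exist so that the PATH lookup can find it.
    """
    bin_dir = os.path.join(root, "bin")
    os.makedirs(bin_dir)
    stub = os.path.join(bin_dir, "python3")
    with open(stub, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(stub, os.stat(stub).st_mode | stat.S_IXUSR)
    return bin_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        env_bin = make_fake_env(root)
        # Activating an environment prepends its bin directory to PATH.
        activated_path = env_bin + os.pathsep + os.environ.get("PATH", "")
        print("bare python3 resolves to:", resolve_interpreter("python3", activated_path))
        print("sys.executable remains:  ", sys.executable)
```

Because the absolute path from sys.executable is baked into the shell command before the rule's environment is activated, activation cannot change which interpreter runs.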
For example, I want to build a subsampled alignment for a county-level ncov build like so:
snakemake \
--use-conda \
--profile profiles/king-county \
results/usa_washington_king-county/subsampled_alignment.fasta
At the “combine and deduplicate” step of the workflow, Snakemake activates the conda environment that has nextstrain-augur installed. The environment I’m calling Snakemake from does not have augur installed. The shell command for this rule references the system Python executable, so the command that gets executed is:
/Users/jlhudd/miniconda3/envs/nextstrain/bin/python scripts/combine-and-dedup-fastas.py \
--input [...snip...] \
--output [...snip...]
This execution produces the following error:
Traceback (most recent call last):
File "scripts/combine-and-dedup-fastas.py", line 3, in <module>
from augur.align import read_sequences
ModuleNotFoundError: No module named 'augur'
However, if I modify the shell command to use python3, as shown below, everything works as expected.
python3 scripts/combine-and-dedup-fastas.py \
--input [...snip...] \
--output [...snip...]
Proposed solution
I propose that we revert to running Python with the python3 command.
One major reason originally given to not use this approach was that Anaconda installations on Windows do not always symlink a python3 executable. Since we do not officially support augur on Windows anyway, this does not seem to be a major issue. The inability to use --use-conda is a bigger issue, especially for those of us running builds in environments where we can’t use Docker (e.g., a cluster).
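If the missing python3 executable on some installations ever did need handling, one option would be to choose between bare command names rather than falling back to an absolute interpreter path. The sketch below is an illustration only and is not part of this proposal: pick_python_command is a hypothetical helper, and it assumes the PATH it inspects is representative of the environment in which the rule will run, which is not guaranteed when the check happens before conda activation.

```python
import shutil


def pick_python_command(path=None):
    """Return a bare command name, preferring python3 when PATH provides it.

    `path` is a PATH-style string; None means the current process's PATH.
    """
    # Returning a bare name (not an absolute path) leaves the final lookup to
    # the shell, so an environment activated later can still supply its own
    # interpreter.
    if shutil.which("python3", path=path) is not None:
        return "python3"
    return "python"


if __name__ == "__main__":
    print(pick_python_command())
```

The key design point is that the returned value is never an absolute path, so the --use-conda behavior described above is preserved.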
Sounds good to me!
Cool. I’ll update this once the epic tutorial PR gets merged.