Use of Python system executable prevents rules from accessing Python modules in Snakemake conda environments

Description of the problem

Discussion in a PR about how to reference Python executables originally included a plan to standardize on python3 instead of python. Continued discussion led to an alternative: use the system Python executable when it is defined and fall back to python3 otherwise. Additional testing shows that this solution breaks Snakemake’s conda environments when the --use-conda flag is used.

When running ncov builds with --use-conda and the system executable, Snakemake activates the rule’s conda environment but the Python modules and executables installed in that environment are not available to the rule. This is because the python executable used in the shell command does not belong to the conda environment but to whatever environment the user executes Snakemake from.

For example, I want to build a subsampled alignment for a county-level ncov build like so:

snakemake \
  --use-conda \
  --profile profiles/king-county \
  results/usa_washington_king-county/subsampled_alignment.fasta

At the “combine and deduplicate” step of the workflow, Snakemake activates the conda environment that has nextstrain-augur installed. The environment I’m calling Snakemake from does not have augur installed. The shell command for this rule references the system Python executable, so the command that gets executed is:

/Users/jlhudd/miniconda3/envs/nextstrain/bin/python scripts/combine-and-dedup-fastas.py \
  --input [...snip...] \
  --output [...snip...]

This execution produces the following error:

Traceback (most recent call last):
  File "scripts/combine-and-dedup-fastas.py", line 3, in <module>
    from augur.align import read_sequences
ModuleNotFoundError: No module named 'augur'
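
One way to confirm which interpreter a rule actually runs is to report the interpreter path and check whether the module can be found. The following is a minimal diagnostic sketch, not part of the ncov workflow; describe_interpreter is a hypothetical helper name, and "augur" is simply the module named in the traceback.

```python
import importlib.util
import sys


def describe_interpreter(module_name):
    """Report which Python is running and whether it can find a module.

    Hypothetical helper for debugging; it only inspects the current
    interpreter and does not import the module itself.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # absolute path of the running interpreter
        "prefix": sys.prefix,          # root of the environment it belongs to
        "module": module_name,
        "importable": spec is not None,
    }


if __name__ == "__main__":
    # "augur" is the module from the traceback; any top-level name works.
    for key, value in describe_interpreter("augur").items():
        print(f"{key}: {value}")
```

If the reported executable sits under the environment Snakemake was launched from rather than under the rule's conda environment, that would be consistent with the ModuleNotFoundError shown here.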

If I modify the shell command to use python3 instead, as shown below, everything works as expected.

python3 scripts/combine-and-dedup-fastas.py \
  --input [...snip...] \
  --output [...snip...]
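
Why the bare python3 name behaves differently can be sketched in Python. The sketch assumes that activating a conda environment places that environment's bin directory at the front of PATH, so a bare name resolves there, while an absolute path skips the PATH lookup entirely. The directory layout and helper names are hypothetical, and the example assumes a POSIX system.

```python
import os
import shutil
import stat
import tempfile


def make_fake_python(bin_dir):
    """Create an executable placeholder named python3 inside bin_dir."""
    os.makedirs(bin_dir, exist_ok=True)
    path = os.path.join(bin_dir, "python3")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve_by_name(search_path):
    """Resolve the bare name python3 against a PATH-style string."""
    return shutil.which("python3", path=search_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: one environment for the caller, one for the rule.
        caller = make_fake_python(os.path.join(root, "caller-env", "bin"))
        rule = make_fake_python(os.path.join(root, "rule-env", "bin"))

        # Assumed effect of activation: the rule's bin directory comes first.
        activated = os.pathsep.join(
            [os.path.dirname(rule), os.path.dirname(caller)]
        )

        # A bare name follows PATH and lands in the rule's environment.
        print("bare name resolves to:", resolve_by_name(activated))
        # An absolute path ignores PATH and stays with the caller.
        print("absolute path stays at:", caller)
```

Under these assumptions, the bare name resolves into the rule's environment, which matches the observation that python3 finds augur while the absolute path does not.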

Proposed solution

I propose that we revert to running Python with the python3 command.

One major reason originally given against this approach was that Anaconda installations on Windows do not always symlink a python3 executable. Since we do not officially support augur on Windows anyway, this does not seem to be a major issue. The inability to use --use-conda is a bigger issue, especially for those of us running builds in environments where we can’t use Docker (e.g., a cluster).

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

tsibley commented, Jun 30, 2020 (1 reaction)

Sounds good to me!

huddlej commented, Jul 1, 2020 (0 reactions)

Cool. I’ll update this once the epic tutorial PR gets merged.
