Inquiry regarding pipx best practices for Singer.io data platform


Disclaimer: This is more of an inquiry than a feature request.

The singer.io platform uses a number of (primarily) pip-installable libraries to extract and ingest data. The community generally uses virtual environments, but the process is cumbersome. We could encourage the community to switch to pipx, which could make adoption much easier for end users while also reducing the amount of code and orchestration needed for installs.

The general paradigm is tap-somesource | target-somedestination, for example tap-exchangeratesapi | target-csv. Because each tap and target is its own pip-installable executable, dependency conflicts are common between a tap and a target, or among multiple taps installed side by side to support many data sources.
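The tap | target pattern with each executable kept in its own virtualenv can be sketched as a small shell helper. This is a minimal illustration under stated assumptions, not part of Singer: run_pipeline and VENV_ROOT are hypothetical names, the default ~/.virtualenvs mirrors the Deputy instructions, and any extra arguments (such as a tap's --config file) are simply forwarded to the tap.

```shell
# Hypothetical helper (not part of Singer): pipe a tap into a target when
# each executable lives in its own virtualenv under VENV_ROOT
# (assumed default: ~/.virtualenvs).
# Calling each venv's bin/ path directly means neither venv is activated,
# so the tap's and the target's dependencies never share an environment.
run_pipeline() {
  local root="${VENV_ROOT:-$HOME/.virtualenvs}"
  local tap="$1" target="$2"
  shift 2
  # Remaining arguments (e.g. --config tap_config.json) go to the tap only.
  "$root/$tap/bin/$tap" "$@" | "$root/$target/bin/$target"
}
```

For example, run_pipeline tap-exchangeratesapi target-csv --config tap_config.json, where tap_config.json is a placeholder file name. The pipeline's exit status is the target's; add set -o pipefail if tap failures should also be reported.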

The same install paradigm carries over to orchestrators such as Meltano (meltano.com) and PipelineWise, since these are generally pip installs as well.

  • Is there any reason you can foresee why we should not use pipx as a replacement for pip virtualenvs?
  • Are there any other warnings or gotchas we should watch out for?

Usage examples:

Meltano example (https://meltano.com/):

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install Meltano
pip3 install meltano

Deputy example (https://help.deputy.com/en/articles/3413067-singer-data-tap-build-your-own-data-integration):

Step 2 - Create and activate a Python 3 virtual environment for the Tap, which we'll call tap-deputy
Run the following from the command line:

python3 -m venv ~/.virtualenvs/tap-deputy

source ~/.virtualenvs/tap-deputy/bin/activate

Step 3 - Install the Tap using pip
Run the following from the command line:

pip install tap-deputy
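Repeating these two steps for every tap and target is the install orchestration the question hopes pipx can replace. As a rough sketch, assuming one venv per package under a hypothetical VENV_ROOT directory (default ~/.virtualenvs), a hypothetical helper named install_each_in_own_venv can do it for any number of packages:

```shell
# Hypothetical helper: the manual, pre-pipx pattern generalized to several
# taps/targets. Each package gets its own venv under VENV_ROOT, and pip is
# invoked through that venv's bin/ directory, so scripts never need to
# "source .../bin/activate" (activation only affects the current shell).
install_each_in_own_venv() {
  local root="${VENV_ROOT:-$HOME/.virtualenvs}"
  local pkg
  for pkg in "$@"; do
    python3 -m venv "$root/$pkg" || return 1
    "$root/$pkg/bin/pip" install "$pkg" || return 1
  done
}
```

After install_each_in_own_venv tap-deputy target-csv, the tap would be run as ~/.virtualenvs/tap-deputy/bin/tap-deputy, and each new data source needs its own venv path to be tracked.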

More info about the Singer platform:

https://www.singer.io/

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 14 (3 by maintainers)

Top GitHub Comments

itsayellow commented, Aug 29, 2020 (5 reactions)

For #3, see the --suffix option to pipx install: https://pipxproject.github.io/pipx/docs/#pipx-install

It’s a new feature that adds a suffix to both the executable name and the venv name, which lets you install different versions of the same package side by side and tell them apart.
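A sketch of the --suffix workflow, assuming you want the latest release under its normal name plus an older, pinned release next to it. install_two_versions is a hypothetical helper; because --suffix is described as a new feature, check pipx install --help for its exact behavior in your pipx version.

```shell
# Hypothetical helper: keep the latest release and a pinned older release
# of the same package installed at the same time, in separate pipx venvs.
install_two_versions() {
  local pkg="$1" old_version="$2" suffix="$3"
  # Latest release, exposed under its normal executable name.
  pipx install "$pkg" || return 1
  # Pinned older release; pipx appends the suffix to its venv and executable names.
  pipx install --suffix="$suffix" "${pkg}==${old_version}"
}
```

For example, install_two_versions tap-deputy <old-version> _old would be expected to leave both tap-deputy and tap-deputy_old on PATH, where <old-version> stands for whatever release you need to keep.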

itsayellow commented, Aug 28, 2020 (5 reactions)

If I understand the use case correctly, this seems like exactly the type of situation pipx is intended for.

My assumption is that you are only interested in installing utilities that run as apps (i.e., executed from the command line) and that are Python-based and pip-installable. If that is the case, pipx is a perfect fit.

pipx automates the creation of a separate venv for each pip package, so different packages do not have to worry about, for example, overlapping dependencies pinned to different versions.

pipx also puts all the executables in the same directory, so you only need to add one directory to your PATH (rather than the bin directories of multiple venvs).
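To check that this shared directory is actually on PATH, a small hypothetical helper can look for it. This sketch assumes pipx's default bin directory of ~/.local/bin, which can be overridden with the PIPX_BIN_DIR environment variable; pipx ensurepath is pipx's own command for adding the directory to your shell configuration.

```shell
# Hypothetical helper: succeed if pipx's bin directory is on PATH.
pipx_bin_on_path() {
  local bin_dir="${PIPX_BIN_DIR:-$HOME/.local/bin}"
  # Wrap PATH in colons so the first and last entries match the same pattern.
  case ":$PATH:" in
    *":$bin_dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical use is pipx_bin_on_path || pipx ensurepath, followed by opening a new shell so the updated PATH takes effect.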

For your examples, the pipx equivalent would be:

pipx install meltano

and

pipx install tap-deputy

That’s it! Both meltano and tap-deputy can now be run from the command line, provided ~/.local/bin is in the user’s PATH.
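Applied to the tap | target paradigm from the question, the pipx approach could look like the following sketch. singer_sync is a hypothetical helper, and it assumes each package installs an executable with the same name as the package (as with tap-deputy above). It installs a tap and a target with pipx only when their executables are not already on PATH, then pipes one into the other.

```shell
# Hypothetical helper: install a tap and a target with pipx (one isolated
# venv each) if their executables are missing, then run tap | target.
# Assumes package name == executable name.
singer_sync() {
  local tap="$1" target="$2" app
  shift 2
  for app in "$tap" "$target"; do
    # command -v finds the executable on PATH; install only if it is absent.
    command -v "$app" >/dev/null 2>&1 || pipx install "$app" || return 1
  done
  # Extra arguments (e.g. --config) are passed to the tap.
  "$tap" "$@" | "$target"
}
```

For example, singer_sync tap-exchangeratesapi target-csv --config tap_config.json, with tap_config.json as a placeholder. The command -v check means re-running the helper does not attempt a second install.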

I also tested it on my own computer and had no problems.


