Proposal: Release artifact build and import process
Background
There are two models of building content: “push” and “pull”. In a “push” model, the user builds an artifact (e.g., software package, content archive, container image, etc.) locally, and pushes it to a content server. In a “pull” model, the content server downloads or pulls the source code, and builds the artifact for the user. In both models, there are defined procedures, formats, metadata, and supporting tooling to aid in producing a release artifact.
Most popular content services use a “push” model, including PyPI (Python packages), Crates.io (Rust packages), and NPM (Node.js packages). For these services, the content creator transforms the source code into a package artifact, and takes on the responsibility of testing, building, and pushing the artifact to the content server.
In rarer cases, a content service takes on the process of building artifacts. Docker Hub is one such example: a content creator can configure an automated build process that is triggered by a notification from a source code hosting service (e.g., GitHub or Bitbucket) when new code is merged. In response to the notification, Docker Hub downloads the new code and generates a new image.
Problem Description
The Galaxy import process works as a “pull” model that can be initiated manually via the Galaxy website, or triggered automatically via a webhook from the Travis CI platform. However, unlike other content services, Galaxy does not enforce an artifact format, does not provide a specification for artifact metadata, and does not provide tooling to aid in building release artifacts.
When it comes to versioning content, Galaxy relies on git tags stored in the source code hosting service (GitHub). These tags point to a specific commit within the source code history. Each tag represents a point in time within the source code lifecycle, and is only useful within the context of a git repository. Removing the source code from the repository and placing it in an artifact causes the git tags to be lost, and with them any notion of the content version.
Galaxy provides no concept of repository-level metadata, where information such as a version number, name, and namespace might be located and associated with a release artifact. Metadata is currently only defined at the content level. For example, Ansible roles contain metadata stored in a meta/main.yml file, and modules contain metadata within their source code. Combine multiple content items and types into a single release artifact, and the metadata becomes ambiguous.
The Galaxy import process does not look for a release artifact, but instead clones the GitHub repository, and inspects the local clone. This means that any notion of content version it discovers and records comes directly from git tags. It’s not able to detect when a previously recorded version of the content has been altered, nor is it able to help an end user verify that the content being downloaded is the expected content. It’s also not able to inspect and test release artifacts, and therefore can offer no assurances to the end user of the content.
As you might expect, since Galaxy doesn’t interact with release artifacts, it offers no prescribed process or procedures for creating a release archive, nor does it offer any tooling to assist in the creation of a release archive. The good news is that Galaxy is a blank canvas in this regard.
Proposed Solution
Define repository metadata and build manifest
A repository metadata file, galaxy.yml, will be placed at the root of the project directory tree, and contain information such as author, license, name, namespace, etc. It will hold any attributes required to create a release artifact from the repository source tree.
The archive build process (defined later) will package the repository source contents (e.g., roles, modules, plugins, etc.), and generate a build manifest file. The generated manifest file will contain the metadata found in galaxy.yml, plus information about the package structure and contents, and information about the release, including the version number.
The generated manifest file will be a JSON formatted file called METADATA that will be added to the root of the release artifact during the build process. Consumers of the release artifact, such as the Galaxy CLI, and the Galaxy import process, will be able to read the manifest file, and verify information about the release and its contents.
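To make this concrete, here is a rough sketch of what the manifest-generation step could look like. The galaxy.yml keys, the manifest field names, and the build_manifest helper are illustrative assumptions, not a finalized specification.

```python
import hashlib
import os

import yaml  # PyYAML, used here to read the repository metadata file


def build_manifest(repo_root, version):
    """Sketch: combine galaxy.yml metadata with per-file checksums.

    Field names below are illustrative; the real manifest format would be
    defined by the Galaxy specification.
    """
    with open(os.path.join(repo_root, "galaxy.yml")) as f:
        repo_meta = yaml.safe_load(f)

    files = []
    for dirpath, _, filenames in os.walk(repo_root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            files.append({
                "name": os.path.relpath(path, repo_root),
                "chksum_sha256": digest,
            })

    return {
        "namespace": repo_meta.get("namespace"),
        "name": repo_meta.get("name"),
        "version": version,
        "license": repo_meta.get("license"),
        "files": files,
    }
```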
Enable Mazer to build packages
Given a defined package structure and a process for building a release artifact, it makes sense to build into Mazer the components that automate the artifact build process.
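A minimal sketch of the archive-building piece, assuming the manifest dictionary produced by the build_manifest sketch above; the archive naming convention shown here is an assumption, not part of the proposal.

```python
import json
import os
import tarfile


def build_artifact(repo_root, manifest, dest_dir="."):
    """Sketch: package the source tree plus the METADATA manifest into a .tar.gz."""
    # Write the manifest into the tree so it lands at the root of the artifact.
    with open(os.path.join(repo_root, "METADATA"), "w") as f:
        json.dump(manifest, f, indent=2)

    # Hypothetical naming convention: <namespace>-<name>-<version>.tar.gz
    name = f"{manifest['namespace']}-{manifest['name']}-{manifest['version']}.tar.gz"
    archive_path = os.path.join(dest_dir, name)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(repo_root, arcname=".")
    return archive_path
```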
Use GitHub Releases as content storage
GitHub Releases will be the mechanism for storing and sharing release archives. GitHub provides an API that can be used by CI platforms and Mazer to push release artifacts to GitHub.
Mazer will be extended with the ability to push a release artifact to GitHub. This provides a single, consistent method for content creators to automate release pushes that can be called from any CI platform.
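As a rough illustration, a push step could use GitHub’s Releases API to create a release and attach the artifact as an asset. The repository details and token handling below are placeholders; only the two GitHub endpoints shown are real.

```python
import requests


def push_release(owner, repo, tag, artifact_path, token):
    """Sketch: create a GitHub release for an existing tag and attach the artifact."""
    headers = {"Authorization": f"token {token}"}

    # Create the release.
    resp = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/releases",
        json={"tag_name": tag, "name": tag},
        headers=headers,
    )
    resp.raise_for_status()
    upload_url = resp.json()["upload_url"].split("{")[0]  # strip the URI template suffix

    # Upload the release artifact as a release asset.
    with open(artifact_path, "rb") as f:
        asset = requests.post(
            upload_url,
            params={"name": os.path.basename(artifact_path)},
            data=f,
            headers={**headers, "Content-Type": "application/gzip"},
        )
    asset.raise_for_status()
    return asset.json()["browser_download_url"]


import os  # used above for os.path.basename
```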
Notify the Galaxy server when new release artifacts are available
On the Galaxy server, add the ability for users to generate an API token that can be used by clients, such as Mazer, to authenticate with the API.
Extend Mazer with the ability to trigger an import process. Mazer will authenticate with the API via a user’s API token, and trigger an import of the newly available release.
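Roughly, the client side might look like the following sketch. The endpoint path, payload fields, and token header format are assumptions made for illustration, not documented Galaxy API behavior.

```python
import requests

GALAXY_API = "https://galaxy.ansible.com/api"  # base URL; endpoint below is hypothetical


def trigger_import(github_user, github_repo, release_url, api_token):
    """Sketch: authenticate with a user's API token and request an import of a new release."""
    resp = requests.post(
        f"{GALAXY_API}/v1/imports/",       # hypothetical endpoint
        json={
            "github_user": github_user,
            "github_repo": github_repo,
            "release_url": release_url,    # hypothetical field
        },
        headers={"Authorization": f"Token {api_token}"},
    )
    resp.raise_for_status()
    return resp.json()
```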
Verify release artifacts
Enable Mazer to verify the integrity of release artifacts downloaded from GitHub at the time of installation.
There are several solutions widely used for verifying the integrity of a downloaded artifact, including checksums and digital signatures. In general, a checksum guarantees integrity, but not authenticity. A digital signature guarantees both integrity and authenticity.
Using a digital signature for user content requires a complex process of maintaining a trusted keychain, and still does not guarantee perfect authenticity. Since release artifacts are not hosted by Galaxy, but rather by a third party, it’s impossible to perfectly guarantee authenticity.
However, since Galaxy is a centralized package index, and data transfer between the Galaxy server and client is secured via TLS encryption, Galaxy can be considered a trusted source of metadata, and integrity verification can be achieved by storing release artifact checksums on the Galaxy server.
During import of a repository, Galaxy will store metadata, including the checksum, for a specific content version only once. Any subsequent updates to a version will be prohibited.
Import Workflow
- Using Mazer, the user triggers an import of a repository, passing the URL of the new release
- Galaxy downloads the release artifact, calculates a checksum, and stores the checksum along with additional metadata about the release
- Any subsequent update of an already imported version is prohibited (a minimal sketch of this rule follows the list)
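The sketch below shows the server-side rule in miniature, using an in-memory store in place of Galaxy’s real database; the class and its shape are illustrative only.

```python
import hashlib


class ReleaseRegistry:
    """Sketch: record a checksum per content version exactly once."""

    def __init__(self):
        self._checksums = {}  # (namespace, name, version) -> sha256 hex digest

    def record(self, namespace, name, version, artifact_bytes):
        key = (namespace, name, version)
        if key in self._checksums:
            raise ValueError(
                f"{namespace}.{name} {version} already imported; "
                "updates to an existing version are prohibited"
            )
        self._checksums[key] = hashlib.sha256(artifact_bytes).hexdigest()
        return self._checksums[key]
```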
Install Workflow
- The user executes the mazer install command to install an Ansible collection
- Mazer downloads package metadata from Galaxy, which includes the download URL and checksum
- Mazer downloads the release artifact
- Mazer calculates the checksum of the downloaded package and compares it with the checksum received from Galaxy, as sketched below
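The verification step in the last bullet amounts to a few lines of hashing. A minimal sketch, assuming the expected checksum has already been fetched from Galaxy:

```python
import hashlib


def verify_artifact(path, expected_sha256):
    """Sketch: compare a downloaded artifact's checksum with Galaxy's recorded value."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise RuntimeError(f"checksum mismatch for {path}; refusing to install")
```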
I have some concerns regarding the overall direction of this discussion.
I mostly agree with @cutwater, but anyway I want to add my two cents.
Traditional way to organise repos
At the moment, the normal approach is to organize repositories in the simplest, most reliable way. For example, PyPI, NPM, RPM, and Maven (and many others) essentially decompose packages into folders on a file system and generate metadata for all packages, where a package is just an archive, such as a tarball. This approach is reliable, causes no problems, and has worked for years. Attempts to do something more optimal or clever lead to problems like those DEB repos have, where the repo is inconsistent during updates. Plus, all of these repo engines usually have a web UI with a search engine.
Here is the direction this discussion is heading.
You propose to build a repository on top of a distributed, virtual file storage system, where you are responsible for neither consistency nor data accessibility. You have limited ACLs for this storage, and you may lose access at any time. You will eventually improve and evolve this system, and in time you’ll find this storage contains lots of packages in outdated formats, stored across many different storage backends: GitHub, GitLab, Nexus, custom web servers, Amazon, Google… This future doesn’t look very appealing. It’s much more complicated than DEB repos. Something will definitely go wrong.
You also need to consider these questions:
My proposal
Wrap all these things with Python setuptools and distribute them as Python packages. Use PyPI, your own repo, or both. Don’t reinvent the wheel. So many people will be thankful for this simple solution!
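For illustration of what is being suggested here, packaging a role as a Python distribution might look roughly like this; the project name and file paths are hypothetical, and the role content ships as data files rather than importable Python modules.

```python
# setup.py -- hypothetical packaging of an Ansible role as a Python distribution
from setuptools import setup

setup(
    name="ansible-role-example",       # hypothetical project name
    version="1.2.3",
    description="An Ansible role distributed as a Python package",
    packages=[],                       # the role itself is data, not Python code
    data_files=[
        ("share/ansible/roles/example/tasks", ["tasks/main.yml"]),
        ("share/ansible/roles/example/meta", ["meta/main.yml"]),
    ],
)
```

A user could then install the result with pip from PyPI or an internal index, which is the “don’t reinvent the wheel” point being made.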
Following up on the discussion with @daviddavis, @bmbouter, @alikins and @cutwater…
We decided the following:
- There will be no push github command. There’s no need for Mazer to push directly to GitHub.
- Mazer will have a publish command to publish to Galaxy. Galaxy will publish the artifact to GitHub, and possibly, in the future, store a copy of the archive in Pulp or a similar service.
- The publish command will perform a multi-part upload of the file to a Galaxy API endpoint.

Just to be clear, we’re not forcing contributors to use this process day one. Galaxy will continue to support the existing import process that relies only on GitHub repositories. This new process will be optional. Consider it the first phase in moving Galaxy toward hosting content.
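To make the agreed publish flow concrete, here is a rough sketch of the client side, interpreting “multi-part upload” as a multipart/form-data POST (the real endpoint may instead use chunked uploads). The URL and token header format are assumptions for illustration only.

```python
import os

import requests


def publish(artifact_path, api_token,
            url="https://galaxy.ansible.com/api/v2/collections/"):  # hypothetical endpoint
    """Sketch: upload a release artifact to Galaxy as a multipart POST."""
    with open(artifact_path, "rb") as f:
        resp = requests.post(
            url,
            files={"file": (os.path.basename(artifact_path), f)},
            headers={"Authorization": f"Token {api_token}"},
        )
    resp.raise_for_status()
    return resp.json()
```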