Change the COS directory layout for pipeline artifacts

Is your feature request related to a problem? Please describe.

Spawned from https://github.com/elyra-ai/elyra/pull/1732

Describe the solution you’d like

Currently Elyra uses the following directory layout for input and output artifacts that are stored in a cloud storage bucket:

<pipeline-name-with-timestamp>/
<pipeline-name-with-timestamp>/<node-1-archive>.tgz    # input artifacts for node 1
<pipeline-name-with-timestamp>/<node-2-archive>.tgz    # input artifacts for node 2
<pipeline-name-with-timestamp>/<output-artifact-1>
<pipeline-name-with-timestamp>/<output-artifact-2>
<pipeline-name-with-timestamp>/<output-artifact-3>
...

There are two issues with this layout:

  • input and output artifacts are stored in the same location
  • output artifacts can be overwritten if the pipeline is run multiple times outside of Elyra (e.g. on Kubeflow Pipelines or Apache Airflow)
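
The second issue can be illustrated with a minimal sketch. It assumes an in-memory dictionary standing in for the storage bucket; `bucket`, `store_output_flat`, the prefix, and the artifact name are hypothetical and are not part of Elyra's code:

```python
# In-memory stand-in for a cloud storage bucket (hypothetical, not an Elyra API).
bucket = {}


def store_output_flat(pipeline_prefix, artifact_name, data):
    """Store an output artifact using the current flat layout."""
    key = f"{pipeline_prefix}/{artifact_name}"
    bucket[key] = data  # an existing object under the same key is replaced
    return key


# Hypothetical pipeline prefix and artifact name, used twice to mimic two runs.
prefix = "my-pipeline-0611120000"
first_key = store_output_flat(prefix, "results.csv", b"output of run 1")
second_key = store_output_flat(prefix, "results.csv", b"output of run 2")
```

Both calls resolve to the same key, so only the output of the second run remains in the bucket.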

To resolve these issues, we could:

  • separate input and output artifacts
  • associate output artifacts with a “run id”

For example:

<pipeline-name-with-timestamp>/
<pipeline-name-with-timestamp>/<node-1-archive>.tgz
<pipeline-name-with-timestamp>/<node-2-archive>.tgz
<pipeline-name-with-timestamp>/<run-1-id>/<output-artifact-1>
<pipeline-name-with-timestamp>/<run-1-id>/<output-artifact-2>
<pipeline-name-with-timestamp>/<run-1-id>/<output-artifact-3>
<pipeline-name-with-timestamp>/<run-2-id>/<output-artifact-1>
<pipeline-name-with-timestamp>/<run-2-id>/<output-artifact-2>
<pipeline-name-with-timestamp>/<run-2-id>/<output-artifact-3>
...

Ideally this unique “run id” is something that’s meaningful to the user.
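
A rough sketch of how object keys for the proposed layout might be built. The function names, the validation rule, and the example run ids are assumptions made for illustration; they do not describe how Elyra implements or would implement this:

```python
import posixpath


def input_archive_key(pipeline_prefix, node_archive):
    """Key for a node's input archive, stored directly under the pipeline prefix."""
    return posixpath.join(pipeline_prefix, f"{node_archive}.tgz")


def output_artifact_key(pipeline_prefix, run_id, artifact_name):
    """Key for an output artifact, scoped by a run id below the pipeline prefix."""
    # Assumption: a run id is a single, non-empty path segment.
    if not run_id or "/" in run_id:
        raise ValueError("run_id must be a non-empty string without '/'")
    return posixpath.join(pipeline_prefix, run_id, artifact_name)


# Hypothetical names; two runs of the same pipeline no longer share output keys.
prefix = "my-pipeline-0611120000"
archive_key = input_archive_key(prefix, "node-1")
run_1_key = output_artifact_key(prefix, "run-1", "results.csv")
run_2_key = output_artifact_key(prefix, "run-2", "results.csv")
```

With this scheme the input archives stay where they are today, and each run writes its outputs under its own prefix, so a rerun cannot replace the outputs of an earlier run as long as the run ids differ.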

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
ptitzler commented, Jun 11, 2021

I don’t think we want to entertain having to open the pipeline files to determine their ‘scope’. If we need that reverse lookup (which is natural) this might require more thought.

I agree. We can always improve this incrementally and start by separating input and output artifacts. While such a change would impact the user workflow a bit, it doesn’t have any impact on any of the Elyra tooling/UI. A new release would be good timing to make such a change.

0 reactions
kevin-bates commented, Jun 11, 2021

Right. Perhaps it might be better to treat it purely as a classifier (“scope” also comes to mind). However, I don’t think we want to entertain having to open the pipeline files to determine their ‘scope’. If we need that reverse lookup (which is natural) this might require more thought.
