
track provenance and the operations recipe


At a workshop of IPCC TG-Data this week, @aspinuso presented on “Data-Intensive and Reproducible Science”, using a pyam tutorial as an example of a workflow with detailed provenance tracking: https://github.com/aspinuso/pyam-binder/blob/master/pyam.ipynb

Given the more advanced workflows being discussed for AR6, it might be useful for pyam to include some basic support for, or integration with, dispel4py or another package.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
aspinuso commented, Nov 9, 2019

Dear all

Thanks for the interest.

Just as a clarification: the most up-to-date repository of the dispel4py processing library, which includes support for provenance configuration and traceability, is currently being developed in the context of the DARE platform and is accessible at:

https://gitlab.com/project-dare/dispel4py

Sorry for the confusion. We are in the process of migrating versions and repositories.

I agree that adopting the whole library could, at the moment, be too complex for addressing basic traceability needs. Usually, libraries are used through dispel4py rather than having dispel4py integrated into the library itself. However, it could help in building generic and more complex traceable workflows, especially when these require different analysis libraries, custom metadata and larger computational resources. Let me mention @rosafilgueira in this thread, who is one of the main designers and developers of the tool.

Cheers Alessandro

On 9 Nov 2019 at 06:47, Zeb Nicholls <notifications@github.com> wrote:

Looks very cool. Looks way more complex than we can include and test in our first AR6 workflow draft though haha. If we just want it for reproducibility, I think I’d prefer to make the iiasa-climate-assessment public instead (with appropriate tags) as reproducing from the output of dispel4py looks non-trivial.


0 reactions
aspinuso commented, Nov 13, 2019

I think it’s important to identify the use cases and the extent of what you want to cover in terms of traceability of the results and of the processes involved. If the aim is reproducing and/or tracing (these are two different problems) exclusively what pyam generates, then combining notebooks and Binder repositories with custom lineage in pyam should be good enough. Consider that if my Binder example had used only a pyam-lineage-aware implementation, I would have lost the information about the storage part of the workflow, which includes the location and the ID assigned to the produced image within a repository.

If you want to scale to wider use cases that involve more tools and software libraries, workflow systems are usually a better way to discretise tasks and to describe and trace methods. In that case, lineage usually comes for free. Also have a look at CWL Tool (https://github.com/common-workflow-language/cwltool) and PROV-ONE (http://jenkins-1.dataone.org/jenkins/view/Documentation Projects/job/ProvONE-Documentation-trunk/ws/provenance/ProvONE/v1/provone.html) as generic tools and models for describing workflows and provenance in that context.
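To illustrate the difference between library-level and workflow-level lineage described above, here is a minimal sketch in plain Python. It is not dispel4py, CWL Tool or pyam code; all names (`traced`, `analyse`, `store`, `PROVENANCE`) and the repository URL are hypothetical. A decorator records every step of a small two-step workflow, so the location and ID produced by the storage step end up in the provenance log next to the analysis step. Lineage captured only inside the analysis library would record the `analyse` step alone.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical workflow-level provenance log: one record per executed step.
PROVENANCE = []


def traced(step):
    """Record the name, inputs, output and timestamp of every call to `step`."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        result = step(*args, **kwargs)
        PROVENANCE.append({
            "step": step.__name__,
            "inputs": [repr(a) for a in args] + [f"{k}={v!r}" for k, v in kwargs.items()],
            "output": repr(result),
            "time": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@traced
def analyse(values):
    # Hypothetical stand-in for an analysis step that a library could trace by itself.
    return sum(values) / len(values)


@traced
def store(result, repository):
    # Hypothetical stand-in for the storage step: the location and the ID assigned
    # to the output are only recorded because tracing happens at the workflow level.
    record_id = hashlib.sha256(repr(result).encode()).hexdigest()[:8]
    return {"location": repository, "id": record_id}


mean = analyse([1.0, 2.0, 3.0])
stored = store(mean, "https://example.org/repository")
print(json.dumps(PROVENANCE, indent=2))
```

A workflow system applies this kind of tracing to every task it runs, which is roughly why lineage "comes for free" there; real systems record much richer metadata than this sketch.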
