Functional composition within astropy, i.e., pipelining or grouping functional calls to make them reusable.

Description

One of the most common uses of astropy is to take some input arguments/data and apply a series of functions to them. For a simple example, let’s consider taking an input array of floats Z representing redshifts:

  • passing it through a cosmology function to get the distance modulus
  • applying some statistical clean-up method, say sigma clipping
  • fitting a model to it, such as a Gaussian1D
  • using visualization or stats to get a meaningful value out of this Gaussian fit, or possibly using the output as input to a custom function that produces something of value

In pseudo-code it looks like this:

Z = Table.read('redshifts.ecsv')
dm = Cosmology.distmod(Z)
dm_clipped = SigmaClip(dm, sigma=5)
g1d = Gaussian1D()
results, result_info  = fitter.fit(g1d, Z)
custom_plot_function(results, result_info)
out = custom_report_function(results, result_info)
...
 >>> do something with out
...

However, functional composition could be used to pipeline-ify™ this process and get rid of the intermediate variables cluttering up the namespace. While it is possible to implement this yourself, using functools for example, packages like scikit-learn (Pipeline), PyTorch (Sequential), etc. provide built-in classes that take an arbitrary sequence of steps, apply them one after another (non-sequential operation using a graph is also possible), and give an output in a clean format. This sort of formalism is useful from a software design point of view. In addition to not creating intermediate variables, it makes it easy to reuse the now “pipelined” set of functions without copy-pasting code (and possibly duplicating variable names), and it makes the code easier to bug fix, refactor, and maintain.
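As a rough illustration of the functools route mentioned above, here is a minimal sketch of left-to-right composition built on `functools.reduce`. The `compose` helper and the three toy steps (`parse`, `shift`, `drop_large`) are hypothetical stand-ins chosen for this example; they are not astropy functions.

```python
from functools import reduce


def compose(*funcs):
    """Return a callable that applies funcs left to right (hypothetical helper)."""
    def composed(value):
        # Feed the running result into each function in turn.
        return reduce(lambda acc, func: func(acc), funcs, value)
    return composed


# Toy stand-ins for the real steps (read, transform, clean up).
def parse(text):
    return [float(tok) for tok in text.split()]


def shift(values):
    return [v + 1.0 for v in values]


def drop_large(values, limit=10.0):
    return [v for v in values if v <= limit]


process = compose(parse, shift, drop_large)
result = process("0.5 2.0 50.0")  # no intermediate variables are left behind
```

With no functions at all, `compose()` simply returns its input unchanged, which is a convenient identity element when pipelines are assembled programmatically.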

In this case, the above might be transformed into:

pipeline = astropy.Pipeline(Table.read, Cosmology.distmod, SigmaClip, fitter.fit, custom_plot_function, custom_report_function)
pipeline('redshifts.ecsv')
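To make the idea concrete, a sequential pipeline class could look roughly like the sketch below. This `Pipeline` is purely hypothetical (astropy does not ship such a class), and the steps are plain-Python placeholders rather than astropy calls; each step receives the previous step's return value as its only argument.

```python
import statistics


class Pipeline:
    """Hypothetical sequential pipeline; not an existing astropy class."""

    def __init__(self, *steps):
        self.steps = list(steps)

    def __call__(self, value):
        # Pass the output of each step as the sole input of the next one.
        for step in self.steps:
            value = step(value)
        return value


# Placeholder steps standing in for read / transform / report.
def to_floats(text):
    return [float(tok) for tok in text.split(",")]


def center(values):
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def largest_abs(values):
    return max(abs(v) for v in values)


pipeline = Pipeline(to_floats, center, largest_abs)
first = pipeline("1,2,3")
second = pipeline("0,10")  # the same pipeline is reused on new input
```

The point of wrapping the steps in an object is the reuse shown in the last two lines: the chain is defined once and called on as many inputs as needed.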

Now, the most common issue would be passing in arguments. There are many ways to handle this. There could be args and kwargs parameters added to the Pipeline class. You could use custom decorators, or provide some ready-made decorators that apply their arguments to the functions, etc.

One example using ready-made decorators:

...distmod, kwargs_decorator(SigmaClip, {'sigma':5}), args_decorator(fitter.fit, g1d, set_par_order=(1,0)), ...

where kwargs_decorator applies keyword arguments to SigmaClip, and args_decorator applies non-keyword args and flips the order. By default, the first returned object would be passed down the chain to each function as the first argument, but that may not make sense for all astropy functions.
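One way such helpers might be built is on top of `functools.partial`. The sketch below is only an assumption about how they could work: `bind_kwargs` and `bind_args` are hypothetical names, and the `piped_position` parameter is a simplified stand-in for the `set_par_order` idea above (it says where the value coming down the chain is inserted among the fixed positional arguments). The `clip` and `fit` functions are toy placeholders, not astropy APIs.

```python
from functools import partial


def bind_kwargs(func, **kwargs):
    """Fix keyword arguments; the piped value stays the first positional argument."""
    return partial(func, **kwargs)


def bind_args(func, *args, piped_position=0):
    """Fix positional arguments and insert the piped value at piped_position."""
    def wrapper(value):
        full_args = list(args)
        full_args.insert(piped_position, value)
        return func(*full_args)
    return wrapper


# Toy placeholders for a clipping step and a fitting step.
def clip(values, limit=3.0):
    return [v for v in values if abs(v) <= limit]


def fit(model, data):
    # The model comes first and the data second, so the piped value
    # has to land in position 1 rather than position 0.
    return f"{model} fitted to {len(data)} points"


steps = [
    bind_kwargs(clip, limit=5.0),
    bind_args(fit, "gaussian", piped_position=1),
]

value = [1.0, -7.0, 4.0]
for step in steps:
    value = step(value)
```

Because both helpers return ordinary one-argument callables, they can be dropped into any sequential chain without the chain itself knowing about extra arguments.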

So, the proposal is simple, but the implementation may not be. Create a class that can be used to make a pipeline out of a set of functions, thus eliminating the bad practice of saving intermediate variables. The general idea is already supported in base Python and afaik is called functional composition; the tools within functools can be used to make this happen. A benefit of providing it with astropy is that we can add helper methods that take care of commonly encountered issues particular to various astropy packages/methods. We also save people from having to create such a class themselves each time.

Additional context

Here is a basic application example I found on YouTube: https://youtu.be/ka70COItN40?t=1410. The video also has a specific example using the PyTorch Sequential implementation.

It is possible to further boost this Pipeline class with more complicated functionality. For example, the additional parameters passed, or the functional chain that is used, could be changed based on the output of any member of the chain. Thinking of it in terms of a graph or flow chart, that graph becomes a Pipeline instance. This would make many astropy codes far more reusable, so it probably also has benefits when sharing code.
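A very small sketch of that non-linear idea, under the assumption that a branch is just another step that inspects its input and routes it to one of two sub-steps. The `branch` and `run` helpers are hypothetical names invented for this example; the only library calls are `statistics.mean` and `statistics.median` from the standard library.

```python
import statistics


def branch(predicate, if_true, if_false):
    """Return a step that routes its input to one of two sub-steps."""
    def step(value):
        return if_true(value) if predicate(value) else if_false(value)
    return step


def run(steps, value):
    """Apply the steps in order, feeding each output to the next step."""
    for step in steps:
        value = step(value)
    return value


# Use the mean for samples of five or more points, the median otherwise.
steps = [
    sorted,
    branch(lambda values: len(values) >= 5, statistics.mean, statistics.median),
]

small = run(steps, [3, 1, 2])
large = run(steps, [1, 2, 3, 4, 10])
```

Since `branch` returns an ordinary callable, branches can be nested, which is one simple route from a linear chain toward the flow-chart picture described above.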

I would be happy to contribute a basic linear/sequential pipeline class example, if I could be pointed in the right direction for where it would go (an affiliated package, a core part of astropy, somewhere else?) and after receiving some input on what those who are much more familiar with astropy think is the best way to support passing in arguments to functions. The non-linear implementation could be left to a future improvement.

Please let me know if this already exists within astropy (I don’t think so), and whether I am missing something obvious that makes it hard to create such a generic helper.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

1 reaction
hamogu commented, Dec 20, 2021

Great find! Looks very useful. According to the docs, functions don’t have to be decorated but “MetaFunctions are also capable of upgrading regular functions to MetaFunctions at composition time” - they show an example of using input and print from the standard library which are certainly not decorated as a meta function within the standard library.

So, where do we go from here? It seems that you have a solution; in fact, you have a solution that might satisfy everyone’s desire for such a feature. To me that sounds as if we don’t need it in astropy itself and we can close this issue.

However, people rarely read closed issues, so maybe we should turn that into an example for learn.astropy.org as “Other packages you might want to use together with astropy” or “How to use Astropy in a pipeline”?

1 reaction
hamogu commented, Oct 15, 2021

When I encounter this, I usually write a function, following the example above:

def redshifttable_to_out(filename):
    Z = Table.read(filename)
    dm = Cosmology.distmod(Z)
    dm_clipped = SigmaClip(dm, sigma=5)
    g1d = Gaussian1D()
    results, result_info  = fitter.fit(g1d, Z)
    custom_plot_function(results, result_info)
    return custom_report_function(results, result_info)

I don’t think I understand what this proposal suggests to do that is astropy specific. There are a number of Python packages out there that do similar things already, e.g. the above-referenced scikit-learn and PyTorch implementations, but also general-purpose Python packages (the first hit in my search was this https://pythonawesome.com/a-lightweight-python-pipeline-framework/ but I know there are several others).

It might be worth spending some time looking for a general-purpose package that already provides this functionality and then writing a documentation example of how to use it with astropy, rather than implementing new code.
