Functional composition within astropy, i.e., pipelining or grouping functional calls to make them reusable.
Description
One of the most common uses of astropy is to take some input arguments/data and apply a bunch of functions to it. For a simple example, let’s consider taking an input array of floats Z representing redshifts:
- passing it through a cosmology function to get the distance modulus
- applying some statistical clean-up method, say sigma clipping
- fitting a model to it, such as a Gaussian1D
- using visualization or stats to get a meaningful value out of this Gaussian fit, or possibly using the output as input to a custom function that produces something of value
In pseudo-code it looks like this:
Z = Table.read('redshifts.ecsv')
dm = Cosmology.distmod(Z)
dm_clipped = SigmaClip(dm, sigma=5)
g1d = Gaussian1D()
results, result_info = fitter.fit(g1d, dm_clipped)
custom_plot_function(results, result_info)
out = custom_report_function(results, result_info)
...
>>> do something with out
...
However, functional composition could be used to pipeline-ify™ this process and remove the intermediate variables cluttering up the namespace. While it is possible to implement this yourself, for example with functools, packages like scikit-learn (Pipeline) and PyTorch (Sequential) provide built-in classes that take an arbitrary sequence of compatible steps, apply them sequentially (non-sequential operation using a graph is also possible), and give an output in a clean format. This sort of formalism is useful from a software design point of view. In addition to not creating intermediate variables, it makes it easy to reuse the now “pipelined” set of functions without copy-pasting code (and possibly duplicating variable names), and it makes the code easier to debug, refactor, and maintain.
In this case the above might be transformed into
pipeline = astropy.Pipeline(Table.read, Cosmology.distmod, SigmaClip, fitter.fit, custom_plot_function, custom_report_function)
pipeline('redshifts.ecsv')
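To make the idea concrete, here is a minimal sketch of what such a sequential class could look like. The `Pipeline` class below is hypothetical (astropy has no such class), and the three step functions are toy stand-ins chosen so the sketch runs without astropy or a data file.

```python
class Pipeline:
    """Hypothetical sequential pipeline; not an existing astropy class."""

    def __init__(self, *steps):
        # Each step is any callable that accepts one positional argument.
        self.steps = steps

    def __call__(self, value):
        # Feed the output of each step into the next one.
        for step in self.steps:
            value = step(value)
        return value


# Toy stand-ins for reading, cleaning, and summarizing data.
def read_values(text):
    return [float(item) for item in text.split(",")]


def drop_negative(values):
    return [v for v in values if v >= 0]


def mean(values):
    return sum(values) / len(values)


pipeline = Pipeline(read_values, drop_negative, mean)
result = pipeline("0.5,-1.0,1.5")
```

The same `pipeline` object can be called again on other inputs, which is the reuse benefit described above. This sketch only handles steps that take a single argument.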
Now, the most common issue would be passing in arguments. There are many ways to handle this. The Pipeline class could take args and kwargs parameters. Alternatively, users could write custom decorators, or astropy could provide some ready-made ones, that apply their arguments to the functions.
One example using ready-made decorators:
...distmod, kwargs_decorator(SigmaClip, {'sigma':5}), args_decorator(fitter.fit, g1d, set_par_order=(1,0)), ...
where kwargs_decorator applies keyword arguments to SigmaClip, and args_decorator applies non-keyword args and flips the order. By default, the first returned object would be passed down the chain to each function as the first argument, but that may not make sense for all astropy functions.
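As a point of comparison, the standard library's `functools.partial` already covers both cases in that example: it can bind keyword arguments, and it can bind leading positional arguments so that the piped value lands in the second position. The sketch below uses toy stand-ins (`clip` and `fit` are illustrative functions, not astropy's sigma clipping or fitters).

```python
from functools import partial


def clip(values, sigma=3):
    # Toy stand-in for sigma clipping: keep values within `sigma` of zero.
    return [v for v in values if abs(v) <= sigma]


def fit(model, data):
    # Toy stand-in for a fitter that expects (model, data) in that order.
    return f"{model} fitted to {len(data)} points"


# Bind a keyword argument; the piped value remains the first argument.
clip5 = partial(clip, sigma=5)

# Bind the leading positional argument; the piped value becomes the second.
fit_gauss = partial(fit, "gaussian")

data = [0.1, 7.0, -2.0]
clipped = clip5(data)
fitted = fit_gauss(clipped)
```

Both `clip5` and `fit_gauss` are one-argument callables, so they can be chained directly. A dedicated helper would only be needed when the piped value has to go in some other position.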
So, the proposal is simple, but the implementation may not be: create a class that can be used to make a pipeline out of a set of functions, thus eliminating the need to save intermediate variables, which is bad practice. The general idea can already be expressed in base Python and, afaik, is called functional composition. The tools within functools can be used to make this happen. A benefit of providing it with astropy is that we can add helper methods that take care of commonly encountered issues particular to various astropy packages/methods. We also save people from having to create such a class themselves each time.
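For reference, a functools-only version of composition can be very short. The `compose` helper below is an illustrative name, not something the standard library or astropy provides; it only relies on `functools.reduce`.

```python
from functools import reduce


def compose(*funcs):
    """Return a function that applies `funcs` left to right."""

    def composed(value):
        # Start from `value` and thread it through every function in order.
        return reduce(lambda acc, func: func(acc), funcs, value)

    return composed


# Strip whitespace, convert to float, then take the absolute value.
to_magnitude = compose(str.strip, float, abs)
value = to_magnitude("  -2.5 ")
```

With no functions given, `compose()` returns its input unchanged, which makes it safe to build chains programmatically.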
Additional context
Here is a basic application example I found on YouTube: https://youtu.be/ka70COItN40?t=1410. The video also has a specific example using the PyTorch Sequential implementation.
It is possible to further extend this Pipeline class with more complicated functionality. For example, the additional parameters passed, or the functional chain that is used, could change based on the output of any member of the chain. Thinking of it in terms of a graph or flow chart, the whole graph becomes a Pipeline instance. This would make much astropy-based code far more reusable, which probably also has benefits when sharing code.
I would be happy to contribute a basic linear/sequential pipeline class example, if I could be pointed in the right direction for where it would go (an affiliated package, a core part of astropy, or somewhere else?) and after receiving some input on what those who are much more familiar with astropy think is the best way to support passing in arguments to functions. The non-linear implementation could be left to a future improvement.
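One simple way to picture the non-linear case is a step that picks the next function from the current value. The `branch` and `run` helpers below are hypothetical names used only for illustration, and the data are toy numbers.

```python
def branch(condition, if_true, if_false):
    """Hypothetical helper: choose the next step based on the current value."""

    def step(value):
        return if_true(value) if condition(value) else if_false(value)

    return step


def run(steps, value):
    # Apply each step in order, passing the result along the chain.
    for step in steps:
        value = step(value)
    return value


steps = [
    sorted,
    # Trim the two extreme values, but only for samples with more than 3 points.
    branch(lambda v: len(v) > 3, lambda v: v[1:-1], lambda v: v),
    lambda v: sum(v) / len(v),
]

small = run(steps, [4.0, 1.0])
large = run(steps, [100.0, 1.0, 2.0, 3.0, -50.0])
```

Here the small sample skips the trimming step while the large one goes through it. A full graph-based pipeline would generalize this to arbitrary routing.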
Please let me know if this already exists within astropy (I don’t think so), and whether I am missing something obvious that makes it hard to create such a generic helper.
Issue Analytics
- Created 2 years ago
- Comments: 13 (11 by maintainers)
Top GitHub Comments
Great find! Looks very useful. According to the docs, functions don’t have to be decorated, but “MetaFunctions are also capable of upgrading regular functions to MetaFunctions at composition time” - they show an example of using input and print from the standard library, which are certainly not decorated as MetaFunctions within the standard library.
So, where do we go from here? It seems that you have a solution - in fact, you have a solution that might satisfy everyone’s desire for such a feature. To me that sounds as if we don’t need it in astropy itself and we can close this issue.
However, people rarely read closed issues, so maybe we should turn that into an example for learn.astropy.org as “Other packages you might want to use together with astropy” or “How to use Astropy in a pipeline”?
When I encounter this, I usually write a function, following the example above:
I don’t think I understand what this proposal suggests doing that is astropy-specific. There are a number of Python packages out there that do similar things already, e.g. the above-referenced scikit-learn and PyTorch implementations, but also general-purpose Python packages (the first hit in my search was https://pythonawesome.com/a-lightweight-python-pipeline-framework/ but I know there are several others).
It might be worth spending some time looking for a general-purpose package that already provides this functionality and then writing a documentation example on how to use it with astropy, rather than implementing new code.