Wavelet Log Likelihood (Carter & Winn 2009)
Is your feature request related to a problem? Please describe.
I’m using exoplanet to integrate PyMC3 with efficient transit modeling. I have the base system running smoothly, thanks to the great tutorials. Thank you.
A collaborator asked me to include the Carter & Winn (2009) Daubechies wavelet likelihood.
(1) I am able to compute the log_likelihood as a float value. So the first feature I would ask for is: how do I add that into the xo.optimize or pm.sample calls?
(2) If we can bypass wavelets altogether by using a GP to estimate sigma_r and sigma_w efficiently, then I would be more than happy to do that. We’ve previously discussed using a GP to model the Daubechies wavelet likelihood, but I would need help selecting the GP + kernel combinations to do it.
As a secondary feature request, I would ask for a tutorial on estimating the residual red noise in a light curve in the way our community typically reports it in tables.
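For context, the number that usually lands in those tables comes from the rms-vs-bin-size technique (in the style of Pont et al. 2006). A minimal numpy sketch of that estimate follows; the function name and test values are my own, not from any package:

```python
import numpy as np

def red_white_noise(residuals, n_bin):
    """Estimate white (sigma_w) and red (sigma_r) noise from light-curve
    residuals with the rms-vs-bin-size technique: for pure white noise the
    rms of N-point bins scales as sigma_w / sqrt(N), and any excess is
    attributed to correlated noise via sigma_N**2 = sigma_w**2 / N + sigma_r**2.
    """
    sigma_w = np.std(residuals)
    n_keep = len(residuals) // n_bin * n_bin
    binned = residuals[:n_keep].reshape(-1, n_bin).mean(axis=1)
    sigma_n = np.std(binned)
    sigma_r_sq = sigma_n**2 - sigma_w**2 / n_bin
    return sigma_w, np.sqrt(max(sigma_r_sq, 0.0))

# Pure white noise: the red-noise estimate should be consistent with zero.
rng = np.random.default_rng(42)
residuals = rng.normal(0.0, 1e-3, 4096)
sigma_w, sigma_r = red_white_noise(residuals, n_bin=16)
```

In practice this is evaluated over a range of bin sizes (ideally near the transit duration) rather than a single n_bin.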
Describe the solution you’d like
I see from #70 that adding a likelihood function is feasible, but I get the impression that there is more to it than a simple xo_logp + wavelet_logp.
Is there a way to take a function (with 2 inputs + 3 PyMC3 parameters) and “add” it to the likelihood computed inside exoplanet?
Describe alternatives you’ve considered
I grabbed the necessary bits of code from @pcubillos’s package mc3:
https://github.com/pcubillos/mc3/blob/master/mc3/stats/stats.py (line 209)
I can compute the wavelet likelihood (outside of PyMC3 + XO) as a float value, but how do I wrap this into something that xo can interpret as an addition to the built-in log-likelihood?
Additional context
If there is a much simpler way to use the GP module to estimate sigma_r and sigma_w, then I could bypass this whole issue.
At the same time, others may still want a wavelet likelihood added onto their light-curve posteriors, so I would suggest considering this feature regardless.
Issue Analytics
- State:
- Created: 4 years ago
- Comments: 15 (5 by maintainers)
Hi Jonathan,
The catch here is that you need to provide the wavelet likelihood as a Theano op and you need to be able to compute the gradient of this operation with respect to its inputs. Here’s how I worked out the gradients for celerite and I bet something similar would be possible (and maybe not even too onerous!) for the C&W method.
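To make the gradient requirement concrete, here is a toy stand-in (a plain iid Gaussian log-likelihood, not the wavelet one): the analytic derivative of the log-likelihood with respect to a noise parameter, checked against finite differences. A custom Theano op wrapping the C&W likelihood would have to supply exactly this kind of derivative; all names here are illustrative.

```python
import numpy as np

# Toy stand-in (NOT the wavelet likelihood): an iid Gaussian log-likelihood
#   logL(r; sigma) = -0.5 * sum(r**2 / sigma**2 + log(2 * pi * sigma**2))
# and its analytic gradient with respect to sigma. A custom op needs both:
# the value, and d(logL)/d(parameters) for its grad method.

def log_like(r, sigma):
    return -0.5 * np.sum(r**2 / sigma**2 + np.log(2 * np.pi * sigma**2))

def dlog_like_dsigma(r, sigma):
    return np.sum(r**2 / sigma**3 - 1.0 / sigma)

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.1, 100)   # fake residuals
sigma = 0.12                     # evaluate away from the truth

# central finite difference as a sanity check on the analytic gradient
eps = 1e-6
fd = (log_like(r, sigma + eps) - log_like(r, sigma - eps)) / (2 * eps)
analytic = dlog_like_dsigma(r, sigma)
```

The same finite-difference check is a good habit whenever you hand-write a grad for a custom op.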
But, that being said, I wouldn’t bother! You can use celerite and it’ll be just as fast and have fewer restrictions (it can model more complicated power spectra and it doesn’t require evenly spaced data). If your model is something like:
then log_s2 is the variance of the white noise component and log_Sw4 will be related to the “red noise” variance that you want. Honestly, it’s not obvious that the sigma_r parameter is a very meaningful number, since its value will depend sensitively on the specific choice of power spectrum model, but it should be possible to calibrate a relationship between that number and the parameters of a celerite model if you want. Take a look at Section 4 of the celerite paper to see more about the model specification there. Hope this helps!
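For reference, the parametrization being described can be sketched directly from the SHO power spectrum in Section 4 of the celerite paper. The Sw4 = S0 * w0**4 combination below is my reading of what log_Sw4 parametrizes, so treat the names as illustrative:

```python
import numpy as np

# PSD of a stochastically driven, damped simple harmonic oscillator
# (celerite paper, Section 4):
#   S(w) = sqrt(2/pi) * S0 * w0**4 / ((w**2 - w0**2)**2 + w0**2 * w**2 / Q**2)
def sho_psd(w, S0, w0, Q):
    return np.sqrt(2 / np.pi) * S0 * w0**4 / (
        (w**2 - w0**2) ** 2 + w0**2 * w**2 / Q**2
    )

S0, w0, Q = 2.0, 3.0, 1 / np.sqrt(2)
Sw4 = S0 * w0**4  # the combination that log_Sw4 parametrizes (my reading)

# Sw4 sets the high-frequency amplitude: S(w) -> sqrt(2/pi) * Sw4 / w**4,
# while the PSD flattens to sqrt(2/pi) * S0 as w -> 0.
w_hi = 1e3 * w0
tail = np.sqrt(2 / np.pi) * Sw4 / w_hi**4
exact = sho_psd(w_hi, S0, w0, Q)
```

Sampling in log_Sw4 rather than log_S0 decorrelates the high-frequency amplitude from the knee frequency, which is why it shows up in the model above.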
@exowanderer
To add to what @dfm said, the wavelet model from the Carter & Winn paper solves for the correlated noise component assuming a specific power spectrum: f^{-1}, aka pink noise. (Their paper in principle allows for other power laws, f^{-\nu}, but the fast scaling of the algorithm only works for \nu = 1, if I remember correctly.) This power spectrum is not a great description of stellar variability. It has a shallower falloff at high frequency, which causes sharper variations in the correlated noise component than does a stochastically driven, damped simple harmonic oscillator. It is also even stronger at high frequency than a damped random walk (aka an Ornstein–Uhlenbeck process). In practice, however, I expect this means that the short-timescale variability will be conflated with the white noise component of variability.
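For concreteness, here is a small numpy check of the high-frequency slopes being compared (shapes only; the constants and units are arbitrary):

```python
import numpy as np

# Compare high-frequency log-log slopes of the spectra discussed above:
# pink noise falls as f**-1, a damped random walk (Ornstein-Uhlenbeck,
# i.e. a Lorentzian) as f**-2, and the SHO kernel as f**-4.
def pink(f):
    return 1.0 / f

def drw(f, f0=1.0):
    return 1.0 / (f**2 + f0**2)

def sho_shape(f, f0=1.0):
    return 1.0 / ((f**2 - f0**2) ** 2 + f**2 * f0**2)

def loglog_slope(psd, f1, f2):
    return (np.log(psd(f2)) - np.log(psd(f1))) / (np.log(f2) - np.log(f1))

f1, f2 = 1e3, 1e4  # well above the characteristic frequency f0 = 1
slopes = [loglog_slope(p, f1, f2) for p in (pink, drw, sho_shape)]
# slopes come out near [-1, -2, -4]: pink noise retains the most power at
# high frequency, hence the sharper short-timescale variations
```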
Also, at low frequencies f^{-1} is too large; in fact, it diverges, which means that there is formally unbounded power on long timescales. In practice this may not be too much of an issue, since any dataset has a finite duration. So, I think f^{-1} worked well for Carter & Winn thanks to these saving factors.
That said, there may be ranges of frequency where an f^{-1} slope is valid, and that can be approximated well with, e.g., three Q = 1/2 celerite terms, as in this plot:
Comparing these, you can see that the pink noise continues to rise at low frequencies and dominates at high frequencies. But, as I mentioned above, a white-noise component will always dominate at the highest frequencies. So, as Dan mentioned, you can approximate this wavelet spectrum over some dynamic range of the power spectrum with the celerite GP.
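The three-term construction mentioned above can be sketched numerically; the specific knee frequencies and amplitudes below are illustrative choices, not the ones used for the plot:

```python
import numpy as np

# For Q = 1/2 the SHO PSD collapses to amp * w0**4 / (w**2 + w0**2)**2:
# flat below w0, falling as w**-4 above. Log-spacing the knee frequencies
# and scaling each amplitude as 1/w0 stacks the knees into an f**-1 ramp.
def q_half_term(w, w0, amp):
    return amp * w0**4 / (w**2 + w0**2) ** 2

w0s = np.array([1.0, 10.0, 100.0])  # illustrative, log-spaced knees
amps = 1.0 / w0s

def psd(w):
    return sum(q_half_term(w, w0, a) for w0, a in zip(w0s, amps))

# log-log slope across the middle of the covered range
w_lo, w_hi = 10**0.5, 10**1.5
slope = (np.log(psd(w_hi)) - np.log(psd(w_lo))) / (np.log(w_hi) - np.log(w_lo))
# comes out close to -1 over this decade
```

Outside the range spanned by the knees, the sum flattens at low frequency and falls as f**-4 at high frequency, which is exactly the "some dynamic range" caveat above.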