PSFPhotometry classes are really slow
I’m testing the `DAOPhotPSFPhotometry` class with a single iteration, which as far as I understand should be equivalent to a single run of the IRAF `allstar` task.
Even with fixed coordinates (obtained from a previous run of the star finder) in the `IntegratedGaussianPRF` `psf_model` I use, `DAOPhotPSFPhotometry` takes around 10 minutes to complete for ~900 stars on a modest laptop. The `allstar` task, on the other hand, processes the same number of stars in just a few seconds.

Is this expected, or is there something wrong with my parameter settings?
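For reference, a minimal sketch of this kind of setup – `IntegratedGaussianPRF` with the star-finder positions held fixed and a single iteration – looks something like the following; all numeric values here are illustrative placeholders, not the actual settings in question:

```python
# Minimal sketch of the setup described above: fixed positions, one iteration.
# All numeric values are illustrative placeholders, not the actual settings.
import numpy as np
from astropy.table import Table
from photutils.psf import DAOPhotPSFPhotometry, IntegratedGaussianPRF

# Toy frame with two fake stars, standing in for the real science image.
yy, xx = np.mgrid[0:64, 0:64]
image = (100.0
         + 500.0 * np.exp(-((xx - 22.3)**2 + (yy - 30.1)**2) / (2 * 2.0**2))
         + 300.0 * np.exp(-((xx - 45.8)**2 + (yy - 12.6)**2) / (2 * 2.0**2)))

psf_model = IntegratedGaussianPRF(sigma=2.0)
psf_model.x_0.fixed = True   # hold the previously found coordinates fixed
psf_model.y_0.fixed = True

phot = DAOPhotPSFPhotometry(crit_separation=8.0, threshold=150.0, fwhm=4.7,
                            psf_model=psf_model, fitshape=(11, 11),
                            niters=1, aperture_radius=5.0)

# Positions from the earlier star-finder run (two rows here; ~900 in practice).
init_guesses = Table({'x_0': [22.3, 45.8], 'y_0': [30.1, 12.6]})
results = phot(image, init_guesses=init_guesses)
```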
Top GitHub Comments
It appears I completely forgot to actually reply to this thread when I was assigned this issue to investigate way back, sorry!

I did run some profiling – at least for zeroth-order information on where the problem might lie – so I’ve dug out my old scripts, updated a few things, and thought I’d (finally) write up the results.
I’ve put together a fairly basic test: generate a few hundred `IntegratedGaussianPRF` sources on a ~1000x1000 pixel array via `make_gaussian_prf_sources_image`, add some background counts, add Poisson noise with `make_noise_image`, and run a pipeline very similar to the one in the documentation: `find_peaks` to generate a “real world” input data table, `MMMBackground` to get the background counts to set `threshold`, then call `DAOPhotPSFPhotometry` (as per the original comment on this issue, although that’s really just `IterativelySubtractedPSFPhotometry`; I also kept the default `niters=3` instead of limiting it to a single loop, but I’m not sure whether it needs all of them).

For no particular reason I’ve run 5 loops of two different versions: one in which both `__init__` and `__call__` are run inside the loop, and one where the photometry object is initialised outside the loop and just called `Nloop` times. This doesn’t seem to make any difference, but I haven’t bothered changing it from when I tested that before; the runtime is essentially entirely within the call.

I’ve profiled the calls with line_profiler, and attach the results at the bottom of the post.
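For concreteness, a rough sketch of that test setup looks something like the following. The source properties, thresholds, grouping radius and fit shape are my own placeholders rather than the exact values in the attached psf_photometry_speed.txt, and the line_profiler registration at the end is just the mechanics of how the numbers below were collected:

```python
# Rough sketch of the benchmark described above. Parameter values are
# placeholders, not the exact ones used in the attached script.
import numpy as np
from astropy.table import Table
from line_profiler import LineProfiler
from photutils.background import MMMBackground
from photutils.datasets import make_gaussian_prf_sources_image, make_noise_image
from photutils.detection import find_peaks
from photutils.psf import (BasicPSFPhotometry, DAOPhotPSFPhotometry,
                           IntegratedGaussianPRF,
                           IterativelySubtractedPSFPhotometry, subtract_psf)

shape = (1024, 1024)
nsrc = 300
rng = np.random.default_rng(123)

# A few hundred IntegratedGaussianPRF sources scattered over the frame.
sources = Table()
sources['amplitude'] = rng.uniform(500, 5000, nsrc)
sources['x_0'] = rng.uniform(20, shape[1] - 20, nsrc)
sources['y_0'] = rng.uniform(20, shape[0] - 20, nsrc)
sources['sigma'] = np.full(nsrc, 2.0)

# Source image plus background counts with Poisson noise.
image = make_gaussian_prf_sources_image(shape, sources)
image += make_noise_image(shape, distribution='poisson', mean=50.0, seed=123)

# "Real world" inputs: background level for the threshold, and a peak list
# (no init_guesses are passed below, matching the profiling discussed later).
bkg = MMMBackground()(image)
peaks = find_peaks(image, threshold=bkg + 50.0, box_size=11)

psf_model = IntegratedGaussianPRF(sigma=2.0)
phot = DAOPhotPSFPhotometry(crit_separation=8.0, threshold=bkg + 50.0,
                            fwhm=4.7, psf_model=psf_model, fitshape=(11, 11),
                            niters=3, aperture_radius=5.0)

# Register the internals of interest with line_profiler, then run the call.
lp = LineProfiler()
lp.add_function(IterativelySubtractedPSFPhotometry._do_photometry)
lp.add_function(BasicPSFPhotometry.nstar)
lp.add_function(subtract_psf)

@lp  # wrapping the top-level call enables profiling of the functions above
def run_photometry():
    return phot(image)

result_tab = run_photometry()
lp.print_stats()
```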
The end result is that `DAOPhotPSFPhotometry`, via `IterativelySubtractedPSFPhotometry` with no `init_guesses` supplied, subtracts a background and then calls `self._do_photometry`. The background subtraction takes ~1% of the (profiled) runtime, so we go a step deeper into `_do_photometry`. Here we see two calls to `self.finder` (in this instance `DAOStarFinder`) taking 3.5% of the runtime within `_do_photometry` between them, about 1% for `self.group_maker` (`DAOGroup` here), and 95% for `super().nstar`.

Once again, following the overwhelming majority of the runtime, we can profile `nstar` (this time from `BasicPSFPhotometry`), and find 2% of THIS runtime (still 94% of the total `__call__` time) taken by `get_grouped_psf_model`, 3% by `self._model_params2table`, 2.5% by `result_tab = vstack...`, 5% by `unc_tab = vstack...`, and, the two major contributors, 32% by `self.fitter` and 53.5% by `subtract_psf`.

I can’t speak to the fitter, which is the default `LevMarLSQFitter`, but more than half the runtime is from `subtract_psf`. Here, unfortunately, we hit a slightly less obvious culprit; `subtract_psf`’s time is split across several lines within its call:

- 2.5% from `subbeddata = add_array...`
- 4% from `psf.copy()`
- 28% from `indices = np.indices(data.shape)`
- 21.5% from `subbeddata = data.copy()`
- 21.5% (each) from `y = extract_array...` and `x = extract_array...`
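To make those percentages easier to place, the body of `subtract_psf` looks roughly like this (a simplified paraphrase, not the verbatim source), with the hot lines annotated:

```python
# Simplified paraphrase of subtract_psf (photutils ~1.1), annotated with the
# line_profiler percentages quoted above; not the verbatim source.
import numpy as np
from astropy.nddata.utils import add_array, extract_array

def subtract_psf_sketch(data, psf_model, posflux, subshape):
    subbeddata = data.copy()                # ~21.5% of subtract_psf's runtime
    psf = psf_model.copy()                  # ~4%
    indices = np.indices(data.shape)        # ~28%
    for row in posflux:                     # one pass per fitted source
        x_0, y_0 = row['x_fit'], row['y_fit']
        psf.flux, psf.x_0, psf.y_0 = row['flux_fit'], x_0, y_0
        # ~21.5% each: float-cast index grids, then cut them out around the source
        y = extract_array(indices[0].astype(float), subshape, (y_0, x_0))
        x = extract_array(indices[1].astype(float), subshape, (y_0, x_0))
        # ~2.5%: evaluate the model on the cutout and subtract it from the image
        subbeddata = add_array(subbeddata, -psf(x, y), (y_0, x_0))
    return subbeddata
```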
EDIT: with `scipy.__version__` of `1.5.2`, the astropy `LevMarLSQFitter` calls spend 81% of their time in `scipy.optimize.leastsq`, with 7% in `_validate_model`, 2% in `_model_to_fit_params`, 3.5% in `_fitter_to_model_params`, and 5.5% in `sum_sqrs = np.sum...`. The trail for that ~third of the runtime goes dead for me there, because I don’t particularly understand the scipy MINPACK wrappers and expect the line profiling to break down at that point. I also assume that, being wrappers around low-level compiled routines, those calls are fairly well optimised already.

So it seems that half of the runtime comes from the subtraction of PSFs to form the next iteration’s residual image, but split four ways across four calls.
It’s printed in the script outputs, but for completeness, the versions of a few important packages are: photutils 1.1.0.dev127+g2bba1caa, astropy 4.0.2, numpy 1.19.2; `photutils` is just the master branch up to #1140 being merged.

Attached (as .txt files, since GitHub apparently won’t accept .py attachments) are the script I wrote to generate and fit the test data, the output of the line profiling, and the `print` output from the script itself to verify the timings and whatnot: psf_photometry_speed.txt, speed_results.txt, script_outputs.txt.

Ah thanks, I forgot those were astropy functions. Running a quick profile on those two as well:
`extract_array` (21.5% x2 of `subtract_psf`):

- 12% in the `isscalar(shape)` check
- 2.5% in the `isscalar(position)` check
- 2% in `extracted_array = array_large[large_slices]`
- 1.5% in `if (extracted_array.shape != shape) and (mode == 'partial'):`
- 78.5% in `overlap_slices` (perhaps unsurprisingly)

`add_array` (2.5% of the runtime):

- 7% in the various checks on `large_shape` and `small_shape` within the `if` statement
- 75% in `overlap_slices`
- 15% in `array_large[large_slices] += array_small[small_slices]`

So those are mostly dominated by the `overlap_slices` calculation.
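For anyone not familiar with them, here is a tiny standalone illustration of what those astropy helpers compute, with arbitrary values:

```python
# Tiny standalone illustration of the astropy.nddata helpers discussed above.
import numpy as np
from astropy.nddata.utils import add_array, extract_array, overlap_slices

data = np.arange(100.0).reshape(10, 10)
shape, position = (5, 5), (4.2, 6.7)   # cutout shape and (y, x) centre

# overlap_slices works out the slices into the large and small arrays...
large_slices, small_slices = overlap_slices(data.shape, shape, position)

# ...extract_array uses them to pull out the cutout itself...
cutout = extract_array(data, shape, position)

# ...and add_array pastes a small array back into the large one at `position`.
summed = add_array(data.copy(), np.ones(shape), position)

print(large_slices, cutout.shape, summed.shape)
```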
All this said, given the way the profiler reports the timings there is some loss in the system. For example, in the above speed_results.txt, `nstar` says its 8870 hits of line 422 take 242 seconds, but the total time we’re actually inside `subtract_psf` is reported as 172 seconds. That could be overhead genuinely in the system (i.e., time to enter/exit the function, return the parameters, pass the arguments, etc.), or overhead that the profiler itself introduces (although it claims to subtract its own overhead from the reported numbers, iirc). This didn’t really matter in the comment above, as those runtimes were sufficiently large that the time spent in the function was the largest term (although 172 vs 242 means only 71% of the total time to call `image = subtract_psf(image, self.psf_model, param_table, subshape=self.fitshape)` is accounted for…).

However, for the extra profiling I ran just now (for 200 sources, not 900), the profiler reports that `subtract_psf` spends 16 seconds calling the two `extract_array` functions and 1.02 seconds calling `add_array`, but then reports that it is only actually inside `extract_array` for 0.63 s and inside `add_array` for 0.24 s! So it seems that `overlap_slices` (which has a total of 0.7 s runtime) is NOT the important part, but rather something within the function calls on those two lines. (This profiling can be seen in speed_results2.txt.)

In full, those are:
`subbeddata = add_array(subbeddata, -psf(x, y), (y_0, x_0))`

and

`y = extract_array(indices[0].astype(float), subshape, (y_0, x_0))`
`x = extract_array(indices[1].astype(float), subshape, (y_0, x_0))`
Figuring the only actual work being done in those calls is the `astype(float)` conversion, I moved it out onto its own line, adding `inds_float = indices.astype(float)` and then calling `extract_array` with `inds_float[1]`. As can be seen in speed_results3.txt, this moves 39% of the runtime onto the float conversion, which combines with `indices = np.indices(data.shape)` to give 70% of `subtract_psf`’s runtime – with 22% left in the `data.copy()` a line below it.
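For clarity, that rearranged version looks roughly like this (a paraphrase of the test edit behind speed_results3.txt, not the exact change):

```python
# Paraphrase of the test edit behind speed_results3.txt: cast the index grids
# to float once per subtract_psf call instead of twice per source.
import numpy as np
from astropy.nddata.utils import add_array, extract_array

def subtract_psf_float_once(data, psf_model, posflux, subshape):
    subbeddata = data.copy()
    psf = psf_model.copy()
    indices = np.indices(data.shape)
    inds_float = indices.astype(float)      # ~39% of the runtime after the change
    for row in posflux:
        x_0, y_0 = row['x_fit'], row['y_fit']
        psf.flux, psf.x_0, psf.y_0 = row['flux_fit'], x_0, y_0
        y = extract_array(inds_float[0], subshape, (y_0, x_0))
        x = extract_array(inds_float[1], subshape, (y_0, x_0))
        subbeddata = add_array(subbeddata, -psf(x, y), (y_0, x_0))
    return subbeddata
```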
Thus, if something can be done to remove any, or all, of those operations (the index-grid creation, the float conversion, and the data copy) in `subtract_psf`, the PSF fitting will get a big speed-up. It can’t get much more than 3x faster, unfortunately, without also improving the speed of the fitter we’re using by default, but on my 2019 13" MacBook Pro I was fitting 900 sources in ~50 seconds, so hopefully we’re already well optimised relative to the 10 minutes reported previously.