speedup proposal: no lead loop in HindcastEnsemble but lead and init in obs
- create `init` and `lead` dimensions for the observations seamlessly with `cfsv2_ds["o"] = (("L", "S"), o.sel(T=T))`
- this approach thinks more in `real_time` or `valid_time`; however, we do not get a `time` dimension, just a multi-dimensional `time` coordinate. Why is this important? If `time` were a dimension, the shape would get really large and we would run into memory issues much sooner (see the sketch after the links below).

Demo: https://gist.github.com/aaronspring/9d724cf385c1b29a5288eabf3e55148b
Idea comes from https://github.com/mktippett/ENSO/blob/master/ForecastVerification.ipynb
Comparison of this approach with current climpred (about 10x fewer tasks): https://gist.github.com/aaronspring/fa99abbee189d65305179d3344e0a405
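To make the idea concrete, here is a minimal, self-contained sketch in plain xarray/pandas (toy sizes, an invented `sst` variable, nothing from climpred internals) of building a `valid_time(init, lead)` matrix and selecting the observations onto it in one vectorized step instead of looping over leads:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy sizes, yearly inits; the variable name "sst" and all numbers are
# invented for illustration and are not climpred internals.
n_init, n_lead = 10, 3
times = pd.date_range("1990-01-01", periods=n_init + n_lead, freq="YS")
init = times[:n_init]
lead = np.arange(1, n_lead + 1)

# valid_time(init, lead): the calendar date each (init, lead) pair verifies
# at, built by shifting along the shared yearly axis (exact, calendar-aware).
valid_time = xr.DataArray(
    np.stack([times.values[i + lead] for i in range(n_init)]),
    dims=("init", "lead"),
    coords={"init": init, "lead": lead},
    name="valid_time",
)

# Toy hindcast carrying valid_time as a multi-dimensional coordinate,
# plus toy observations on an ordinary 1-D time axis.
hind = xr.Dataset(
    {"sst": (("init", "lead"), np.random.rand(n_init, n_lead))},
    coords={"init": init, "lead": lead, "valid_time": valid_time},
)
obs = xr.DataArray(
    np.random.rand(len(times)), coords={"time": times}, dims="time", name="sst"
)

# One vectorized selection instead of a Python loop over leads:
# the observations pick up (init, lead) dimensions via the valid_time matrix.
obs_il = obs.sel(time=hind["valid_time"])

# Verification then reduces over init for all leads at once, e.g. ACC per lead.
acc = xr.corr(hind["sst"], obs_il, dim="init")
print(acc)
```

Because `valid_time` stays a coordinate rather than a dimension, nothing is broadcast against a full dense `time` axis; the observations only grow to the (init, lead) shape of the matrix, which is what keeps memory in check.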
Small summary:
- the new `verify` reproduces `verify` from climpred v2.1.1 (tiny errors, O(10⁻⁷))
- tasks decrease from ~100 to 13-31 (when applied to small data with `.chunk()`)
- timing reduces especially for `same_verif`
- timing increases for big geospatial data: 1000x1000 gridcells
- allows lead units from microseconds to years (and is now independent of the exact frequency alias, e.g. YS vs Y, MS vs M, QS vs S); see the lead-unit sketch after this list
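As a hedged illustration of the lead-unit point above (a sketch, not climpred code: the helper name `lead_to_offset` and its unit table are invented here), sub-daily and daily leads can be fixed `pd.Timedelta` offsets, while monthly/seasonal/yearly leads need calendar-aware `pd.DateOffset` objects, which is what makes the valid-time computation indifferent to YS-vs-Y style frequency aliases:

```python
import pandas as pd

# Hypothetical helper (not climpred API): fixed-length leads map to
# pd.Timedelta, calendar-length leads map to pd.DateOffset.
def lead_to_offset(lead, units):
    if units in ("months", "seasons", "years"):
        months = {"months": 1, "seasons": 3, "years": 12}[units] * int(lead)
        return pd.DateOffset(months=months)
    unit = {"weeks": "W", "days": "D", "hours": "h", "minutes": "min",
            "seconds": "s", "microseconds": "us"}[units]
    return pd.Timedelta(lead, unit=unit)

init = pd.Timestamp("2000-01-01")
print(init + lead_to_offset(2, "seasons"))       # 2000-07-01 00:00:00
print(init + lead_to_offset(3, "microseconds"))  # 2000-01-01 00:00:00.000003
```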
Implementation proposal:
- add the multi-dimensional coordinate `time(init, lead)` when instantiating `initialized`: this facilitates `PredictionEnsemble.plot()`, but requires allowing `time` as a coordinate, which is currently explicitly not allowed
- second step: change the alignment. Maybe not all the way as proposed here, but rather a combination of the new and the old way, e.g. using the new lead-time matrix but still looping over each lead (see the sketch after this list)?
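A sketch of that hybrid idea ("new lead-time matrix, old per-lead loop") could look like the following; it reuses the toy `hind` and `obs` objects from the first sketch, and the `isin`-based filtering merely stands in for whatever alignment rule is chosen:

```python
import xarray as xr

# Loop over leads, but take each lead's verification dates from the
# valid_time matrix rather than recomputing them per lead.
scores = []
for l in hind["lead"].values:
    vt = hind["valid_time"].sel(lead=l)                    # one column of the matrix
    vt = vt.where(vt.isin(obs["time"].values), drop=True)  # keep inits with obs available
    fct = hind["sst"].sel(lead=l, init=vt["init"])
    ver = obs.sel(time=vt)
    scores.append(xr.corr(fct, ver, dim="init"))

acc_per_lead = xr.concat(scores, dim="lead")
print(acc_per_lead)
```

Each iteration only touches one column of the matrix, so memory behaves like the current per-lead loop while the matrix keeps the init-to-verification bookkeeping in one place.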
Benchmark done on my 2018 MacBook Pro; results are in the CSV file in the gist.
Figure summary:
- visualization of the alignment: here, observations in the 1990s are partly removed
- timing reduces especially for `same_verif`
- timing increases for big geospatial data: 200x200 gridcells
Top GitHub Comments
Nevermind; "zero" is a zero-order spline: "Its value at any point is the last raw value seen." Not what I thought: https://stackoverflow.com/questions/27698604/what-do-the-different-values-of-the-kind-argument-mean-in-scipy-interpolate-inte
If I am not mistaken, `interp(method="zero")` is the most robust choice for the except clause if you don't want values to be interpolated:
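The code block that followed in the original comment did not survive extraction; as a hedged illustration of what `method="zero"` does (toy data, a made-up `lead` coordinate, scipy required because xarray passes the method through to `scipy.interpolate.interp1d`):

```python
import xarray as xr

# Toy data with an invented "lead" coordinate, purely for illustration.
da = xr.DataArray([1.0, 2.0, 4.0], coords={"lead": [1, 2, 3]}, dims="lead")

# method="zero" is a zero-order spline: the value at any point is the last
# raw value seen, rather than a linear blend of the neighbours.
print(da.interp(lead=[1.5, 2.5], method="zero").values)    # [1. 2.]
print(da.interp(lead=[1.5, 2.5], method="linear").values)  # [1.5 3. ]
```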
I see it this way: once people run into memory issues, they should/need to learn how to use dask to overcome that challenge. If they are well below memory limits, I don't mind how they do it.
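For completeness, a minimal and purely illustrative example of that dask route (assumes dask is installed): chunking an xarray object makes downstream computation lazy, so only one chunk needs to be in memory at a time.

```python
import numpy as np
import xarray as xr

# Chunking turns the array into a lazy, dask-backed array; operations build
# a task graph instead of computing immediately.
da = xr.DataArray(np.random.rand(100, 200, 200), dims=("init", "lat", "lon"))
da_lazy = da.chunk({"lat": 100, "lon": 100})  # 2x2 spatial chunks
result = (da_lazy ** 2).mean("init")          # still lazy
print(result.compute())                       # triggers the actual computation
```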