speedup proposal: no lead loop in HindcastEnsemble but lead and init in obs

  • create init and lead dimensions for the observations seamlessly with cfsv2_ds["o"] = (("L", "S"), o.sel(T=T))
  • this approach thinks more in terms of real_time or valid_time; however, time is not a dimension here, just a multi-dimensional coordinate. Why is this important? If time were a dimension, the shape would get really large and we would run into memory issues sooner
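
The one-liner above relies on indexing the observations with a two-dimensional valid-time array. A minimal NumPy sketch of that idea follows; the toy data, the variable names and the use of integer "months since a reference date" are assumptions for illustration and are not taken from the gist or from climpred.

```python
import numpy as np

# Hypothetical toy data: monthly observations on an integer time axis.
obs_time = np.arange(0, 24)        # 24 monthly valid times
obs = 10.0 + 0.5 * obs_time        # made-up observed values

init = np.arange(0, 12)            # 12 initialization months (S)
lead = np.arange(1, 4)             # leads of 1..3 months (L)

# Multi-dimensional valid-time coordinate: time[L, S] = lead + init.
valid_time = lead[:, None] + init[None, :]      # shape (3, 12)

# Vectorized lookup: one indexing call instead of a loop over leads.
idx = np.searchsorted(obs_time, valid_time)     # positions of the valid times
obs_ls = obs[idx]                               # shape (3, 12), dims (L, S)

# Reference: the same array built with an explicit loop over leads.
obs_loop = np.stack([obs[np.searchsorted(obs_time, init + l)] for l in lead])
```

In this toy case the observations end up with 3 x 12 entries. If time were a third dimension next to L and S, the array would need 3 x 12 x 24 entries, which is the memory concern raised in the bullet above.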

Demo: https://gist.github.com/aaronspring/9d724cf385c1b29a5288eabf3e55148b

Idea comes from https://github.com/mktippett/ENSO/blob/master/ForecastVerification.ipynb

Comparison of this approach with current climpred (10x fewer tasks): https://gist.github.com/aaronspring/fa99abbee189d65305179d3344e0a405

Small summary:

  • the new verify reproduces verify from climpred v2.1.1 (tiny differences of O(10^-7))
  • the number of tasks decreases from ~100 to 13-31 (when applied to small data with .chunk())
  • runtime decreases, especially for same_verif
  • runtime increases for big geospatial data: 1000x1000 grid cells
  • allows lead units from microseconds to years (it is now also independent of the frequency alias: YS or Y, MS/M, QS/S)
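
To illustrate the last point, here is a hedged sketch of how valid times can be computed for leads in different units. It uses plain NumPy datetime arithmetic with made-up initialization dates; it is not the code from the gist or from climpred.

```python
import numpy as np

# Hypothetical initialization dates (not taken from the issue).
init = np.array(["2000-01-01", "2001-01-01"], dtype="datetime64[D]")

# Daily leads have a fixed length, so plain timedelta arithmetic works.
lead_days = np.array([1, 5, 10], dtype="timedelta64[D]")
valid_days = init[:, None] + lead_days[None, :]        # shape (2, 3)

# Monthly and yearly leads are calendar-based, so the arithmetic is done
# at month resolution instead of adding a fixed number of days.
init_months = init.astype("datetime64[M]")
lead_months = np.array([1, 2, 12], dtype="timedelta64[M]")
valid_months = init_months[:, None] + lead_months[None, :]
```

Both results are (init, lead) matrices of valid times, so the same alignment logic can be applied regardless of the lead unit.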

Implementation proposal:

  • add a multi-dimensional coordinate time(init, lead) when instantiating the initialized ensemble: this facilitates PredictionEnsemble.plot() but requires allowing time as a coordinate, which is currently explicitly disallowed
  • second step: change the alignment. Maybe not all the way as proposed here, but a combination of the new and old way, e.g. using the new lead-time matrix but still looping over each lead?
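
The hybrid option in the second step could look roughly like the sketch below: the lead-time matrix is computed once, but the metric is still evaluated in a loop over leads. The data are random, the names are hypothetical, and RMSE is only a stand-in metric; this is an assumption-laden illustration, not the proposed climpred implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

obs_time = np.arange(0, 24)                  # monthly valid times of the observations
obs = rng.normal(size=obs_time.size)         # made-up observations

init = np.arange(0, 22)                      # initialization months
lead = np.arange(1, 4)                       # leads of 1..3 months
forecast = rng.normal(size=(init.size, lead.size))   # made-up forecasts, dims (init, lead)

# Lead-time matrix, computed once: time[init, lead] = init + lead.
valid_time = init[:, None] + lead[None, :]

rmse = np.full(lead.size, np.nan)
for j in range(lead.size):
    # Keep only initializations whose valid time is covered by the observations.
    ok = np.isin(valid_time[:, j], obs_time)
    idx = np.searchsorted(obs_time, valid_time[ok, j])
    err = forecast[ok, j] - obs[idx]
    rmse[j] = np.sqrt(np.mean(err ** 2))
```

Here the last initialization has no observation at lead 3, so that lead is verified against one fewer initialization than leads 1 and 2.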

Benchmark done on my 2018 MacBook Pro. Results are in a CSV file in the gist.

Figure summary:

  • Visualization of alignment, with observations in the 1990s partly removed [image]
  • runtime decreases, especially for same_verif [image]
  • runtime increases for big geospatial data: 200x200 grid cells [image]

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

ahuang11 commented, Nov 18, 2020 (4 reactions)

Never mind; “zero is a zero order spline. It’s value at any point is the last raw value seen.” That is not what I thought. https://stackoverflow.com/questions/27698604/what-do-the-different-values-of-the-kind-argument-mean-in-scipy-interpolate-inte

If I am not mistaken, interp(method='zero') is the most robust choice for the except clause if you don't want the values to be interpolated:

import numpy as np
import pandas as pd
import xarray as xr

cfs_url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.NMME/.NCEP-CFSv2/.HINDCAST/.MONTHLY/.T/dods'
obs_url = "http://iridl.ldeo.columbia.edu/expert/SOURCES/.NOAA/.NCEP/.EMC/.CMB/.GLOBAL/.Reyn_SmithOIv2/.monthly/.sst/T/1+index/dods"

# decode_times=False keeps T as raw numbers; only the first 10 obs time steps are used
obs = xr.open_dataset(obs_url, decode_times=False).isel(T=slice(0, 10))
cfs = xr.open_dataset(cfs_url, decode_times=False)

# map obs onto the forecast T values with zero-order interpolation
obs.interp(T=cfs['T'], method='zero')
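
The quoted description of 'zero' ("its value at any point is the last raw value seen") can be illustrated with a small NumPy sketch. This emulates the described behavior on made-up samples; it is not how xarray or SciPy implement it, and it ignores target times that lie before the first raw sample.

```python
import numpy as np

# Made-up samples: raw values known at times 0, 10 and 20.
t_raw = np.array([0.0, 10.0, 20.0])
y_raw = np.array([1.0, 2.0, 3.0])

# Target times that fall between (and on) the raw samples.
t_new = np.array([0.0, 4.0, 10.0, 19.9, 20.0])

# Index of the last raw sample at or before each target time.
idx = np.searchsorted(t_raw, t_new, side="right") - 1
y_hold = y_raw[idx]
```

A target time of 19.9 still receives the value from time 10, so values are held rather than blended between neighbours.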

aaronspring commented, Nov 16, 2020 (2 reactions)

I see it this way: once people run into memory issues, they should (or need to) learn how to use dask to overcome them. If they are well below the memory limit, I don't mind how they do it.
