Usage Suggestions
Just some thoughts as I was trying it out:
- When calling `cp.ReferenceEnsemble(ds)`, I get ``ValueError: Your decadal prediction object must contain the dimensions `lead` and `init` at the minimum.`` Perhaps `cp.ReferenceEnsemble(ds)` could accept `lead` and `init` keywords that let the user specify those dimensions without needing to do `ds = ds.rename({'target': 'lead', 'initial_time': 'init'})`, something like `cp.ReferenceEnsemble(ds, lead='target', init='initial_time')` (see the first sketch after this list).
- When I try `re.add_reference(ref_ds, 'ncep')`, where `re = cp.ReferenceEnsemble(ds)`, I get ``ValueError: Dimensions must match initialized prediction ensemble dimensions (excluding `lead` and `member`.)`` Maybe if the dimension names are the same, you could regrid the reference automatically and raise a warning, either with xESMF (https://xesmf.readthedocs.io/en/latest/) or just xarray's built-in `interp` method (a sketch follows this list).
- Also, I think it would be nice to print out which dimensions are not matching in the error when you do the `set(ref.dims) == set(init_dims)` check, because it took me a long time to realize that within that function the dimension `init` gets renamed to `time`, even though the input `ds` initially had an `init` dim, so I had also renamed my `ref_ds` dim to `init` (see the error-message sketch after this list).
- Next, I'm using a dask array; this is probably related more to xskillscore, but this is the error I encounter (the rechunk workaround is sketched after this list):

  ValueError: dimension 'time' on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, rechunk into a single dask array chunk along this dimension, i.e., ``.rechunk({'time': -1})``, but beware that this may significantly increase memory usage.
- Now that I’m done with all the preprocessing, I’m not sure why I get all NaNs for the first lead.
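To make a few of the suggestions above concrete, here is a minimal sketch of the `lead`/`init` keyword idea as a user-side wrapper (the keyword handling is hypothetical, not existing climpred API):

```python
import climpred as cp

def reference_ensemble(ds, lead='lead', init='init'):
    # hypothetical convenience: rename the user's dimension names to the
    # `lead`/`init` names that ReferenceEnsemble currently requires
    rename = {name: target for name, target in ((lead, 'lead'), (init, 'init'))
              if name != target}
    return cp.ReferenceEnsemble(ds.rename(rename))

# re = reference_ensemble(ds, lead='target', init='initial_time')
```

For the mismatched-reference case, a sketch of a more informative dimension check plus an automatic regrid fallback (the helper names and the use of xarray's `interp` instead of xESMF are my assumptions, not climpred internals):

```python
import warnings

def check_dims(ref, init_dims):
    # hypothetical replacement for the bare set-equality check:
    # report exactly which dimension names differ
    missing = set(init_dims) - set(ref.dims)
    extra = set(ref.dims) - set(init_dims)
    if missing or extra:
        raise ValueError(
            'Reference dims must match initialized ensemble dims '
            f'(excluding `lead` and `member`): missing {sorted(missing)}, '
            f'unexpected {sorted(extra)}.'
        )

def add_reference_regridded(re, ref_ds, name, grid, spatial_dims=('lat', 'lon')):
    # hypothetical helper: interpolate the reference onto the ensemble grid
    # with xarray's built-in interp and warn that regridding happened
    mismatched = [d for d in spatial_dims
                  if d in ref_ds.dims and not ref_ds[d].equals(grid[d])]
    if mismatched:
        warnings.warn(f'Regridding reference along {mismatched} to match the ensemble grid.')
        ref_ds = ref_ds.interp({d: grid[d] for d in mismatched})
    re.add_reference(ref_ds, name)

# add_reference_regridded(re, ref_ds, 'ncep', grid=ds)
```

And the dask/xskillscore error already spells out its own workaround; applied to the reference used in the code below, it would be:

```python
# collapse the core dimension into a single dask chunk (may increase memory use)
ref_ds = ref_ds.chunk({'time': -1})
```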
Anyways, some final thoughts.
- It'd be nice to be able to compare multiple models against multiple references. My suggestion is to use the first model as the base, and require all the other model/reference datasets to adhere to the base's dimensions.
- It'd also be nice to add support for target times (datetimes) instead of integer leads, or leads as Timedelta objects (a conversion sketch follows this list).
- Before a v1 public/pip release, I think docs are really important!
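As an interim take on the datetime-lead idea above, a sketch of deriving integer monthly leads from datetime targets (the `target` name and monthly spacing come from the code below; the conversion itself is my assumption, not climpred functionality):

```python
import pandas as pd

# hypothetical conversion: integer leads in months, measured from the first target
target = pd.DatetimeIndex(ds['target'].values)
lead_months = (target.year - target[0].year) * 12 + (target.month - target[0].month)
ds = ds.assign_coords(lead=('target', lead_months))
```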
And with that, I’m happy to help with this; just let me know what you would like help on.
Here’s the code that I used.
```python
import os
import xarray as xr
import pandas as pd
import dask.bag as db
import climpred as cp

FCS_MODS = ['CFSv2']  # , 'CMC1', 'CMC2', 'GFDL', 'GFDL_FLOR',
                      # 'NASA_GEOS5v2', 'NCAR_CCSM4'
DT_RANGE = pd.date_range('2019-01-08', '2019-04-08', freq='1M')
BASE_URL = 'https://ftp.cpc.ncep.noaa.gov/NMME/realtime_anom/'
urls = [
    BASE_URL + f'{mod}/{dt:%Y%m}0800/{mod}.tmp2m.{dt:%Y%m}.anom.nc'
    for mod in FCS_MODS for dt in DT_RANGE
]
# uncomment to download data
# db.from_sequence(urls, npartitions=4).map(lambda url: os.system(f'wget -nc {url}')).compute()
# !wget -nc ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.derived/surface_gauss/air.2m.mon.mean.nc

# build the forecast dataset: concatenate models, assign init/target/lead coordinates
ds = xr.concat((xr.open_mfdataset(
    f'{mod}.*.nc', concat_dim='initial_time', decode_cf=False
).assign(**{'model': mod}) for mod in FCS_MODS), 'model')
ds['initial_time'] = pd.date_range('2019-01', '2019-03', freq='1MS')
ds['target'] = pd.date_range('2019-01', periods=len(ds['target']), freq='1MS')
ds['lead'] = ('target', range(len(ds['target'])))
ds['fcst'] += 273.15  # shift from degC to Kelvin

# reference dataset: keep overlapping times and put the reanalysis on the model grid
ref_ds = xr.open_dataset('air.2m.mon.mean.nc')
both_time = sorted(list(set(ds['initial_time'].values) & set(ref_ds['time'].values)))
ds = ds.sel(target=both_time).sortby('lat')
ref_ds = ref_ds.sel(time=both_time).sortby('lat')
ref_ds = ref_ds.interp(lat=ds['lat'], lon=ds['lon'])
ref_ds = ref_ds.rename({'air': 'fcst'})

# rename to climpred's expected dimension names and build the ensemble object
ds = ds.rename({'initial_time': 'init', 'ensmem': 'member'}
               ).swap_dims({'target': 'lead'}).isel(model=0).load()
re = cp.ReferenceEnsemble(ds)
re.add_reference(ref_ds, 'ncep')

# import hvplot.xarray
# re.compute_metric('ncep', metric='rmse').hvplot('lon', 'lat')
```
Top GitHub Comments
It's not that there are NaNs in the grid itself (`rmse` or similar just puts a NaN wherever either field being compared has one); it's that there are fully blank slices after post-processing: `ds.fcst.plot(col='init', row='lead')`. Decomposing `compute_reference` into a few simple commands shows that computing an RMSE on the above compares some NaN slices to data slices and breaks the RMSE.
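A quick way to confirm which slices are fully blank (a diagnostic sketch of mine, not part of the original comment; the `fcst`/`lat`/`lon` names come from the code above):

```python
# count fully-NaN lat/lon fields per lead; nonzero counts explain the NaN skill
blank = ds['fcst'].isnull().all(dim=['lat', 'lon'])
print(blank.sum(dim=[d for d in blank.dims if d != 'lead']))
```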
`re.compute_metric(metric='rmse')` plots fine for `.isel(lead=1)` and `.isel(lead=2)`. Thoughts:
- `lead` vs. `init` vs. `time` from @aaronspring and me.

Okay, still getting:
1. https://github.com/bradyrx/climpred/issues/112
2. https://github.com/bradyrx/climpred/issues/183
Other than that, all good!