seasonal forecast valid-time inconsistency
Hi,
We have been looking at the xarray datasets and netCDF files produced by cfgrib for the seasonal forecast data. The valid_time recovered using cfgrib is not exactly consistent with the time stamp recovered by the ecCodes grib_to_netcdf method, and users may be misinterpreting the output they receive.
Using cfgrib, the value of valid_time is the time at the end of the time aggregation window. I think that cfgrib must calculate this as the sum of time and step. Using ecCodes, the output time is at the start of the aggregation window, which for step index n would be the sum of time and step(n-1).
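The two conventions described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not cfgrib or ecCodes code. The function names, the start date and the step values in hours are assumptions chosen for illustration, and the first window is assumed to begin at the forecast start (step 0).

```python
from datetime import datetime, timedelta

def window_ends(start, steps_hours):
    """End of each aggregation window: time + step(n)."""
    return [start + timedelta(hours=h) for h in steps_hours]

def window_starts(start, steps_hours):
    """Start of each aggregation window: time + step(n-1).

    Assumption: the first window begins at the forecast start (step 0).
    """
    previous = [0] + list(steps_hours[:-1])
    return [start + timedelta(hours=h) for h in previous]

# Hypothetical values: a forecast starting 2017-01-01 with monthly windows,
# so the steps are the cumulative hours to the end of January and February.
start = datetime(2017, 1, 1)
steps_hours = [744, 1416]

ends = window_ends(start, steps_hours)      # end-of-window convention
starts = window_starts(start, steps_hours)  # start-of-window convention
for s, e in zip(starts, ends):
    print(s.date(), "->", e.date())
```

Under these assumptions the two conventions differ by exactly one aggregation window, which is why the same field can appear under different month labels.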
I am bringing this to your attention so that you can decide the best way to deal with it, but it would be very useful for us (CDS-toolbox) to have access to the start of the time aggregation window in the xarray object that is recovered. Otherwise we will have to build some sort of workaround in the toolbox. One option could be to output the time bounds of the aggregation period, with an additional dimension of length 2.
Additionally, I think that it is probably necessary to include something in the metadata stating that valid_time represents the time at the end of the aggregation window, as this may not be what users expect.
Please let me know if you need any more information about what I am trying to explain. As an example, use the GRIB file retrieved with the API request below. The time stamps are different when converted using the grib_to_netcdf and cfgrib approaches.
import cdsapi

c = cdsapi.Client()
c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'originating_centre': 'ecmwf',
        'system': '5',
        'variable': '2m_dewpoint_temperature',
        'product_type': 'monthly_mean',
        'year': '2017',
        'month': ['01', '02'],
        'leadtime_month': ['1', '2'],
        'format': 'grib'
    },
    'download.grib')
Issue Analytics
- Created 4 years ago
- Comments:14
Top GitHub Comments
@alexamici I can only sympathise with your feeling of confusion, as it really highlights how important it is to get this right, so that even highly skilled users who are more than familiar with these files do not interpret them wrongly.
I will express a bit more bluntly what I said in my previous comment. It is not that I have doubts about it; I am convinced that issue #38 was solved with the feature added in 0.9.7.3, while issue #97 remains unsolved.
I’ll try to elaborate on the problem with a couple of additional examples that I hope will showcase its implications more concisely.
If we explore some of the keywords related to time coordinates in a SEAS5 monthly means file:
Note the discrepancy between verifyingMonth (the keyword in local definitions table 16 I mentioned in my first comment) and validityDate/validityTime (the keywords used by cfgrib to populate the valid_time coordinate, which are not keywords physically stored in the GRIB header but computed keywords). As a more-than-educated guess, I am inclined to say the latter are computed as the sum of date/time + step.
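To make the guess above concrete, here is a small standard-library sketch of that arithmetic. It is not ecCodes code: the helper names are hypothetical, the start date and forecastMonth are example values, the step of 720 hours is an assumed value (the length of November), and the month-label rule (start month advanced by forecastMonth minus one) is an assumption about how the monthly means are labelled.

```python
from datetime import datetime, timedelta

def computed_validity(data_date, data_time, step_hours):
    """Sum date/time + step, as the computed validity keys are guessed to be.

    data_date is an integer YYYYMMDD, data_time an integer HHMM.
    """
    base = datetime.strptime(f"{data_date:08d}{data_time:04d}", "%Y%m%d%H%M")
    return base + timedelta(hours=step_hours)

def verifying_month(data_date, forecast_month):
    """Month label: the start month advanced by forecast_month - 1 months."""
    year, month = data_date // 10000, (data_date // 100) % 100
    index = (year * 12 + month - 1) + (forecast_month - 1)
    return index // 12, index % 12 + 1

# Assumed example values: nominal start 2019-11-01 00:00, forecastMonth=1,
# and a step of 720 hours (30 days, the length of November).
validity = computed_validity(20191101, 0, 720)
label = verifying_month(20191101, 1)
print(validity)  # falls on the first day of the following month
print(label)     # (year, month) of the month the mean actually covers
```

With these inputs the summed value lands in December 2019 while the month label stays in November 2019, which is the kind of one-month offset discussed in this thread.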
If we use grib_to_netcdf to convert that same file to netCDF:
As you can see, grib_to_netcdf is able to understand that the GRIB file has been written using local definitions (specifically table 16), and therefore it populates the time coordinate with the right values, i.e. Nov/2019 is the sensible value (label) for forecastMonth=1 in a monthly means file with nominal start date 20191101, and not Dec/2019 as might be inferred from the computed values validityDate/validityTime. These last sentences are not a guess, an interpretation or an expression of personal preference; they explain how seasonal forecast monthly means GRIB files have been consistently encoded for a long time at ECMWF.
Finally, regarding your thoughts on which point within an interval to use as its label: as you know, CF handles the encoding of an aggregation across a coordinate through a proper use of ‘Cell Boundaries’, which unequivocally defines the interval of aggregation. With that complete definition of the interval, the choice of a given value within it as the label is, in my opinion, more a question of use cases than of personal preference, and we can easily find reasonable use cases for the beginning, the middle or the end of the interval.
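As a rough illustration of the cell-boundaries idea, the sketch below stores each monthly aggregation as a (start, end) pair and derives the start, middle or end label from it. It uses only the Python standard library rather than a CF or xarray API; the function names and the 2019-11-01 start date are illustrative assumptions.

```python
from datetime import datetime

def add_months(moment, months):
    """Shift a first-of-month datetime by a whole number of months."""
    index = moment.year * 12 + (moment.month - 1) + months
    return moment.replace(year=index // 12, month=index % 12 + 1)

def monthly_bounds(start, forecast_months):
    """Return (start, end) bounds for each forecastMonth (1-based)."""
    return [(add_months(start, m - 1), add_months(start, m))
            for m in forecast_months]

def label(bounds, where):
    """Pick the label of an interval: 'start', 'middle' or 'end'."""
    lower, upper = bounds
    if where == "start":
        return lower
    if where == "end":
        return upper
    if where == "middle":
        return lower + (upper - lower) / 2
    raise ValueError(f"unknown label position: {where}")

# Assumed nominal start date, taken as the first day of a month.
bounds = monthly_bounds(datetime(2019, 11, 1), [1, 2])
for b in bounds:
    print(b, label(b, "start"), label(b, "middle"), label(b, "end"))
```

Once the bounds are kept, any of the three labels can be derived on demand, so the choice of label no longer changes what interval the data describes.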
I hope these points help to disentangle the confusion and to find the most suitable solution to this issue.
Hi! I was almost trusting the valid_time parameter to actually represent the forecasted month until I double-checked and found this issue. It is now working like a charm using verifying_time and forecastMonth, so thanks for that! I was wondering whether the what and why of this solution is documented somewhere? Then I can avoid the mistake in the future and share it with others. Thanks!