Independence tests for Chain Ladder

It is important that some critical assumptions behind the chain ladder method are tested before applying it. Thomas Mack suggested such tests, for example in "Measuring the Variability of Chain Ladder Reserve Estimates" (1997). Below are implementations of his tests for correlation between subsequent development factors and for the impact of calendar years. I don't feel confident enough to modify the package code directly, but hopefully this can serve as a useful template.
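For reference, the statistic computed below can be written out like this (my transcription of Mack's formulas as implemented here, so treat the indexing as an assumption; `n_k = I - k` is the number of rank pairs available at development `k`):

```latex
T_k = 1 - \frac{6 \sum_i (r_{ik} - s_{ik})^2}{n_k^3 - n_k}, \qquad
T = \sum_k \frac{(I-k-1)\, T_k}{\sum_k (I-k-1)}, \qquad
\operatorname{Var}(T) = \frac{2}{(I-2)(I-3)}
```

Here `r_{ik}` and `s_{ik}` are the ranks of the factors in column `k` and of the matching factors in column `k-1`, and the test checks whether `T` falls within about 0.67 standard errors of zero (a 50% confidence interval).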
```python
import numpy as np
import pandas as pd

def developFactorsCorrelation(df):
    # Mack (1997) test for correlation between subsequent development factors.
    # The combined statistic should lie within roughly -0.67 and +0.67 standard
    # errors of zero; otherwise there is too much correlation.
    m1 = df.rank()  # rank the development factors within each column
    m2 = df.to_numpy(copy=True)  # same factors, but ignoring the anti-diagonal
    np.fill_diagonal(np.fliplr(m2), np.nan)  # blank each row's last factor (it has no successor)
    # shift right by one so that column k holds the previous column's factors, then drop column 0
    m2 = pd.DataFrame(np.roll(m2, 1), columns=m1.columns, index=m1.index).iloc[:, 1:]
    m2 = m2.rank()
    numerator = ((m1 - m2) ** 2).sum(axis=0)
    SpearmanFactor = pd.DataFrame(range(1, len(m1.columns) + 1), index=m1.columns, columns=['colNo'])
    I = SpearmanFactor['colNo'].max() + 1
    SpearmanFactor['divisor'] = (I - SpearmanFactor['colNo']) ** 3 - I + SpearmanFactor['colNo']  # (I-k)^3 - I + k
    SpearmanFactor['value'] = 1 - 6 * numerator.T / SpearmanFactor['divisor']
    # weight each column's statistic by I-k-1; the weight sum excludes the first and last columns
    SpearmanFactor['weighted'] = SpearmanFactor['value'] * (I - SpearmanFactor['colNo'] - 1) / (SpearmanFactor[1:-1]['colNo'] - 1).sum()
    SpearmanCorr = SpearmanFactor['weighted'].iloc[1:-1].sum()  # excluding first and last elements as not significant
    SpearmanCorrVar = 2 / ((I - 2) * (I - 3))
    return SpearmanCorr, SpearmanCorrVar
```
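As a sanity check on the per-column statistic (a minimal sketch with made-up factor columns, not from the issue): each column's value above is just the Spearman rank correlation between adjacent development-factor columns, which can be verified against scipy:

```python
import pandas as pd
from scipy.stats import spearmanr

# made-up adjacent development-factor columns, n = 5 rank pairs, no ties
f_k  = pd.Series([1.6, 1.4, 1.5, 1.7, 1.3])
f_k1 = pd.Series([1.2, 1.25, 1.1, 1.15, 1.05])

r, s = f_k.rank(), f_k1.rank()
n = len(f_k)
t_k = 1 - 6 * ((r - s) ** 2).sum() / (n ** 3 - n)  # Spearman via squared rank differences

ref, _ = spearmanr(f_k, f_k1)  # scipy's direct computation
print(t_k, ref)  # both ≈ 0.3
```

The rank-difference formula and `spearmanr` agree exactly whenever there are no tied factors in a column.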
```python
from scipy.stats import binom

def calendarCorrelation(df, pCritical=.1):
    # Mack (1997) test for a calendar year effect.
    # A calendar period has an impact across developments if the probability of the observed
    # number of small (or large) development factors in that period arising randomly is less
    # than pCritical.
    # df should have the period as its row index, on the assumption that the first
    # anti-diagonal relates to the same period (development = 0).
    m1 = df.rank()  # rank the development factors within each column
    med = m1.median(axis=0)  # median rank of each column
    m1large = m1.apply(lambda r: r > med, axis=1)  # True where the rank is above the column median
    m1small = m1.apply(lambda r: r < med, axis=1)  # True where the rank is below the column median
    m2large = m1large.to_numpy(copy=True)
    m2small = m1small.to_numpy(copy=True)
    # count the small and the large elements on each anti-diagonal (calendar year)
    S = [np.diag(m2small[:, ::-1], k).sum() for k in range(min(m2small.shape), -1, -1)]
    L = [np.diag(m2large[:, ::-1], k).sum() for k in range(min(m2large.shape), -1, -1)]
    # two-sided point probability of the observed small/large split under a fair binomial
    probs = [binom.pmf(S[i], S[i] + L[i], 0.5) + binom.pmf(L[i], S[i] + L[i], 0.5) for i in range(len(S))]
    return pd.Series([p < pCritical for p in probs[1:]], index=df.index)
```
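To make the per-diagonal probability concrete (made-up counts, not from the paper): with S = 1 small and L = 6 large factors on one anti-diagonal, the two-sided point probability of that split under a fair coin is:

```python
from scipy.stats import binom

S, L = 1, 6  # hypothetical counts of small and large factors on one anti-diagonal
n = S + L
p = binom.pmf(S, n, 0.5) + binom.pmf(L, n, 0.5)
print(p)  # 2 * C(7,1) / 2**7 = 14/128 = 0.109375
```

That is just above the default `pCritical` of 0.1, so this diagonal would not be flagged; an even more lopsided split such as 0 versus 7 gives 2/128 ≈ 0.016 and would be.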
Using the example triangle of development factors from the paper:

```python
from io import StringIO

MackEx = '''1,2,3,4,5,6,7,8,9
1.6,1.32,1.08,1.15,1.2,1.1,1.033,1,1.01
40.4,1.26,1.98,1.29,1.13,.99,1.043,1.03,
2.6,1.54,1.16,1.16,1.19,1.03,1.026,,
2,1.36,1.35,1.1,1.11,1.04,,,
8.8,1.66,1.4,1.17,1.01,,,,
4.3,1.82,1.11,1.23,,,,,
7.2,2.72,1.12,,,,,,
5.1,1.89,,,,,,,
1.7,,,,,,,,
'''
df = pd.read_csv(StringIO(MackEx), header=0)
df.index = df.index + 1  # reindex rows from 1
dfCorr, dfCorrVar = developFactorsCorrelation(df)
print('Development factors correlation is {:.2%}'.format(dfCorr))
print('Factor independence if correlation is in range [{:.2%} to {:.2%}]'.format(-.67*np.sqrt(dfCorrVar), .67*np.sqrt(dfCorrVar)))
print('Dependence on calendar year:')
print(calendarCorrelation(df))
```
The result is:

```
Development factors correlation is 6.96%
Factor independence if correlation is in range [-12.66% to 12.66%]
Dependence on calendar year:
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool
```
Issue Analytics
- Created 3 years ago
- Comments: 26 (13 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Great, so this package leapfrogs R 😃
In respect of my previous
The answer is probably as simple as
which gives out very quickly an array with same shape as n and z
Thanks for the latest fixes @gig67. I think this one is done now. I'll push a new release to PyPI to make the enhancement available in the official package.