## 0.2.8 much slower than 0.2.7

### Issue Description

I updated to v0.2.8 today and noticed that my code is much slower than before. This seems to be related to the inclusion of the `lqrt` test in the results.

- test 1: virtual env with Python 3.7.5, pandas 0.24.0, dabest 0.2.7

```
import numpy as np
import pandas as pd
import dabest
np.random.seed(1234)
df = pd.DataFrame({'Group1': np.random.normal(loc=0, size=(1000,)),
                   'Group2': np.random.normal(loc=1, size=(1000,))})
test = dabest.load(df, idx=['Group1','Group2'])
%time print(test.mean_diff)
```

## DABEST v0.2.7

Good morning! The current time is Tue Dec 31 11:46:00 2019.

The unpaired mean difference between Group1 and Group2 is 1.03 [95%CI 0.941, 1.11]. The two-sided p-value of the Mann-Whitney test is 2.63e-97.

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true.

To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`

CPU times: user 558 ms, sys: 5.83 ms, total: 564 ms

Wall time: 564 ms

- test 2: virtual env with Python 3.7.5, pandas 0.25.3, dabest 0.2.8

```
import numpy as np
import pandas as pd
import dabest
np.random.seed(1234)
df = pd.DataFrame({'Group1': np.random.normal(loc=0, size=(1000,)),
                   'Group2': np.random.normal(loc=1, size=(1000,))})
test = dabest.load(df, idx=['Group1','Group2'])
%time print(test.mean_diff)
```

## DABEST v0.2.8

Good morning! The current time is Tue Dec 31 11:47:09 2019.

The unpaired mean difference between Group1 and Group2 is 1.03 [95%CI 0.941, 1.11]. The two-sided p-value of the Mann-Whitney test is 2.63e-97.

5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true.

To get the results of all valid statistical tests, use `.mean_diff.statistical_tests`

CPU times: user 2.46 s, sys: 8.69 ms, total: 2.47 s

Wall time: 2.47 s

Would it be possible to delay running the statistical tests until `effect_size.statistical_tests` is called, instead of calculating all the tests a priori?
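To illustrate the request, here is a minimal sketch of lazy evaluation using a plain Python property. `LazyEffectSize` and `_run_statistical_tests` are hypothetical names, not dabest's actual classes or methods, and the "tests" are a trivial placeholder for the real (expensive) ones.

```python
class LazyEffectSize:
    """Hypothetical stand-in for an effect-size result object (not dabest's API)."""

    def __init__(self, control, test):
        self.control = list(control)
        self.test = list(test)
        self._statistical_tests = None  # filled in on first access only
        self.tests_computed = 0         # counter, only here to show the laziness

    def _mean_diff(self):
        return sum(self.test) / len(self.test) - sum(self.control) / len(self.control)

    def _run_statistical_tests(self):
        # Placeholder for the expensive part (e.g. running every hypothesis test).
        self.tests_computed += 1
        return {"mean_diff": self._mean_diff()}

    @property
    def statistical_tests(self):
        # Compute on first access, then reuse the cached result.
        if self._statistical_tests is None:
            self._statistical_tests = self._run_statistical_tests()
        return self._statistical_tests

    def __repr__(self):
        # The summary only needs the effect size, so it triggers no tests.
        return f"The unpaired mean difference is {self._mean_diff():.3g}."


es = LazyEffectSize([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
print(es)                    # cheap: no tests have been run yet
print(es.statistical_tests)  # the tests run here, once
```

One caveat with this approach: the printed summary above also reports a Mann-Whitney p-value, so a real implementation would presumably still need to compute that one test when printing and defer only the rest.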

### Issue Analytics

- Created 3 years ago
- Comments: 5 (5 by maintainers)

## Top GitHub Comments

Okay, that sounds fine, especially since there are most likely some optimizations still to be made.

Yes, this would be helpful. Can you show me how to do that?

Perhaps we could even build an API for adding any additional “hypothesis test” via a lambda function?
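As a rough sketch of what such an interface might look like (purely illustrative; `HypothesisTestRegistry`, `register`, and `run` are hypothetical names and not part of dabest), each test could be a callable that takes the control and test samples and returns a result:

```python
from typing import Callable, Dict, Optional, Sequence

# A hypothesis test here is any callable taking (control, test) samples.
TestFn = Callable[[Sequence[float], Sequence[float]], float]


class HypothesisTestRegistry:
    """Hypothetical registry of user-supplied hypothesis tests."""

    def __init__(self) -> None:
        self._tests: Dict[str, TestFn] = {}

    def register(self, name: str, fn: TestFn) -> None:
        self._tests[name] = fn

    def run(self, control, test, names: Optional[Sequence[str]] = None):
        # Run every registered test, or only the ones asked for by name.
        selected = self._tests if names is None else {n: self._tests[n] for n in names}
        return {name: fn(control, test) for name, fn in selected.items()}


registry = HypothesisTestRegistry()
# Toy statistic standing in for a real test; any two-sample callable would fit.
registry.register("mean_diff", lambda c, t: sum(t) / len(t) - sum(c) / len(c))
print(registry.run([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

Selecting tests by name would also let slow tests be skipped unless explicitly requested.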

Hi @DizietAsahi, thanks for the suggestion. I’ll see what refactoring needs to be done; otherwise, I’m happy to accept a PR from you.

Ideally, the Lq-RT tests should utilise efficient bootstrapping (that is used here); this performance deficit suggests they don’t. I wonder if it might be worth refactoring from the original `lqrt` package such that the test results are derived from the bootstraps already generated as part of other calculations…