0.2.8 much slower than 0.2.7
I updated to v0.2.8 today and noticed that my code is much slower than before: about 4x in the comparison below (564 ms vs 2.47 s wall time). This seems to be related to the inclusion of the lqrt test in the results.
- test 1: virtual env with python 3.7.5 pandas 0.24.0 dabest 0.2.7
import numpy as np
import pandas as pd
import dabest
np.random.seed(1234)
df = pd.DataFrame({'Group1':np.random.normal(loc=0, size=(1000,)),
'Group2':np.random.normal(loc=1, size=(1000,))})
test = dabest.load(df, idx=['Group1','Group2'])
%time print(test.mean_diff)
DABEST v0.2.7
Good morning! The current time is Tue Dec 31 11:46:00 2019.
The unpaired mean difference between Group1 and Group2 is 1.03 [95%CI 0.941, 1.11]. The two-sided p-value of the Mann-Whitney test is 2.63e-97.
5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true.
To get the results of all valid statistical tests, use
.mean_diff.statistical_tests
CPU times: user 558 ms, sys: 5.83 ms, total: 564 ms
Wall time: 564 ms
- test 2: virtual env with python 3.7.5 pandas 0.25.3 dabest 0.2.8
import numpy as np
import pandas as pd
import dabest
np.random.seed(1234)
df = pd.DataFrame({'Group1':np.random.normal(loc=0, size=(1000,)),
'Group2':np.random.normal(loc=1, size=(1000,))})
test = dabest.load(df, idx=['Group1','Group2'])
%time print(test.mean_diff)
DABEST v0.2.8
Good morning! The current time is Tue Dec 31 11:47:09 2019.
The unpaired mean difference between Group1 and Group2 is 1.03 [95%CI 0.941, 1.11]. The two-sided p-value of the Mann-Whitney test is 2.63e-97.
5000 bootstrap samples were taken; the confidence interval is bias-corrected and accelerated. The p-value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true.
To get the results of all valid statistical tests, use
.mean_diff.statistical_tests
CPU times: user 2.46 s, sys: 8.69 ms, total: 2.47 s
Wall time: 2.47 s
Would it be possible to defer the statistical tests until effect_size.statistical_tests is accessed, instead of computing all the tests up front?
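To make the request concrete, here is a minimal sketch of deferred evaluation. It is not dabest's code: `LazyEffectSize`, `difference`, and `tests_computed` are hypothetical names, and a simple permutation test stands in for the real tests. It assumes Python 3.8+ for `functools.cached_property` and `statistics.fmean`.

```python
import random
from functools import cached_property
from statistics import fmean


class LazyEffectSize:
    """Hypothetical stand-in for an effect-size object, not dabest's real class."""

    def __init__(self, control, test, n_permutations=1000, seed=0):
        self.control = list(control)
        self.test = list(test)
        self.n_permutations = n_permutations
        self.seed = seed
        self.tests_computed = 0  # counts how often the expensive step runs

    @property
    def difference(self):
        # Cheap summary; needs no hypothesis test.
        return fmean(self.test) - fmean(self.control)

    @cached_property
    def statistical_tests(self):
        # Expensive step: runs on first access, then the result is cached.
        self.tests_computed += 1
        rng = random.Random(self.seed)
        pooled = self.control + self.test
        n_control = len(self.control)
        observed = abs(self.difference)
        hits = 0
        for _ in range(self.n_permutations):
            rng.shuffle(pooled)
            diff = fmean(pooled[n_control:]) - fmean(pooled[:n_control])
            if abs(diff) >= observed:
                hits += 1
        # Add-one correction keeps the estimated p-value strictly positive.
        return {"pvalue_permutation": (hits + 1) / (self.n_permutations + 1)}


rng = random.Random(1234)
es = LazyEffectSize(
    [rng.gauss(0, 1) for _ in range(200)],
    [rng.gauss(1, 1) for _ in range(200)],
)
print(es.difference)         # no test has run yet
print(es.statistical_tests)  # the permutation test runs here, once
```

With this pattern, printing the effect size would pay only for the bootstrap, and the tests would cost time only for users who ask for them.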
Issue Analytics
- Created 4 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
Okay, that sounds fine, especially since there are most likely some optimizations still to be made.
Yes, this would be helpful. Can you show me how to do that?
Perhaps we could even build an API for registering any additional “hypothesis test” via a lambda function?
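One way such a registration API might look is sketched below. Everything here is an assumption for illustration: `HYPOTHESIS_TESTS`, `register_test`, `run_tests`, and `permutation_pvalue` are hypothetical names, not part of dabest, and the permutation test is only a placeholder for whatever tests would actually be registered.

```python
import random
from statistics import fmean, median

# Hypothetical registry: test name -> callable(control, test) -> p-value
HYPOTHESIS_TESTS = {}


def register_test(name, func):
    """Register a callable taking (control, test) and returning a p-value."""
    if not callable(func):
        raise TypeError("func must be callable")
    HYPOTHESIS_TESTS[name] = func


def permutation_pvalue(control, test, statistic, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for a difference in `statistic`."""
    rng = random.Random(seed)
    pooled = list(control) + list(test)
    n_control = len(control)
    observed = abs(statistic(test) - statistic(control))
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistic(pooled[n_control:]) - statistic(pooled[:n_control])
        if abs(diff) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def run_tests(control, test, names=None):
    """Run only the requested tests (all registered ones if names is None)."""
    if names is None:
        selected = dict(HYPOTHESIS_TESTS)
    else:
        selected = {name: HYPOTHESIS_TESTS[name] for name in names}
    return {name: func(control, test) for name, func in selected.items()}


# Users add tests as plain lambdas.
register_test("perm_mean", lambda c, t: permutation_pvalue(c, t, fmean))
register_test("perm_median", lambda c, t: permutation_pvalue(c, t, median))
```

Because `run_tests` accepts a list of names, an expensive test would be opt-in per call rather than always computed.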
Hi @DizietAsahi, thanks for the suggestion. I’ll see what refactoring needs to be done; otherwise, I’m happy to accept a PR from you.
Ideally, the Lq-RT tests should utilise the efficient bootstrapping that is used here; this performance deficit suggests they don’t. I wonder if it might be worth refactoring from the original lqrt package such that the test results are derived from the bootstraps already generated as part of other calculations…
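The general idea of reusing one set of bootstrap draws for several results can be sketched as follows. This does not implement the Lq-RT test and is not how dabest or lqrt work internally: `SharedBootstrap` is a hypothetical helper, the interval is a plain percentile interval rather than BCa, and the p-value uses a simple null-centred bootstrap as a placeholder.

```python
import random
from functools import cached_property
from statistics import fmean


class SharedBootstrap:
    """Hypothetical helper: resample once, reuse the draws for several results."""

    def __init__(self, control, test, n_resamples=2000, seed=0):
        self.control = list(control)
        self.test = list(test)
        self.n_resamples = n_resamples
        self.seed = seed
        self.resampling_runs = 0  # counts how often resampling is performed

    @cached_property
    def differences(self):
        # The only place where resampling happens; the result is cached.
        self.resampling_runs += 1
        rng = random.Random(self.seed)
        diffs = []
        for _ in range(self.n_resamples):
            c = rng.choices(self.control, k=len(self.control))
            t = rng.choices(self.test, k=len(self.test))
            diffs.append(fmean(t) - fmean(c))
        return sorted(diffs)

    def percentile_ci(self, alpha=0.05):
        # Plain percentile interval (not bias-corrected and accelerated).
        d = self.differences
        lower = d[int((alpha / 2) * (len(d) - 1))]
        upper = d[int((1 - alpha / 2) * (len(d) - 1))]
        return lower, upper

    def shifted_pvalue(self):
        # Centre the bootstrap distribution on zero to mimic the null, then
        # count draws at least as extreme as the observed difference.
        d = self.differences
        observed = fmean(self.test) - fmean(self.control)
        centre = fmean(d)
        hits = sum(abs(x - centre) >= abs(observed) for x in d)
        return (hits + 1) / (len(d) + 1)
```

Both `percentile_ci` and `shifted_pvalue` read the same cached `differences`, so adding a second result does not trigger a second round of resampling.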