Orthogonal Array for Latin hypercube in `scipy.stats.qmc`
Motivations
As discussed in #13654 and #13647, we need a generator of orthogonal arrays (OAs) to support orthogonal-array-based Latin hypercube sampling (OA-LHS), which is a more performant variant of the naive Latin hypercube (`scipy.stats.qmc.LatinHypercube`). However, there are so many algorithms for generating OAs that developers are not yet sure which one to adopt in SciPy.
In this issue, I would like to discuss the algorithms for generating OAs that could be used to support OA-based Latin hypercube sampling in SciPy.
Quick Summary of Discussions in #13654 #13647
Orthogonal Array based Latin Hypercube Sampling (OA-LHS)
Good references for OA-LHS are
- Owen, A. B. (1992). Orthogonal arrays for computer experiments, integration and visualization. Statistica Sinica, 439-452.
- Tang, B. (1993). Orthogonal array-based Latin hypercubes. Journal of the American Statistical Association, 88(424), 1392-1397.
These articles assume that an OA is already at hand. Assuming we have a function `get_orthogonal_array` that generates an orthogonal array, the OA-LHS algorithm can be written as follows.
import numpy as np

def SampleOALH(n, d, rng=None):
    rng = np.random.default_rng(rng)
    # Uniform jitter in [0, 1) for each of the n points in d dimensions.
    sample = rng.uniform(size=(n, d))
    # get_orthogonal_array is assumed to exist (it is the subject of this issue).
    # For example,
    # symbols = [0, 1, 2, ..., n_grid_size - 1] and
    # oa is an array of shape (n, d) whose elements are all taken from symbols.
    # oa is designed so that its samples (rows) are more uniform (in a sense) than random samples.
    oa, symbols = get_orthogonal_array(n, d)
    # Randomization of oa[:, :] to generate OA-LHSs
    for i in range(d):
        # permute the symbols, not the elements
        perm = rng.permutation(len(symbols))
        oa[:, i] = perm[oa[:, i]]
    oa_lhs = (oa + sample) / len(symbols)
    return oa_lhs
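To illustrate the symbol permutation step, here is a minimal self-contained sketch that replaces `get_orthogonal_array` (which does not exist yet) with a small hand-written array. The helper name `randomize_oa` and the hard-coded 4 x 3 array are assumptions made for this example only.

```python
import numpy as np

# Hand-written OA with 4 rows, 3 columns, 2 symbols, strength 2: every pair
# of columns contains each of (0,0), (0,1), (1,0), (1,1) exactly once.
OA = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])

def randomize_oa(oa, n_symbols, rng):
    """Relabel the symbols column by column, then jitter inside each cell."""
    oa = oa.copy()
    n, d = oa.shape
    for i in range(d):
        perm = rng.permutation(n_symbols)  # random relabeling of the symbols
        oa[:, i] = perm[oa[:, i]]          # rows stay in place, labels change
    # Scale to the unit hypercube; each point lands inside the cell named by its row.
    return (oa + rng.uniform(size=(n, d))) / n_symbols

rng = np.random.default_rng(0)
points = randomize_oa(OA, 2, rng)
```

Because only the labels are permuted within each column, the pairwise balance of the array is preserved: binning `points` back onto the 2 x 2 grid for any two columns still hits every cell exactly once.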
Orthogonal Arrays (OAs)
OAs are briefly introduced in the articles above. There is also a Wikipedia page on them.
Generating orthogonal arrays is somewhat complicated because no single algorithm can generate arbitrary OAs, as discussed below.
To make matters worse, OAs do not exist for certain combinations of sample size, dimension of the sample space, and strength (a parameter of OAs).
Thus, there are also approximation algorithms that generate nearly orthogonal arrays. One promising algorithm is the following, as it is recommended by the original author in https://github.com/scipy/scipy/issues/13647#issuecomment-792378238
- Owen, A. B. (1994). Controlling correlations in Latin hypercube samples. Journal of the American Statistical Association, 89(428), 1517-1522.
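To make the notion of strength concrete: an array with symbols `0..s-1` has strength `t` if, in every choice of `t` columns, each of the `s**t` possible symbol tuples appears equally often. A minimal checker for this property could look as follows. The function name `is_orthogonal_array` and its signature are hypothetical, and the sketch assumes integer symbols starting at 0.

```python
from itertools import combinations

import numpy as np

def is_orthogonal_array(oa, n_symbols, strength=2):
    """Return True if every `strength` columns show each symbol tuple equally often."""
    oa = np.asarray(oa)
    n, d = oa.shape
    n_tuples = n_symbols ** strength
    if n % n_tuples != 0:
        return False  # equal counts are impossible for this number of rows
    expected = n // n_tuples
    for cols in combinations(range(d), strength):
        # Encode each row's tuple of symbols as one integer in base n_symbols.
        codes = np.zeros(n, dtype=int)
        for c in cols:
            codes = codes * n_symbols + oa[:, c]
        counts = np.bincount(codes, minlength=n_tuples)
        if not np.all(counts == expected):
            return False
    return True

good = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
bad = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
```

Here `good` passes for strength 2 but fails for strength 3 (4 rows cannot cover 8 tuples), which is a small instance of the non-existence problem described above. `bad` fails already at strength 2.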
Sure, I will let you know. I should also be able to do a PR soon for this OA-LHS and will ping you. We could add more randomization methods if that’s helpful.
No problem. As I said, for now it will be very “simple” and we will soon merge. I could not make the famous ESE work; it was not performing any better than doing n random permutations for me. If I missed something and we can make it work, it can be added later, as I left the structure in place to add other methods.
@kstoneriv3 I had some exchanges with Art and I wrote the following, which successfully produces an orthogonal LHS:
I will clean up this snippet and streamline it, as some operations can be made more efficient, but the idea is here. It creates an OA of strength 2, with the constraint that `p` must be a prime number, so you cannot freely use any number of samples. It produces a sample of shape `(p**2, p+1)`. What can be done, though, is to get any dimension you want from 1 to `p+1` by selecting a subset of the columns.
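For readers who want to experiment, the following is a minimal sketch of one classical way (a Bose-type construction over the integers modulo `p`) to obtain an array with the shape and strength described above. It is not the snippet from the comment, and the function name `bose_oa` is an assumption for illustration.

```python
import numpy as np

def bose_oa(p):
    """Sketch: strength-2 OA with p**2 rows, p + 1 columns, symbols 0..p-1 (p prime)."""
    if p < 2 or any(p % q == 0 for q in range(2, int(p ** 0.5) + 1)):
        raise ValueError("p must be a prime number")
    # Enumerate all pairs (i, j) with i, j in {0, ..., p-1}, one pair per row.
    i, j = np.divmod(np.arange(p ** 2), p)
    # Columns are i, j, and i + k*j (mod p) for k = 1, ..., p-1.
    cols = [i, j] + [(i + k * j) % p for k in range(1, p)]
    return np.stack(cols, axis=1)

oa = bose_oa(3)       # shape (9, 4)
sub = oa[:, [0, 2]]   # a subset of columns keeps the pairwise balance
```

The primality requirement comes from the arithmetic modulo `p`: any two columns are distinct linear forms in `(i, j)`, and for prime `p` that makes every pair of columns take each of the `p**2` symbol pairs exactly once.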