Orthogonal Array for Latin hypercube in `scipy.stats.qmc`

Motivations

As discussed in #13654 and #13647, we need to implement a generator of orthogonal arrays (OAs) to support orthogonal array-based Latin hypercube sampling (OA-LHS), a better-performing variant of the naive Latin hypercube (scipy.stats.qmc.LatinHypercube). However, there are so many algorithms for generating OAs that the developers are not yet sure which one to adopt in SciPy.

In this issue, I would like to discuss the algorithms for generating OAs that could be used to support an OA-based Latin hypercube in SciPy.

Quick Summary of Discussions in #13654 #13647

Orthogonal Array-based Latin Hypercube Sampling (OA-LHS)

Good references for OA-LHS are

Owen, A. B. (1992). Orthogonal arrays for computer experiments, integration and visualization. Statistica Sinica, 439-452.
Tang, B. (1993). Orthogonal array-based Latin hypercubes. Journal of the American Statistical Association, 88(424), 1392-1397.

These articles assume that an OA is already at hand. Assuming we have a function get_orthogonal_array that generates an orthogonal array, the OA-LHS algorithm can be written as follows.

import numpy as np

def SampleOALH(n, d, seed=None):
    rng = np.random.default_rng(seed)
    # uniform jitter in [0, 1) for every point and dimension
    sample = rng.random((n, d))
    # get_orthogonal_array is assumed to exist. For example,
    # symbols = [0, 1, 2, ..., n_grid_size - 1] and
    # oa is an integer array of shape (n, d) whose elements are all taken from symbols.
    # oa is designed so that its samples (rows) are more uniform (in a sense) than random samples.
    oa, symbols = get_orthogonal_array(n, d)

    # Randomization of oa[:, :] to generate OA-LHSs
    for i in range(d):
        # permute the symbols, not the elements
        perm = rng.permutation(len(symbols))
        oa[:, i] = perm[oa[:, i]]

    # symbol k is mapped into the cell [k / n_grid_size, (k + 1) / n_grid_size)
    oa_lhs = (oa + sample) / len(symbols)
    return oa_lhs
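
To make the sketch above concrete without a general OA generator, the following dependency-free example hard-codes a small orthogonal array in place of the assumed get_orthogonal_array and applies the same two steps: relabel the symbols of each column, then jitter each point inside its cell. The names `TOY_OA`, `N_SYMBOLS` and `randomize_oa` are illustrative only and are not part of SciPy.

```python
import random

# Hard-coded OA with 4 runs, 3 columns, 2 symbols and strength 2.
# It stands in for the assumed get_orthogonal_array.
TOY_OA = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
N_SYMBOLS = 2


def randomize_oa(oa, n_symbols, rng):
    """Relabel symbols column by column, then jitter points within their cells."""
    n_rows, n_cols = len(oa), len(oa[0])
    relabeled = [row[:] for row in oa]
    for j in range(n_cols):
        # Permute the symbol labels of column j; the row order is left untouched.
        perm = list(range(n_symbols))
        rng.shuffle(perm)
        for i in range(n_rows):
            relabeled[i][j] = perm[oa[i][j]]
    # Symbol k becomes a uniform draw in [k / n_symbols, (k + 1) / n_symbols).
    points = [
        [(k + rng.random()) / n_symbols for k in row] for row in relabeled
    ]
    return relabeled, points


rng = random.Random(0)
relabeled, points = randomize_oa(TOY_OA, N_SYMBOLS, rng)
```

Because only the labels change, every pair of columns in `relabeled` still contains each of the four symbol pairs exactly once, which would not be guaranteed if the entries of a column were shuffled independently of the other columns.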

Orthogonal Arrays (OAs)

OAs are briefly introduced in the articles above. There is also a Wikipedia page on them:

https://en.wikipedia.org/wiki/Orthogonal_array

Generating orthogonal arrays is somewhat complicated, as there is no single algorithm that can generate arbitrary OAs, as discussed below:

https://stackoverflow.com/questions/37851038/how-to-create-orthogonal-array

To make matters worse, OAs do not exist for certain combinations of sample size, dimension of the sample space, and strength (a parameter of OAs).
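
As a rough illustration of what strength means, the sketch below checks the defining property directly: an array has strength t if, in every choice of t columns, each possible tuple of symbols appears equally often. The helper name `oa_strength_ok` is hypothetical and the check is brute force, so it is only meant for small arrays whose symbols are 0, 1, ..., n_symbols - 1.

```python
from collections import Counter
from itertools import combinations


def oa_strength_ok(oa, n_symbols, strength):
    """Return True if every `strength` columns show each symbol tuple equally often."""
    n_rows, n_cols = len(oa), len(oa[0])
    n_tuples = n_symbols ** strength
    if n_rows % n_tuples != 0:
        # The number of rows must be a multiple of n_symbols**strength.
        return False
    expected = n_rows // n_tuples
    for cols in combinations(range(n_cols), strength):
        counts = Counter(tuple(row[c] for c in cols) for row in oa)
        if len(counts) != n_tuples or set(counts.values()) != {expected}:
            return False
    return True


# 4 rows, 3 columns, 2 symbols: every pair of columns contains each of
# the 4 symbol pairs exactly once, so the strength is 2 but not 3.
oa = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(oa_strength_ok(oa, n_symbols=2, strength=2))  # True
print(oa_strength_ok(oa, n_symbols=2, strength=3))  # False
```

The divisibility test at the top is one simple reason why OAs cannot exist for every combination of sample size and strength: with s symbols and strength t, the number of rows has to be a multiple of s**t.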

Thus, there are also approximation algorithms that generate nearly orthogonal arrays. One promising algorithm is the following, as recommended by the original author in https://github.com/scipy/scipy/issues/13647#issuecomment-792378238

Owen, A. B. (1994). Controlling correlations in Latin hypercube samples. Journal of the American Statistical Association, 89(428), 1517-1522.

Top GitHub Comments

tupui commented, Aug 2, 2021

Then, could you ask Prof Owen about this? Independent random permutation of each column should only produce LHS of strength 1, but not 2. I feel that there are better ways to randomize an OA, such as permutation of columns, shifting points from the origin by a random constant vector, and flipping the sign of randomly picked columns (and taking mod 1.0 afterward). Maybe the scrambling is the smartest of operations of the kind.

Sure, I will let you know. I should also be able to open a PR soon for this OA-LHS and will ping you. We could add more randomization methods if that’s helpful.

Thanks! That makes sense. #13471 seems interesting but I guess I cannot catch up with the thread very soon. I might pop into the thread if I can.

No problem. As I said for now it will be very “simple” and we will soon merge. I could not make the famous ESE work, it was not performing any better than doing n-random permutations for me. If I missed something and we can make it work, it can be added later as I left the structure to add other methods.

tupui commented, Aug 2, 2021

@kstoneriv3 I had some exchanges with Art and I wrote the following which is successfully producing an orthogonal LHS:

import numpy as np
from scipy.stats.qmc import LatinHypercube

p = 5
d = 2  # note that this cannot be changed and is not the dimension of the final sample.

arrays = np.tile(np.arange(p), (2, 1))
oa_sample = np.zeros(shape=(p**2, p+1))
oa_sample[:, :2] = np.stack(np.meshgrid(*arrays), axis=-1).reshape(-1, d)
for p_ in range(1, p):
    oa_sample[:, 2+p_-1] = np.mod(oa_sample[:, 0] + p_*oa_sample[:, 1], p)

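for j in range(p+1):
    # relabel the symbols of column j; shuffling the entries in place
    # would break the pairing between columns and thus the orthogonality
    perm = np.random.permutation(p)
    oa_sample[:, j] = perm[oa_sample[:, j].astype(int)]

# oa_sample is a randomized OA from here on,
# and the following turns it into an OA-LHS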

oa_lhs_sample = np.zeros(shape=(p**2, p+1))
for j in range(p+1):
    for k in range(p):
        idx = np.where(oa_sample[:, j] == k)[0]
        lhs = LatinHypercube(d=1).random(p).flatten()
        oa_lhs_sample[:, j][idx] = lhs + oa_sample[:, j][idx]

oa_lhs_sample /= p

I will clean up and streamline this snippet, as some operations can be made more efficient, but the idea is there. It creates an OA of strength 2, with the constraint that p must be a prime number, so you cannot freely choose the number of samples. It produces a sample of shape (p**2, p+1). What you can do, though, is get any dimension from 1 to p+1 by selecting a subset of the columns.
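
The constraints described here (p prime, p**2 rows, at most p+1 columns) can be made explicit in a small dependency-free sketch. The function names `bose_oa` and `oa_to_lhs` are hypothetical and not SciPy API; the column formula is the same modular construction as in the snippet, and for brevity the sketch keeps the first d columns and skips the symbol randomization step.

```python
import random


def bose_oa(p, d):
    """Strength-2 OA with p**2 rows and d <= p + 1 columns, for prime p."""
    if p < 2 or any(p % q == 0 for q in range(2, int(p ** 0.5) + 1)):
        raise ValueError("p must be a prime number")
    if not 1 <= d <= p + 1:
        raise ValueError("d must satisfy 1 <= d <= p + 1")
    rows = []
    for a in range(p):
        for b in range(p):
            # Columns: a, b, a + b, a + 2b, ..., a + (p - 1) b  (mod p).
            full = [a, b] + [(a + m * b) % p for m in range(1, p)]
            rows.append(full[:d])  # keep the first d of the p + 1 columns
    return rows


def oa_to_lhs(oa, p, rng):
    """Spread the p rows sharing a symbol over p sub-cells, column by column."""
    n_rows, n_cols = len(oa), len(oa[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k in range(p):
            idx = [i for i in range(n_rows) if oa[i][j] == k]
            sub = list(range(p))
            rng.shuffle(sub)  # assign one sub-cell to each row with symbol k
            for i, s in zip(idx, sub):
                out[i][j] = (k * p + s + rng.random()) / (p * p)
    return out


p, d = 5, 3
rng = random.Random(0)
oa = bose_oa(p, d)
sample = oa_to_lhs(oa, p, rng)
```

With p = 5 and d = 3 this gives 25 points in 3 dimensions. Each column of `sample` has exactly one point in each of the 25 intervals of width 1/25, and each pair of columns of `oa` contains every symbol pair once.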
