
Orthogonal Array for Latin hypercube in `scipy.stats.qmc`


Motivations

As discussed in #13654 and #13647, we need to implement a generator of orthogonal arrays (OAs) to support orthogonal-array-based Latin hypercube sampling (LHS), a better-performing alternative to the naive Latin hypercube (scipy.stats.qmc.LatinHypercube). However, there are so many algorithms for generating OAs that developers are not yet sure which one to adopt in SciPy.

In this issue, I would like to discuss the algorithms for generating OAs that will be used to support OA-based Latin hypercube sampling in SciPy.

Quick Summary of Discussions in #13654 and #13647

Orthogonal-Array-Based Latin Hypercube Sampling (OA-LHS)

Good references for OA-LHS are:

  • Owen, A. B. (1992). Orthogonal arrays for computer experiments, integration and visualization. Statistica Sinica, 439-452.
  • Tang, B. (1993). Orthogonal array-based Latin hypercubes. Journal of the American Statistical Association, 88(424), 1392-1397.

Both articles assume that an OA is already at hand. Assuming a function get_orthogonal_array that generates an orthogonal array (choosing its algorithm is the subject of this issue), the OA-LHS algorithm can be written as follows:

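import numpy as np


def sample_oa_lhs(n, d, rng=None):
    rng = np.random.default_rng(rng)
    # Uniform jitter, one value per entry of the array.
    sample = rng.random((n, d))
    # For example,
    # symbols = [0, 1, 2, ..., n_grid_size - 1] (0-based so that they can index perm) and
    # oa is an integer array of shape (n, d) whose elements are all in symbols.
    # oa is designed so that its samples (rows) are more uniform (in a sense) than random samples.
    oa, symbols = get_orthogonal_array(n, d)  # hypothetical generator discussed in this issue

    # Randomization of oa[:, :] to generate OA-LHSs
    for i in range(d):
        # Permute the symbols, not the elements, so that the rows stay aligned.
        perm = rng.permutation(len(symbols))
        oa[:, i] = perm[oa[:, i]]

    # Jitter each point inside its cell and rescale to [0, 1).
    # Note: this alone stratifies each coordinate into len(symbols) bins (a randomized OA);
    # Tang (1993) additionally splits each bin so that every coordinate is a Latin hypercube.
    oa_lhs = (oa + sample) / len(symbols)
    return oa_lhs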
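Jittering the randomized OA stratifies each coordinate into only `s` bins, where `s` is the number of symbols. Tang's (1993) refinement gives the `n / s` rows that share a symbol distinct sub-bins, so that every coordinate has exactly one point per `1/n` bin. The sketch below assumes a 0-based integer OA whose number of rows is divisible by `s`. The helper name `oa_to_lhs` is hypothetical, and the 4-run array in the example is the standard OA(4, 3, 2, 2).

```python
import numpy as np


def oa_to_lhs(oa, s, rng=None):
    """Tang-style OA-based LHS from a 0-based integer OA with s symbols (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    oa = np.asarray(oa)
    n, d = oa.shape
    r = n // s  # number of rows sharing each symbol in a column
    ranks = np.empty((n, d), dtype=int)
    for j in range(d):
        # Randomly relabel the symbols of column j (this keeps the OA property).
        col = rng.permutation(s)[oa[:, j]]
        for k in range(s):
            idx = np.flatnonzero(col == k)
            # Rows with symbol k get the ranks k*r, ..., k*r + r - 1 in random order.
            ranks[idx, j] = k * r + rng.permutation(r)
    # One uniform jitter per point inside its 1/n-wide bin.
    return (ranks + rng.random((n, d))) / n


# OA(4, 3, 2, 2): every pair of columns contains (0,0), (0,1), (1,0), (1,1) once.
oa = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
sample = oa_to_lhs(oa, s=2, rng=42)
print(np.sort(np.floor(sample * 4), axis=0))  # each column holds bins 0, 1, 2, 3
```

The coarse `1/s` bins still reproduce the OA, so the strength-2 stratification of Owen's randomized OA is kept while each margin becomes a Latin hypercube.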

Orthogonal Arrays (OAs)

OAs are briefly introduced in the articles above, and Wikipedia also has a page on them.

Generating orthogonal arrays is a complicated business: as discussed below, no single algorithm can generate OAs for arbitrary parameters.
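As an illustration of a construction that only covers part of the parameter space, the sketch below uses the classical polynomial construction. Each row is a polynomial of degree less than `t` with coefficients modulo a prime `p`, and column `x` holds its value at `x`. This gives `p**t` rows and `p` columns of strength `t`: in any `t` columns, every `t`-tuple of symbols appears exactly once. It requires `p` prime (not checked here) and `t <= p`. The helper name `polynomial_oa` is hypothetical and not part of SciPy.

```python
import itertools

import numpy as np


def polynomial_oa(p, t):
    """OA with p**t rows, p columns, p symbols and strength t (hypothetical helper).

    Assumes p is prime (not checked). Row = coefficients (a_0, ..., a_{t-1}) of a
    polynomial of degree < t; column x holds the polynomial evaluated at x, mod p.
    """
    if t > p:
        raise ValueError("this construction needs t <= p")
    coeffs = np.array(list(itertools.product(range(p), repeat=t)))  # shape (p**t, t)
    x = np.arange(p)
    powers = x[np.newaxis, :] ** np.arange(t)[:, np.newaxis]  # powers[i, x] = x**i
    return (coeffs @ powers) % p


oa = polynomial_oa(p=3, t=2)
print(oa.shape)  # (9, 3)
```

Any `t` distinct columns form an invertible Vandermonde system modulo `p`, which is why each `t`-tuple appears exactly once. For composite `p` this argument fails, which is one reason several construction families coexist.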

To make matters worse, OAs do not exist for certain combinations of sample size, dimension of the sample space, and strength (a parameter of OAs).

Thus, there are also approximate algorithms for generating nearly orthogonal arrays. One promising candidate is the following, recommended by the original author in https://github.com/scipy/scipy/issues/13647#issuecomment-792378238:
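To make "strength" concrete: an array with `n` rows and entries in `{0, ..., s-1}` is an OA of strength `t` if, for every choice of `t` columns, each of the `s**t` possible `t`-tuples appears equally often (`n / s**t` times). A brute-force checker, here given the hypothetical name `is_orthogonal_array`, is enough for validating small test cases:

```python
import itertools
from collections import Counter

import numpy as np


def is_orthogonal_array(oa, s, t):
    """Return True if every t-column projection of oa covers all s**t tuples equally often."""
    oa = np.asarray(oa)
    n, d = oa.shape
    if oa.min() < 0 or oa.max() >= s or n % s**t != 0:
        return False
    lam = n // s**t  # index: how often each t-tuple must appear
    for cols in itertools.combinations(range(d), t):
        counts = Counter(map(tuple, oa[:, list(cols)]))
        if len(counts) != s**t or any(c != lam for c in counts.values()):
            return False
    return True


oa = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(is_orthogonal_array(oa, s=2, t=2))  # True
print(is_orthogonal_array(oa, s=2, t=3))  # False: 4 rows cannot cover all 8 triples
```

The cost grows with the number of column subsets, so this is only meant for unit tests on small arrays.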

  • Owen, A. B. (1994). Controlling correlations in Latin hypercube samples. Journal of the American Statistical Association, 89(428), 1517-1522.
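Roughly, the idea in Owen (1994) is to reduce correlations between LHS columns by a rank-based ("ranked") Gram-Schmidt procedure. Each column's centered ranks are regressed on other columns, the column is replaced by the ranks of the residuals, and the procedure sweeps forward and then backward over the columns. The sketch below is a simplified reading of that idea, not the paper's exact algorithm (for instance, it uses centered ranks as scores). The helpers `decorrelate_lhs` and `max_abs_corr` are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def decorrelate_lhs(n, d, n_sweeps=2, rng=None):
    """Low-correlation LHS via a simplified ranked Gram-Schmidt (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    # Start from a plain LHS: an independent random permutation of 0..n-1 per column.
    ranks = np.column_stack([rng.permutation(n) for _ in range(d)])
    # Forward sweep: column j against columns 0..j-1; backward: against j+1..d-1.
    forward = [(j, list(range(j))) for j in range(1, d)]
    backward = [(j, list(range(j + 1, d))) for j in range(d - 2, -1, -1)]
    for _ in range(n_sweeps):
        for j, others in forward + backward:
            scores = ranks - (n - 1) / 2  # centered ranks used as scores
            X = scores[:, others]
            beta, *_ = np.linalg.lstsq(X, scores[:, j], rcond=None)
            resid = scores[:, j] - X @ beta
            # Replace column j by the ranks of its residuals: still a permutation of 0..n-1.
            ranks[:, j] = rankdata(resid, method="ordinal").astype(int) - 1
    # Jitter each point inside its 1/n bin, as in a standard LHS.
    return (ranks + rng.random((n, d))) / n


def max_abs_corr(x):
    """Largest absolute off-diagonal Pearson correlation between columns."""
    c = np.corrcoef(x, rowvar=False)
    return np.max(np.abs(c[~np.eye(c.shape[0], dtype=bool)]))


x = decorrelate_lhs(n=200, d=4, rng=0)
print(max_abs_corr(x))
```

Because only ranks are rewritten, every column remains a valid Latin hypercube column. The correlations are reduced rather than driven exactly to zero, which matches the "nearly orthogonal" goal.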

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
tupui commented, Aug 2, 2021

Then, could you ask Prof. Owen about this? Independent random permutation of each column should only produce an LHS of strength 1, not 2. I feel that there are better ways to randomize an OA, such as permuting the columns, shifting the points away from the origin by a random constant vector, and flipping the sign of randomly picked columns (and taking mod 1.0 afterward). Maybe scrambling is the smartest operation of this kind.

Sure, I will let you know. I should also be able to open a PR for this OA-LHS soon and will ping you. We could add more randomization methods if that's helpful.

Thanks! That makes sense. #13471 seems interesting, but I probably cannot catch up with the thread soon. I might pop into the thread if I can.

No problem. As I said, for now it will be very "simple" and we will merge soon. I could not make the famous ESE (enhanced stochastic evolutionary) algorithm work: for me, it was not performing any better than doing n random permutations. If I missed something and we can make it work, it can be added later, as I left the structure in place to add other methods.
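The randomizations suggested above (a random column permutation, sign flips of random columns taken mod 1, and a random constant shift mod 1) are each a one-liner on a point set in `[0, 1)^d`. The sketch below, with the hypothetical helper `randomize_points`, only illustrates those operations. It does not claim that each one preserves the structure of an OA-LHS: a generic shift, for example, can move two points of the same column into the same `1/n` bin.

```python
import numpy as np


def randomize_points(x, rng=None):
    """Apply the randomizations discussed above to points in [0, 1)^d (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # 1. Random permutation of the columns (coordinates); this makes a copy.
    x = x[:, rng.permutation(d)]
    # 2. Flip the sign of randomly picked columns, mod 1 (1 - x is -x mod 1).
    flip = rng.random(d) < 0.5
    x[:, flip] = np.mod(1.0 - x[:, flip], 1.0)
    # 3. Shift all points by one random constant vector, mod 1 (Cranley-Patterson style).
    return np.mod(x + rng.random(d), 1.0)


pts = np.random.default_rng(0).random((16, 3))
out = randomize_points(pts, rng=1)
```

Writing the flip as `1 - x` rather than `-x` keeps the result inside `[0, 1)` even for values that are extremely close to 0.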

1 reaction
tupui commented, Aug 2, 2021

@kstoneriv3 I had some exchanges with Art, and I wrote the following, which successfully produces an orthogonal LHS:

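import numpy as np
from scipy.stats.qmc import LatinHypercube

p = 5  # must be a prime number
d = 2  # note that this cannot be changed and is not the dimension of the final sample.

# The first two columns enumerate all p**2 pairs of symbols;
# the remaining p - 1 columns are (col0 + p_ * col1) mod p.
arrays = np.tile(np.arange(p), (2, 1))
oa_sample = np.zeros(shape=(p**2, p+1))
oa_sample[:, :2] = np.stack(np.meshgrid(*arrays), axis=-1).reshape(-1, d)
for p_ in range(1, p):
    oa_sample[:, 2+p_-1] = np.mod(oa_sample[:, 0] + p_*oa_sample[:, 1], p)

# Note: this shuffles the entries of each column independently, which breaks the
# joint structure between columns (only strength 1 is guaranteed, as raised in the
# discussion). Permuting the symbols instead, e.g.
#     oa_sample[:, j] = np.random.permutation(p)[oa_sample[:, j].astype(int)]
# keeps strength 2.
for j in range(p+1):
    np.random.shuffle(oa_sample[:, j])

# oa_sample is now randomized,
# and the following turns it into an OA-LHS

oa_lhs_sample = np.zeros(shape=(p**2, p+1))
for j in range(p+1):
    for k in range(p):
        # Rows whose symbol in column j is k (there are exactly p of them).
        idx = np.where(oa_sample[:, j] == k)[0]
        # Spread these p points over the k-th bin with a 1-D LHS of size p.
        lhs = LatinHypercube(d=1).random(p).flatten()
        oa_lhs_sample[idx, j] = lhs + k

# Rescale from [0, p) to [0, 1).
oa_lhs_sample /= p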

I will clean up and streamline this snippet, as some operations can be made more efficient, but the idea is here. It creates an OA of strength 2, with the constraint that p must be a prime number, so you cannot use an arbitrary number of samples. It produces a sample of shape (p**2, p+1). What can be done, though, is to get any dimension from 1 to p+1 by selecting a subset of the columns.
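Along the lines of the clean-up mentioned above, a vectorized variant could randomize the OA by permuting the symbols of each column, which keeps strength 2. It then assigns each group of `p` rows sharing a symbol to distinct sub-bins, and selects a random subset of `d` columns to get the requested dimension. The function name `oa_lhs` and its `rng` argument are illustrative only; this is a sketch, not SciPy's implementation, and it does not check that `p` is prime.

```python
import numpy as np


def oa_lhs(p, d, rng=None):
    """Strength-2 OA-LHS of shape (p**2, d) for prime p and d <= p + 1 (illustrative sketch)."""
    if d > p + 1:
        raise ValueError("d must be at most p + 1")
    rng = np.random.default_rng(rng)
    # OA(p**2, p + 1, p, 2): columns a, b and (a + c*b) mod p for c = 1, ..., p - 1.
    a, b = np.divmod(np.arange(p**2), p)
    oa = np.column_stack([a, b] + [(a + c * b) % p for c in range(1, p)])
    # Keep a random subset of d columns.
    oa = oa[:, rng.choice(p + 1, size=d, replace=False)]
    # Randomize by relabelling the symbols of each column (rows stay aligned).
    oa = np.column_stack([rng.permutation(p)[oa[:, j]] for j in range(d)])
    # Sort rows by (symbol, random tie-breaker): rows with symbol k get ranks k*p .. k*p + p - 1.
    ranks = np.empty_like(oa)
    for j in range(d):
        order = np.argsort(oa[:, j] * p**2 + rng.permutation(p**2), kind="stable")
        ranks[order, j] = np.arange(p**2)
    # Jitter inside each 1/p**2 bin.
    return (ranks + rng.random((p**2, d))) / p**2


x = oa_lhs(p=5, d=4, rng=0)
```

Each column is a Latin hypercube on `p**2` bins, and flooring `x * p` recovers the randomized OA, so every pair of columns covers the `p x p` coarse grid exactly once.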
