
`scipy.stats.qmc.LatinHypercube` cannot sample single sample at a time

See original GitHub issue

I am playing around with scipy.stats.qmc for hyperparameter optimization and found that scipy.stats.qmc.LatinHypercube cannot draw a single sample. Though drawing one sample is an impractical case, I expect that it should not just throw an error.

@tupui I suppose you know a lot about this…

Reproducing code example:

import scipy.stats.qmc

engine = scipy.stats.qmc.LatinHypercube(d=2)
engine.random()  # same as `engine.random(n=1)`

Error message:

Traceback (most recent call last):
  File "b.py", line 4, in <module>
    engine.random(n=1)
  File "/home/kstoneriv3/.local/lib/python3.8/site-packages/scipy-1.7.0.dev0+fc77ea1-py3.8-linux-x86_64.egg/scipy/stats/_qmc.py", line 956, in random
    q = self.rg_integers(low=1, high=n, size=(n, self.d))
  File "_generator.pyx", line 460, in numpy.random._generator.Generator.integers
  File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
ValueError: low >= high
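The error comes from NumPy's `Generator.integers`, which requires `low < high` for its default half-open range `[low, high)`. A minimal sketch of the failure mode (an illustration only, not the actual SciPy fix):

```python
import numpy as np

rng = np.random.default_rng(0)

# With n=1, the internal call reduces to integers(low=1, high=1):
# the half-open range [1, 1) is empty, so NumPy raises ValueError.
try:
    rng.integers(low=1, high=1, size=(1, 2))
except ValueError as e:
    print(e)  # the same "low >= high" message as in the traceback

# With endpoint=True the range becomes inclusive, so [1, 1] is valid
# and the only possible draw is 1.
q = rng.integers(low=1, high=1, size=(1, 2), endpoint=True)
print(q)
```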

Scipy/Numpy/Python version information:

1.7.0.dev0+fc77ea1 1.19.5 sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 39 (39 by maintainers)

Top GitHub Comments

2 reactions
kstoneriv3 commented, Mar 8, 2021

I would go for option 1, as it's also hard for me to fully understand the algorithm for constructing OAs at the moment. Could you take a look at #13654 then? Its implementation is efficient, but its readability is not great, I guess.

In addition, I would like to allow n=0 or even d=0 (Edit: already OK to use d=0) in LHS, as discussed above.

2 reactions
ArtOwen commented, Mar 7, 2021

Hi everybody, I got some emails from tupui about this.

-Art

For LHS you want each variable (each column of the resulting matrix) to have one value in the range [j/n,(j+1)/n) for all 0 <= j < n. You can permute integers 0 to n-1 into a random order and add a U(0,1) to each of them. Do that d times independently to get d columns and then divide the whole thing by n. [Many texts permute 1 to n and then subtract U(0,1). Same distribution.]

Some users will want "centered" LHS that you can get by adding 0.5 instead of U(0,1).
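The recipe above can be sketched in a few lines of NumPy. This is a hypothetical `latin_hypercube` helper written for illustration, not SciPy's implementation:

```python
import numpy as np

def latin_hypercube(n, d, centered=False, seed=None):
    """Plain LHS per the recipe above: permute 0..n-1 in each column,
    jitter within each stratum (or use the stratum center), scale by 1/n."""
    rng = np.random.default_rng(seed)
    # One independent random permutation of 0..n-1 per column -> (n, d).
    perms = np.array([rng.permutation(n) for _ in range(d)]).T
    # U(0,1) jitter, or 0.5 for the centered variant.
    offset = 0.5 if centered else rng.random((n, d))
    return (perms + offset) / n

sample = latin_hypercube(n=5, d=2, seed=0)
# Each column has exactly one point in each stratum [j/5, (j+1)/5).
print(np.sort(np.floor(sample * 5).astype(int), axis=0))
```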

Orthogonal LHS could mean different things to different people. The ones by Tang are good. It makes a big difference what OAs you use. It makes the most sense to use OAs of strength 2. Then you balance single and double marginal distributions. I like the ones that have n=s^2 where s is a prime number (or any prime power if you have an implementation of Galois fields) and d <= s+1 that I call the Bose construction in my papers. The OA book by Hedayat, Sloane and Stufken calls them something else. There are also arrays with n = 2s^2 for d <= 2s+1, once again with s a prime or prime power. (Warning: the modular arithmetic formulas are quite incorrect if s is not a prime.)
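For prime s, the Bose-style construction with n = s^2 can be sketched as follows. The column formula used here (x, y, and x + k*y mod s) is one standard way to write it and is my assumption about the construction being referenced, not code from the thread; as the warning above notes, the modular arithmetic only works when s is prime:

```python
import numpy as np
from itertools import combinations

def bose_oa(s):
    """Sketch of a Bose OA(s^2, s+1, s, 2) for prime s: rows indexed by
    (x, y) in Z_s x Z_s; columns are x, y, and x + k*y mod s, k=1..s-1."""
    x, y = np.meshgrid(np.arange(s), np.arange(s), indexing="ij")
    x, y = x.ravel(), y.ravel()
    cols = [x, y] + [(x + k * y) % s for k in range(1, s)]
    return np.stack(cols, axis=1)  # shape (s^2, s+1)

A = bose_oa(5)
# Strength 2: every pair of columns contains each of the s^2 possible
# symbol pairs exactly once.
for i, j in combinations(range(A.shape[1]), 2):
    assert len(set(zip(A[:, i].tolist(), A[:, j].tolist()))) == 25
```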

Using an OA of strength t=2, the Tang LHS balances first and second order margins nicely. A scrambled QMC rule will often have better balance because the OA only balances square subregions while the QMC balances many rectangular shapes in two dimensional margins and usually also balances some higher dimensional regions.

I have a paper where I generate LHS with very small correlations among columns that could also be called orthogonal LHS. Owen AB. Controlling correlations in Latin hypercube samples. Journal of the American Statistical Association. 1994 Dec 1;89(428):1517-22. Those along with optimized LHS are good for exploring a function. I’m not aware of studies on how they work for estimating integrals; they might have a subtle bias.

Randomizing an OA and embedding it in the unit cube [0,1]^d is also useful as inputs for visualizing a function. If you do the version adding 1/2 instead of U(0,1) then you get regular grids of points in low dimensional views. Owen AB. Orthogonal arrays for computer experiments, integration and visualization. Statistica Sinica. 1992 Jul 1:439-52.

Something that I think would be very cool for computational purposes is an implementation of randomized Hadamard matrices. Those are n x n binary arrays. The first column is all 1s, so discard that. Now you have n-1 mutually orthogonal binary columns each is half 1s and half -1s. The value of n can be almost any multiple of 4 that you like. The first Paley construction in the Hedayat et al book is available for lots of values of n, and it can be computed for n in the millions or billions because you only need to store O(n) binary variables at a time while delivering n rows of n-1 values. I.e., it can be flow through. You can even access any row you like without computing prior ones. A common strategy for fitting sparse models on n-1 variables is to take a random sample of m rows of the Hadamard matrix. So … Hadamard OAs are among the most important ones.
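The "access any row without computing prior ones" property is easiest to see with the Sylvester construction, which covers orders n that are powers of two (a narrower family than the Paley construction mentioned above; this sketch is an illustration of the streaming idea, not the Paley construction itself):

```python
def hadamard_row(i, n):
    """Row i of the Sylvester Hadamard matrix of order n (n a power of
    two), computed on the fly: H[i][j] = (-1)**popcount(i & j)."""
    return [1 - 2 * (bin(i & j).count("1") & 1) for j in range(n)]

n = 8
H = [hadamard_row(i, n) for i in range(n)]
# Column 0 (j = 0) is all 1s, and distinct rows are mutually orthogonal.
for i in range(n):
    for j in range(n):
        dot = sum(a * b for a, b in zip(H[i], H[j]))
        assert dot == (n if i == j else 0)
```

Because each row depends only on its index, a random sample of m rows can be generated directly without materializing the full n x n matrix.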


