sparse SVD cutoff
Hello,
scipy.sparse.linalg.svds introduces a cutoff on the singular values. Values smaller than eps*f*largest_singular_value are replaced by zero, where eps is the machine epsilon of the input's datatype and f is 1e3 for single precision and 1e6 for double precision (hence cond = 2.220446049250313e-10 for double and cond = 0.00011920928955078125 for single precision). This is done at https://github.com/scipy/scipy/blob/v1.5.2/scipy/sparse/linalg/eigen/arpack/arpack.py#L1875-L1879.
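For illustration, the filtering behaves roughly like the following sketch (svds_cutoff is a hypothetical re-implementation of the linked arpack.py logic, not an actual scipy function):

```python
import numpy as np

def svds_cutoff(s, dtype):
    """Sketch of the hard-coded filtering in the linked arpack.py code:
    singular values at or below eps * f * s_max are replaced by zero."""
    eps = np.finfo(dtype).eps
    f = 1e3 if dtype == np.float32 else 1e6  # single vs. double precision
    cutoff = eps * f * np.max(s)
    return np.where(s > cutoff, s, 0.0)

s = np.array([1.0, 1e-5, 1e-12])
print(svds_cutoff(s, np.float64))  # the 1e-12 value falls below ~2.2e-10
```

Note that a singular value of 1e-12 is by no means negligible in double precision, yet it is zeroed out by this threshold.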
The first problem is that this feature is not documented. The second is that these values are not that small and are hard-coded: it is not possible to change them. They appear to come from scipy.linalg.pinvh, but there the cutoff is documented and can be specified. There is a tol argument in the API that is used when computing the eigenvalues of A.H @ A, but it has no effect on the cutoff.
I think tol could be used as the cutoff for the singular values in addition to its role as a parameter for eigsh; alternatively, a cond keyword argument could be added.
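As a workaround until such a knob exists, one can reproduce the eigsh-based computation by hand and apply a threshold of one's own choosing. This is a sketch, not an endorsed scipy recipe; the matrix, sizes, and my_cutoff value are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import random as sprandom
from scipy.sparse.linalg import LinearOperator, eigsh

# Illustrative sparse matrix; in practice this would be the user's data.
A = sprandom(200, 50, density=0.2, random_state=0, format="csr")
k = 5

# Eigenvalues of A.T @ A are the squared singular values of A.
AtA = LinearOperator((A.shape[1], A.shape[1]),
                     matvec=lambda x: A.T @ (A @ x), dtype=np.float64)
w, v = eigsh(AtA, k=k, which="LM")
s = np.sqrt(np.maximum(w, 0))           # singular values (ascending order)

my_cutoff = 0.0                         # user-chosen, instead of eps * 1e6 * s.max()
keep = s > my_cutoff
u = A @ v[:, keep] / s[keep]            # corresponding left singular vectors

# Sanity check against the dense SVD
s_dense = np.linalg.svd(A.toarray(), compute_uv=False)[:k]
print(np.allclose(np.sort(s)[::-1], s_dense))
```

With my_cutoff = 0.0, nothing is discarded; the point is that the threshold is under the caller's control rather than hard-coded.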
Issue Analytics
- State:
- Created 3 years ago
- Comments: 25 (14 by maintainers)

@mgoldeli @lobpcg the time increase is substantial because the matrix (data in Issue_svds.zip) is too small - just 4000 by 24. With such a small size, there is hardly any point in using svds rather than the much more robust and accurate svd - on my laptop 1000 svds calls took 2.14 sec and 1000 svd calls 2.7 sec.

Actually, svds becomes much more efficient than svd if min(data.shape) >> k, in which case the extra cost of the new, more reliable svds becomes negligible.

For what it's worth, I have been running a similar post-processing for some time on reasonably large dense matrices. A typical run computes k=100 eigenvectors out of ncv=300 generated Lanczos vectors of a (20k, 20k) matrix with eigsh, then calls svd(A @ eigvec). Time spent in scipy.linalg.svd is less than 0.2% of the time spent in eigsh, so in such a case the additional cost is negligible.