cupy.linalg.inv insists on 2D array

cupy.linalg.inv computes the inverse of a 2D array (matrix) but raises an error for an array with more than two dimensions. numpy.linalg.inv, by contrast, treats such an array as a stack of matrices and computes the inverse over the last two axes.
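
For reference, a minimal NumPy sketch of the stacked behavior described above. The array size, the random seed, and the diagonal shift (added only to keep the matrices well conditioned) are arbitrary choices for illustration and are not taken from the issue.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stack of four 3x3 matrices. Adding 3*I makes each one strictly
# diagonally dominant, so every matrix in the stack is invertible.
arr = rng.random((4, 3, 3)) + 3.0 * np.eye(3)

# numpy.linalg.inv inverts each arr[i] independently over the last two axes.
inv = np.linalg.inv(arr)

# Batched matrix product: each arr[i] @ inv[i] should be close to identity.
identity_stack = arr @ inv

print(inv.shape)
print(np.allclose(identity_stack, np.eye(3)))
```

The result has the same shape as the input, one inverse per leading index.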

1. Can the numpy behavior be implemented in cupy?
2. Is there an efficient manual method?

Concerning 2, I tried

cp.array([cp.linalg.inv(a) for a in arr], dtype=arr.dtype)

but this seems slow. I also looked at cupy.linalg.tensorinv but I’m not able to use it for this purpose.

Issue Analytics

Created 5 years ago
Comments: 11 (5 by maintainers)

Top GitHub Comments

yoshipon commented, Sep 29, 2018

@clemisch I think @okuta said that you can do a parallelized inv the same way Chainer does. Chainer itself is not needed: just call the cuBLAS APIs (sgetrfBatched and sgetriBatched) as in the code okuta linked.

Thanks. (Note that I’m not a member of PFN; I just ran into the same problem as you.)

clemisch commented, Nov 9, 2018

Never mind, I was just being lazy and have now done it myself. You only need to adapt the matmul functions from Chainer. But somehow I get invalid values in the returned array. Does the input array b have to have a certain shape?

import numpy
import cupy
from cupy import cuda


def _as_batch_mat(x):
    return x.reshape(len(x), x.shape[1], -1)


def _mat_ptrs(a):
    if len(a) == 1:
        return cupy.full((1,), a.data.ptr, dtype=numpy.uintp)
    else:
        stride = a.strides[0]
        ptr = a.data.ptr
        out = cupy.arange(ptr, ptr + stride * len(a), stride, dtype=numpy.uintp)
        return out


def _get_ld(a):
    strides = a.strides[-2:]
    trans = numpy.argmin(strides)
    return trans, int(max(a.shape[trans - 2], max(strides) // a.itemsize))


def inv_gpu(b):
    # We do a batched LU decomposition on the GPU to compute the inverse
    # Change the shape of the array to be size=1 minibatch if necessary
    # Also copy the matrix as the elements will be modified in-place
    a = _as_batch_mat(b).copy()
    n = a.shape[1]
    n_matrices = len(a)
    # Pivot array
    p = cupy.empty((n, n_matrices), dtype=numpy.int32)
    # Output array
    c = cupy.empty_like(a)
    # These arrays hold information on the execution success
    # or if the matrix was singular
    info = cupy.empty(n_matrices, dtype=numpy.int32)
    ap = _mat_ptrs(a)
    cp = _mat_ptrs(c)
    _, lda = _get_ld(a)
    _, ldc = _get_ld(c)
    handle = cuda.Device().cublas_handle
    cuda.cublas.sgetrfBatched(
        handle, n, ap.data.ptr, lda, p.data.ptr, info.data.ptr, n_matrices)
    cuda.cublas.sgetriBatched(
        handle, n, ap.data.ptr, lda, p.data.ptr, cp.data.ptr, ldc,
        info.data.ptr, n_matrices)
    return c, info

# Testing
a = cupy.array([
    [1, 0, 1], 
    [0, 1, 0], 
    [0, 0, 1]]).astype(cupy.float32)

ai = inv_gpu(a)

print(a)
print(cupy.linalg.inv(a))
print(inv_gpu(a))
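
One possible cause of the invalid values, offered as a guess rather than a confirmed diagnosis: `_as_batch_mat` appears to assume a 3D input of shape (batch, n, n). The test matrix is 2D, so the reshape turns it into three 3x1 "matrices" instead of one 3x3 matrix, and the batched routines are then called with inconsistent sizes. The sketch below only inspects shapes and uses NumPy so that it runs without a GPU, on the assumption that CuPy's reshape gives the same shapes. `as_batch_mat_checked` is a hypothetical helper, not part of the original code or of CuPy.

```python
import numpy as np


def _as_batch_mat(x):
    # Same reshape as in the post above.
    return x.reshape(len(x), x.shape[1], -1)


def as_batch_mat_checked(x):
    # Hypothetical helper: promote a single (n, n) matrix to a batch of one
    # and reject inputs whose last two axes are not square.
    if x.ndim == 2:
        x = x[None, :, :]
    if x.ndim != 3 or x.shape[-1] != x.shape[-2]:
        raise ValueError("expected shape (batch, n, n), got %r" % (x.shape,))
    return x


# The same 3x3 test matrix as in the post.
a = np.eye(3, dtype=np.float32)
a[0, 2] = 1.0

bad_shape = _as_batch_mat(a).shape          # 2D input is split along axis 0
good_shape = as_batch_mat_checked(a).shape  # a batch containing one matrix

print(bad_shape)   # (3, 3, 1)
print(good_shape)  # (1, 3, 3)
```

If this is indeed the problem, passing `a[None]` to `inv_gpu` (and taking the first matrix of the returned batch) should be the equivalent change in the original code.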
