MAGMA 2.8.0
Matrix Algebra for GPU and Multicore Architectures

Functions

magma_int_t magma_cgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_cgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_cgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by CGBTRF.
 
magma_int_t magma_cgetrf_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaFloatComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that may rely on many assumptions.
 
magma_int_t magma_cgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaFloatComplex_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_cgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgetrf_vbatched (magma_int_t *m, magma_int_t *n, magmaFloatComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_dgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_dgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by DGBTRF.
 
magma_int_t magma_dgetrf_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, double **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that may rely on many assumptions.
 
magma_int_t magma_dgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaDouble_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_dgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, double **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgetrf_vbatched (magma_int_t *m, magma_int_t *n, double **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_sgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_sgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by SGBTRF.
 
magma_int_t magma_sgetrf_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that may rely on many assumptions.
 
magma_int_t magma_sgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaFloat_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_sgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgetrf_vbatched (magma_int_t *m, magma_int_t *n, float **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbsv_batched_work (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_zgbsv_batched (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_zgbtrf_batched_work (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrf_batched (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrs_batched (magma_trans_t transA, magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **dipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by ZGBTRF.
 
magma_int_t magma_zgetrf_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgetrf_recpanel_batched (magma_int_t m, magma_int_t n, magma_int_t min_recpnb, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t **dipiv_array, magma_int_t **dpivinfo_array, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 This is an internal routine that may rely on many assumptions.
 
magma_int_t magma_zgetrf_recpanel_native (magma_int_t m, magma_int_t n, magma_int_t recnb, magmaDoubleComplex_ptr dA, magma_int_t ldda, magma_int_t *dipiv, magma_int_t *dipivinfo, magma_int_t *dinfo, magma_int_t gbstep, magma_event_t events[2], magma_queue_t queue, magma_queue_t update_queue)
 This is an internal routine.
 
magma_int_t magma_zgetrf_vbatched_max_nocheck_work (magma_int_t *m, magma_int_t *n, magma_int_t max_m, magma_int_t max_n, magma_int_t max_minmn, magma_int_t max_mxn, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, void *work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgetrf_vbatched (magma_int_t *m, magma_int_t *n, magmaDoubleComplex **dA_array, magma_int_t *ldda, magma_int_t **dipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaFloatComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_cgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaFloatComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_cgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 cgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_cgetrf_batched_smallsq_noshfl (magma_int_t n, magmaFloatComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, double **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_dgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, double **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_dgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, double **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 dgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_dgetrf_batched_smallsq_noshfl (magma_int_t n, double **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, float **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_sgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, float **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_sgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, float **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 sgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_sgetrf_batched_smallsq_noshfl (magma_int_t n, float **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbsv_batched_fused_sm (magma_int_t n, magma_int_t kl, magma_int_t ku, magma_int_t nrhs, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magmaDoubleComplex **dB_array, magma_int_t lddb, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.
 
magma_int_t magma_zgbtrf_batched_fused_sm (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t nthreads, magma_int_t ntcol, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrf_batched_sliding_window_loopout (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, void *device_work, magma_int_t *lwork, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgbtrf_batched_sliding_window_loopin (magma_int_t m, magma_int_t n, magma_int_t kl, magma_int_t ku, magmaDoubleComplex **dAB_array, magma_int_t lddab, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.
 
magma_int_t magma_zgetf2_nopiv_internal_batched (magma_int_t m, magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ai, magma_int_t aj, magma_int_t ldda, magma_int_t *info_array, magma_int_t gbstep, magma_int_t batchCount, magma_queue_t queue)
 zgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.
 
magma_int_t magma_zgetrf_batched_smallsq_noshfl (magma_int_t n, magmaDoubleComplex **dA_array, magma_int_t ldda, magma_int_t **ipiv_array, magma_int_t *info_array, magma_int_t batchCount, magma_queue_t queue)
 zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.
 

Detailed Description

Function Documentation

◆ magma_cgbsv_batched_work()

magma_int_t magma_cgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in] n  INTEGER The order of the matrix A. N >= 0.
[in] kl  INTEGER The number of subdiagonals within the band of A. KL >= 0.
[in] ku  INTEGER The number of superdiagonals within the band of A. KU >= 0.
[in] nrhs  INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in] dA_array  Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in] ldda  INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in] dipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out] dB_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in] lddb  INTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out] info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out] device_work  Workspace, allocated on device memory.
[in,out] lwork  INTEGER pointer The size of the workspace (device_work) in bytes.
  • lwork[0] < 0: a workspace query is assumed; the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in] batchCount  INTEGER The number of matrices to operate on.
[in] queue  magma_queue_t Queue to execute in.
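
The lwork protocol described above (negative value to query, non-negative value to run) can be sketched as a two-call pattern. The block below is only an illustration: `mock_batched_work` and `run_with_workspace` are hypothetical stand-ins, not MAGMA functions, the per-matrix size formula is invented, and plain `int` is used in place of `magma_int_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a *_batched_work routine. It is NOT a MAGMA
   function; it only mimics the lwork protocol described in the text. */
static int mock_batched_work(int n, int batch_count, void *device_work, int *lwork)
{
    assert(lwork != NULL);

    /* Invented requirement, for illustration only: n bytes per matrix. */
    int required = n * batch_count;

    if (lwork[0] < 0) {
        /* Workspace query: report the size; the workspace is not referenced. */
        lwork[0] = required;
        return 0;
    }
    if (lwork[0] < required || (required > 0 && device_work == NULL)) {
        return -1;  /* workspace too small or missing */
    }
    /* ... the actual computation would run here ... */
    return 0;
}

/* Two-call pattern: query the required size, then call again with a buffer. */
static int run_with_workspace(int n, int batch_count, void *buffer, int buffer_bytes)
{
    int lwork[1] = { -1 };                 /* negative value requests a query */
    int status = mock_batched_work(n, batch_count, NULL, lwork);
    if (status != 0) {
        return status;
    }
    if (lwork[0] > buffer_bytes) {
        return -2;                         /* caller's buffer is too small */
    }
    return mock_batched_work(n, batch_count, buffer, lwork);
}
```

With the real routine, the buffer passed on the second call would presumably be device memory sized from the queried value, since device_work is documented as allocated on device memory.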

◆ magma_cgbsv_batched()

magma_int_t magma_cgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in] n  INTEGER The order of the matrix A. N >= 0.
[in] kl  INTEGER The number of subdiagonals within the band of A. KL >= 0.
[in] ku  INTEGER The number of superdiagonals within the band of A. KU >= 0.
[in] nrhs  INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in] dA_array  Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in] ldda  INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in] dipiv_array  Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out] dB_array  Array of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in] lddb  INTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out] info_array  Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in] batchCount  INTEGER The number of matrices to operate on.
[in] queue  magma_queue_t Queue to execute in.
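
Because info_array holds one status per matrix, a caller typically scans it after the batched call. The following is a minimal sketch of that scan, assuming the values have already been copied into a host array (the name `info_host` is hypothetical) and using plain `int` in place of `magma_int_t`. The positive case is taken from the description of the `_work` variant and is assumed to apply here as well.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    INFO_OK,            /* info == 0: successful exit */
    INFO_BAD_ARGUMENT,  /* info <  0: illegal argument or another error */
    INFO_SINGULAR       /* info >  0: U(i,i) is exactly zero (assumed, see lead-in) */
} info_kind;

/* Classify a single entry of the per-matrix status array. */
static info_kind classify_info(int info)
{
    if (info == 0) {
        return INFO_OK;
    }
    return (info < 0) ? INFO_BAD_ARGUMENT : INFO_SINGULAR;
}

/* Return the batch index of the first non-zero status, or -1 if all are zero. */
static int first_failure(const int *info_host, int batch_count)
{
    assert(info_host != NULL || batch_count == 0);
    for (int i = 0; i < batch_count; ++i) {
        if (info_host[i] != 0) {
            return i;
        }
    }
    return -1;
}
```

Scanning every entry matters because one singular or invalid matrix does not change the status reported for the other matrices in the batch.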

◆ magma_cgbtrf_batched_work()

magma_int_t magma_cgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

   On entry:                       On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
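
The storage scheme above can be sketched on the host as follows. This is a minimal illustration, not part of MAGMA: the helper names band_offset and pack_band are hypothetical, and the code assumes a dense column-major input matrix held in host memory with C99 float complex entries. It applies the mapping AB(kl+ku+1+i-j, j) = A(i,j) and leaves rows 1 to KL of AB untouched, since they are reserved for fill-in.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: 0-based offset of A(i,j) inside one column-major band
 * array AB, where i and j are the 1-based indices used in the text:
 * AB(kl+ku+1+i-j, j) = A(i,j). */
size_t band_offset(int kl, int ku, int lddab, int i, int j)
{
    int row = kl + ku + 1 + i - j;              /* 1-based row inside AB */
    return (size_t)(row - 1) + (size_t)(j - 1) * (size_t)lddab;
}

/* Hypothetical helper: copy the band of a dense column-major m-by-n matrix A
 * (leading dimension lda) into AB (leading dimension lddab >= 2*kl+ku+1).
 * Entries of AB outside the band, including rows 1..kl, are not written. */
void pack_band(int m, int n, int kl, int ku,
               const float complex *A, int lda,
               float complex *AB, int lddab)
{
    assert(lddab >= 2 * kl + ku + 1);
    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;    /* max(1, j-ku) */
        int ihi = (j + kl < m) ? j + kl : m;    /* min(m, j+kl) */
        for (int i = ilo; i <= ihi; ++i)
            AB[band_offset(kl, ku, lddab, i, j)] =
                A[(size_t)(i - 1) + (size_t)(j - 1) * (size_t)lda];
    }
}
```

For the M = N = 6, KL = 2, KU = 1 example, a11 lands in row 4 of column 1 and a12 in row 3 of column 2, matching the diagram. Each packed matrix would still have to be copied to the GPU before its pointer is placed in dAB_array.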

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.

[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
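
The lwork convention described above implies a two-call pattern: query the size, allocate, then call again. The sketch below illustrates only that control flow. The function demo_factor_work is a hypothetical stand-in, not a MAGMA routine, its size formula is invented, and plain malloc stands in for a device allocation (in a real program the workspace would come from a device allocator such as magma_malloc and the call would be magma_cgbtrf_batched_work).

```c
#include <stdlib.h>

/* HYPOTHETICAL stand-in for a *_batched_work routine. It only mimics the
 * lwork convention described above; the size formula is made up. */
int demo_factor_work(void *device_work, long *lwork, int batchCount)
{
    const long needed = 128L * batchCount;   /* invented requirement, in bytes */
    if (lwork[0] < 0) {                      /* workspace query */
        lwork[0] = needed;                   /* report the size, do no work */
        return 0;
    }
    if (device_work == NULL || lwork[0] < needed)
        return -1;                           /* workspace missing or too small */
    return 0;                                /* the factorization would run here */
}

/* Two-call pattern: query, allocate, then run. Returns 0 on success. */
int run_with_workspace(int batchCount)
{
    long lwork[1] = { -1 };
    if (demo_factor_work(NULL, lwork, batchCount) != 0)   /* 1) query */
        return -1;

    /* 2) allocate; malloc is an assumption standing in for device memory */
    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL)
        return -2;

    int status = demo_factor_work(work, lwork, batchCount); /* 3) real call */
    free(work);
    return status;
}
```

Keeping the query and the real call on the same argument list makes it less likely that the workspace is sized for a different batchCount than the one actually factored.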

◆ magma_cgbtrf_batched()

magma_int_t magma_cgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1.
[out]dipiv_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayINTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
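
Because each matrix in the batch reports its own INFO value, callers usually scan the whole array after the call. The following is a minimal sketch of such a scan. It assumes the info values are available in host memory as plain int (if the array lives in device memory it has to be copied back first, and magma_int_t may be wider than int); the type info_summary and the function summarize_info are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical summary of a batch of INFO values. */
typedef struct {
    int ok;         /* entries equal to 0: successful exit */
    int illegal;    /* entries < 0: illegal argument or other error */
    int singular;   /* entries > 0: U(i,i) is exactly zero */
    int first_bad;  /* 0-based index of the first nonzero entry, or -1 */
} info_summary;

/* Classify each INFO value following the = 0 / < 0 / > 0 convention. */
info_summary summarize_info(const int *info, int batchCount)
{
    info_summary s = { 0, 0, 0, -1 };
    for (int k = 0; k < batchCount; ++k) {
        if (info[k] == 0)
            s.ok++;
        else if (info[k] < 0)
            s.illegal++;
        else
            s.singular++;
        if (info[k] != 0 && s.first_bad < 0)
            s.first_bad = k;
    }
    return s;
}
```

A positive entry does not mean the factorization was skipped: the factors are complete, but they should not be passed to a solve routine for that matrix.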

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                            On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_cgbtrs_batched()

magma_int_t magma_cgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by CGBTRF.

This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.

Parameters
[in]transAmagma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans is supported (A*X = B).
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgetrf_batched()

magma_int_t magma_cgetrf_batched ( magma_int_t m,
magma_int_t n,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
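
The pivot array records a sequence of row interchanges rather than the permutation P itself. The sketch below shows one way to turn the 1-based indices of a single matrix into an explicit row permutation. It assumes the pivots have been copied to host memory as plain int; the function name pivots_to_permutation is hypothetical.

```c
/* Hypothetical helper: convert LAPACK-style 1-based pivot indices (row i was
 * interchanged with row ipiv[i-1]) into a permutation. On return, perm[r] is
 * the 0-based index of the original row that sits in row r once all npiv
 * interchanges have been applied in order. perm must hold m entries. */
void pivots_to_permutation(int m, int npiv, const int *ipiv, int *perm)
{
    for (int r = 0; r < m; ++r)
        perm[r] = r;                      /* start from the identity */
    for (int i = 0; i < npiv; ++i) {
        int p = ipiv[i] - 1;              /* convert to 0-based */
        int tmp = perm[i];
        perm[i] = perm[p];
        perm[p] = tmp;
    }
}
```

The interchanges must be replayed in increasing i, because a later pivot index refers to the row positions left behind by the earlier swaps.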

◆ magma_cgetrf_recpanel_batched()

magma_int_t magma_cgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
magmaFloatComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that may make many assumptions about its arguments.

Its documentation is incomplete.

CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]min_recpnbINTEGER. Internal use. The recursive nb
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for A.
[in]ajINTEGER Column offset for A.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dpivinfo_arrayArray of pointers, dimension (batchCount), for internal use.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_cgetrf_recpanel_native()

magma_int_t magma_cgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaFloatComplex_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

CGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dAA COMPLEX array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of A. LDDA >= max(1,M).
[out]dipivAn INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dipivinfoAn INTEGER array, for internal use.
[out]dinfoINTEGER, stored on the GPU
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]queuesArray of magma_queue_t, size 2 Queues to execute in.

◆ magma_cgetrf_vbatched_max_nocheck_work()

magma_int_t magma_cgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
magmaFloatComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]mArray of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0.
[in]nArray of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0.
[in]max_mINTEGER The maximum number of rows across the batch.
[in]max_nINTEGER The maximum number of columns across the batch.
[in]max_minmnINTEGER The maximum value of min(M[i], N[i]) across the batch.
[in]max_mxnINTEGER The maximum value of the product (M[i] x N[i]) across the batch.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]workVOID pointer A workspace of size lwork[0] bytes.
[in,out]lworkINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work in bytes.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
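
The four max_* arguments are reductions over the per-matrix sizes. A minimal host-side sketch of computing them is shown below. It assumes host copies of the m and n arrays stored as plain int (the routine itself expects the size arrays on the GPU); the type batch_maxima and the function compute_batch_maxima are hypothetical names, and the product is kept in a long long only to avoid overflow in this illustration.

```c
/* Hypothetical container for the batch-wide maxima. */
typedef struct {
    int max_m;           /* largest number of rows */
    int max_n;           /* largest number of columns */
    int max_minmn;       /* largest min(m[i], n[i]) */
    long long max_mxn;   /* largest product m[i] * n[i] */
} batch_maxima;

/* Reduce host copies of the per-matrix sizes to the four maxima. */
batch_maxima compute_batch_maxima(const int *m, const int *n, int batchCount)
{
    batch_maxima r = { 0, 0, 0, 0 };
    for (int i = 0; i < batchCount; ++i) {
        int minmn = (m[i] < n[i]) ? m[i] : n[i];
        long long mxn = (long long)m[i] * (long long)n[i];
        if (m[i] > r.max_m)       r.max_m = m[i];
        if (n[i] > r.max_n)       r.max_n = n[i];
        if (minmn > r.max_minmn)  r.max_minmn = minmn;
        if (mxn > r.max_mxn)      r.max_mxn = mxn;
    }
    return r;
}
```

Note that max_mxn is not max_m times max_n in general: the largest row count and the largest column count may belong to different matrices. The "nocheck" in the routine name suggests these values are trusted as given, so they should be computed from the same sizes that are passed in m and n.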

◆ magma_cgetrf_vbatched()

magma_int_t magma_cgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaFloatComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]mArray of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0.
[in]nArray of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbsv_batched_work()

magma_int_t magma_dgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out]device_workWorkspace, allocated on device memory.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbsv_batched()

magma_int_t magma_dgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbtrf_batched_work()

magma_int_t magma_dgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                            On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
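
After the factorization, the same index mapping addresses both U (including its fill-in) and the multipliers. The sketch below reads one entry of the factored band on the host. It assumes one factored matrix has been copied back to host memory as a column-major double array; the function name factored_band_entry is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: entry (i,j), 1-based, of a factored band matrix stored
 * in AB with leading dimension lddab, or 0.0 outside the stored band.
 * U occupies rows max(1, j-kl-ku) <= i <= j of column j and the multipliers
 * occupy j+1 <= i <= min(m, j+kl); both sit at AB(kl+ku+1+i-j, j). */
double factored_band_entry(int m, int kl, int ku,
                           const double *AB, int lddab, int i, int j)
{
    int ilo = j - kl - ku;
    if (ilo < 1) ilo = 1;
    int ihi = j + kl;
    if (ihi > m) ihi = m;
    if (i < ilo || i > ihi)
        return 0.0;                          /* outside the stored band */
    int row = kl + ku + 1 + i - j;           /* 1-based row inside AB */
    return AB[(size_t)(row - 1) + (size_t)(j - 1) * (size_t)lddab];
}
```

With M = N = 6, KL = 2, KU = 1, the entry u14 is read from row 1 of column 4 and m21 from row 5 of column 1, as in the "On exit" half of the diagram.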

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.

[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgbtrf_batched()

magma_int_t magma_dgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= 2*KL+KU+1.
[out]dipiv_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayINTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                            On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_dgbtrs_batched()

magma_int_t magma_dgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by DGBTRF.

This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.

Parameters
[in]transAmagma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans is supported (A*X = B).
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KL >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDB,NRHS) On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occured, such as memory allocation failed.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
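
The constraints listed above, together with the INFO = -i convention, can be illustrated with a small host-side check. This is only a sketch: check_gbtrs_args is a hypothetical helper, not a MAGMA function, and the order of the checks and the exact codes MAGMA returns are assumptions modelled on the LAPACK convention (i is the 1-based position of the argument in the call).

```c
#include <assert.h>

/* Hypothetical helper, not a MAGMA function. It mirrors the documented
 * constraints and returns -i for the first offending argument, where i is
 * the 1-based position in the magma_dgbtrs_batched argument list:
 * transA(1), n(2), kl(3), ku(4), nrhs(5), dA_array(6), ldda(7),
 * dipiv_array(8), dB_array(9), lddb(10). */
int check_gbtrs_args(int n, int kl, int ku, int nrhs, int ldda, int lddb)
{
    int min_lddb = (n > 1) ? n : 1;          /* max(1, N) */

    if (n < 0)                  return -2;
    if (kl < 0)                 return -3;
    if (ku < 0)                 return -4;
    if (nrhs < 0)               return -5;
    if (ldda < 2 * kl + ku + 1) return -7;   /* LDDA >= 2*KL+KU+1 */
    if (lddb < min_lddb)        return -10;  /* LDDB >= max(1,N)  */
    return 0;
}

/* Example values: N = 6, KL = 2, KU = 1, so LDDA must be at least 6. */
void example_gbtrs_arg_check(void)
{
    assert(check_gbtrs_args(6, 2, 1, 1, 6, 6) == 0);
    assert(check_gbtrs_args(6, 2, 1, 1, 5, 6) == -7);   /* ldda too small */
    assert(check_gbtrs_args(6, 2, 1, 1, 6, 5) == -10);  /* lddb too small */
}
```

Running such a check before the batched call can make a negative entry in info_array easier to interpret, since the failing position maps directly back to an argument.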

◆ magma_dgetrf_batched()

magma_int_t magma_dgetrf_batched ( magma_int_t m,
magma_int_t n,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in] m INTEGER The number of rows of each matrix A. M >= 0.
[in] n INTEGER The number of columns of each matrix A. N >= 0.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each points to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.
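
To make the outputs concrete, the following host-side sketch factors a single column-major matrix in place and fills ipiv and the INFO value with the meanings documented above. It is a textbook unblocked algorithm, not MAGMA's blocked GPU implementation; lu_partial_pivot and abs_d are hypothetical names introduced for illustration.

```c
#include <assert.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Unblocked LU with partial pivoting on one column-major m-by-n matrix.
 * On exit, A holds U on and above the diagonal and the multipliers of L
 * below it (the unit diagonal of L is not stored). ipiv is 1-based:
 * row k+1 was interchanged with row ipiv[k]. The return value follows
 * the INFO convention: 0 on success, i > 0 if U(i,i) is exactly zero. */
int lu_partial_pivot(int m, int n, double *A, int lda, int *ipiv)
{
    int info = 0;
    int minmn = (m < n) ? m : n;

    assert(lda >= (m > 1 ? m : 1));           /* LDDA >= max(1,M) */

    for (int k = 0; k < minmn; ++k) {
        /* Pick the entry of largest magnitude in column k, rows k..m-1. */
        int p = k;
        for (int i = k + 1; i < m; ++i)
            if (abs_d(A[i + k * lda]) > abs_d(A[p + k * lda]))
                p = i;
        ipiv[k] = p + 1;

        if (A[p + k * lda] == 0.0) {          /* exactly singular column */
            if (info == 0) info = k + 1;
            continue;
        }
        if (p != k) {                          /* swap rows k and p */
            for (int j = 0; j < n; ++j) {
                double t = A[k + j * lda];
                A[k + j * lda] = A[p + j * lda];
                A[p + j * lda] = t;
            }
        }
        for (int i = k + 1; i < m; ++i)        /* multipliers of L */
            A[i + k * lda] /= A[k + k * lda];
        for (int j = k + 1; j < n; ++j)        /* trailing update */
            for (int i = k + 1; i < m; ++i)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return info;
}
```

For the 2-by-2 matrix with rows (1, 2) and (3, 4), the sketch swaps the two rows, so both pivot entries equal 2, and stores the multiplier 1/3 below the diagonal.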

◆ magma_dgetrf_recpanel_batched()

magma_int_t magma_dgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
double ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that may rely on many assumptions.

Its documentation is not yet complete.

DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in] m INTEGER The number of rows of each matrix A. M >= 0.
[in] n INTEGER The number of columns of each matrix A. N >= 0.
[in] min_recpnb INTEGER Internal use. The recursive nb.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each points to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ai INTEGER Row offset for A.
[in] aj INTEGER Column offset for A.
[in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dpivinfo_array Array of pointers, dimension (batchCount), for internal use.
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] gbstep INTEGER Internal use.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.

◆ magma_dgetrf_recpanel_native()

magma_int_t magma_dgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaDouble_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

DGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in] m INTEGER The number of rows of the matrix A. M >= 0.
[in] n INTEGER The number of columns of the matrix A. N >= 0.
[in,out] dA A DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ldda INTEGER The leading dimension of A. LDDA >= max(1,M).
[out] dipiv An INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dipivinfo An INTEGER array, for internal use.
[out] dinfo INTEGER, stored on the GPU.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] gbstep INTEGER Internal use.
[in] queues Array of magma_queue_t, size 2. Queues to execute in.

◆ magma_dgetrf_vbatched_max_nocheck_work()

magma_int_t magma_dgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
double ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in] m Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0.
[in] n Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0.
[in] max_m INTEGER The maximum number of rows across the batch.
[in] max_n INTEGER The maximum number of columns across the batch.
[in] max_minmn INTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount.
[in] max_mxn INTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount.
[in,out] dA_array Array of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each points to an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ldda Array of INTEGERs on the GPU. Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])). The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] work VOID pointer A workspace of size LWORK[0].
[in,out] lwork INTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.
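
The four maxima expected by this routine can be computed from the size arrays as sketched below. The sketch assumes host copies of the m and n arrays (the routine itself expects them on the GPU), and batch_maxima and compute_batch_maxima are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical container for the four scalar arguments. */
typedef struct {
    int max_m;      /* largest M[i]             */
    int max_n;      /* largest N[i]             */
    int max_minmn;  /* largest min(M[i], N[i])  */
    int max_mxn;    /* largest product M[i]*N[i] */
} batch_maxima;

/* Scan host copies of the size arrays once and collect all four maxima. */
batch_maxima compute_batch_maxima(const int *m, const int *n, int batchCount)
{
    batch_maxima r = {0, 0, 0, 0};

    for (int i = 0; i < batchCount; ++i) {
        assert(m[i] >= 0 && n[i] >= 0);       /* documented constraints */
        int minmn = (m[i] < n[i]) ? m[i] : n[i];
        int mxn   = m[i] * n[i];

        if (m[i]  > r.max_m)     r.max_m     = m[i];
        if (n[i]  > r.max_n)     r.max_n     = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn   > r.max_mxn)   r.max_mxn   = mxn;
    }
    return r;
}
```

Note that max_mxn is taken per matrix and is generally not max_m * max_n: for sizes 3x4, 5x1, and 2x6 the maxima are max_m = 5, max_n = 6, max_minmn = 3, and max_mxn = 12.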

◆ magma_dgetrf_vbatched()

magma_int_t magma_dgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
double ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in] m Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of rows of each matrix A. M[i] >= 0.
[in] n Array of INTEGERs on the GPU, dimension (batchCount). Each is the number of columns of each matrix A. N[i] >= 0.
[in,out] dA_array Array of pointers on the GPU, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA[i],N[i]). On entry, each points to an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ldda Array of INTEGERs on the GPU. Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])). The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.
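
Because each matrix has its own size and leading dimension, the entries of dA_array are often derived from per-matrix offsets into one large allocation. The sketch below shows that offset arithmetic with ordinary host memory standing in for a device buffer; pack_variable_batch is a hypothetical helper, and laying the matrices out back to back is an assumption, not a requirement stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Fill ptrs[i] so that matrix i (column-major, LDDA[i]-by-N[i] storage)
 * starts right after matrix i-1 inside one contiguous buffer.
 * Returns the total number of elements the buffer must hold. */
size_t pack_variable_batch(double *base, const int *m, const int *n,
                           const int *ldda, int batchCount, double **ptrs)
{
    size_t offset = 0;

    for (int i = 0; i < batchCount; ++i) {
        /* Documented constraint: LDDA[i] >= max(1, M[i]). */
        assert(ldda[i] >= (m[i] > 1 ? m[i] : 1));
        ptrs[i] = base + offset;
        offset += (size_t)ldda[i] * (size_t)n[i];
    }
    return offset;
}
```

With sizes 3x4, 5x1, and 2x6 and leading dimensions 4, 5, and 2, the matrices start at element offsets 0, 16, and 21, and the buffer needs 33 elements in total.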

◆ magma_sgbsv_batched_work()

magma_int_t magma_sgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in] n INTEGER The order of the matrix A. N >= 0.
[in] kl INTEGER The number of subdiagonals within the band of A. KL >= 0.
[in] ku INTEGER The number of superdiagonals within the band of A. KU >= 0.
[in] nrhs INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in] dA_array Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by SGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in] ldda INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out] dB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in] lddb INTEGER The leading dimension of each array B. LDDB >= max(1,N).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out] device_work Workspace, allocated on device memory.
[in,out] lwork INTEGER pointer The size of the workspace (device_work) in bytes.
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.
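
The lwork protocol described above amounts to calling the routine twice: once with lwork[0] < 0 to obtain the required size, and once with an allocated workspace. The sketch below shows only that calling pattern. toy_batched_work is a made-up stand-in, not a MAGMA routine; its size formula and its -1 error code are invented for the example, and host malloc stands in for a device allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Made-up stand-in for a *_work routine (NOT part of MAGMA). */
int toy_batched_work(int n, int batchCount, void *work, int *lwork)
{
    /* Invented requirement: one float per row per matrix. */
    int needed = n * batchCount * (int)sizeof(float);

    if (lwork[0] < 0) {          /* workspace query: report size, do nothing */
        lwork[0] = needed;
        return 0;
    }
    if (lwork[0] < needed || (needed > 0 && work == NULL))
        return -1;               /* invented error code for this sketch */

    /* A real routine would perform the computation here using work. */
    return 0;
}

/* Two-call pattern: query the size, allocate, then run. */
int call_with_workspace(int n, int batchCount)
{
    int lwork[1] = { -1 };
    int status = toy_batched_work(n, batchCount, NULL, lwork);   /* query */
    if (status != 0) return status;
    assert(lwork[0] >= 0);       /* the query overwrote the negative value */

    /* Host malloc stands in for a device allocation of lwork[0] bytes. */
    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL) return -2;

    status = toy_batched_work(n, batchCount, work, lwork);       /* compute */
    free(work);
    return status;
}
```

Keeping the query and the computation as two calls to the same entry point means the size formula lives in one place, so the caller never has to duplicate it.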

◆ magma_sgbsv_batched()

magma_int_t magma_sgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in] n INTEGER The order of the matrix A. N >= 0.
[in] kl INTEGER The number of subdiagonals within the band of A. KL >= 0.
[in] ku INTEGER The number of superdiagonals within the band of A. KU >= 0.
[in] nrhs INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in] dA_array Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by SGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in] ldda INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out] dB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in] lddb INTEGER The leading dimension of each array B. LDDB >= max(1,N).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.

◆ magma_sgbtrf_batched_work()

magma_int_t magma_sgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a real m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
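
The index mapping behind this layout, AB(kl+ku+1+i-j, j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl), can be sketched on the host as follows. The helper names band_index and dense_to_band are hypothetical, and the sketch uses plain host arrays rather than GPU memory.

```c
#include <assert.h>

/* Linear (0-based, column-major) position inside AB of the dense entry
 * A(i,j), where i and j are 1-based as in the formula
 * AB(kl+ku+1+i-j, j) = A(i,j). */
int band_index(int kl, int ku, int ldab, int i, int j)
{
    int row = kl + ku + 1 + i - j;                 /* 1-based row in AB */
    assert(row >= kl + 1 && row <= 2 * kl + ku + 1);
    return (row - 1) + (j - 1) * ldab;
}

/* Copy the band of a dense column-major m-by-n matrix A into AB.
 * Rows 1..kl of AB (the fill-in area) are left untouched. */
void dense_to_band(int m, int n, int kl, int ku,
                   const float *A, int lda, float *AB, int ldab)
{
    assert(ldab >= 2 * kl + ku + 1);               /* LDDAB constraint */

    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;       /* max(1, j-ku) */
        int ihi = (j + kl < m) ? j + kl : m;       /* min(m, j+kl) */
        for (int i = ilo; i <= ihi; ++i)
            AB[band_index(kl, ku, ldab, i, j)] = A[(i - 1) + (j - 1) * lda];
    }
}
```

For the example above (M = N = 6, KL = 2, KU = 1), a11 lands in row 4 of column 1, a12 in row 3 of column 2, and a31 in row 6 of column 1, matching the "On entry" picture.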

Parameters
[in] m INTEGER The number of rows of each matrix A. M >= 0.
[in] n INTEGER The number of columns of each matrix A. N >= 0.
[in] kl INTEGER The number of subdiagonals within the band of A. KL >= 0.
[in] ku INTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out] dAB_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl).

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.

[in] lddab INTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out] device_work Workspace, allocated on device memory.
[in,out] lwork INTEGER pointer The size of the workspace (device_work) in bytes.
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.

◆ magma_sgbtrf_batched()

magma_int_t magma_sgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in] m INTEGER The number of rows of the matrix A. M >= 0.
[in] n INTEGER The number of columns of the matrix A. N >= 0.
[in] kl INTEGER The number of subdiagonals within the band of A. KL >= 0.
[in] ku INTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out] dAB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl).

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in] lddab INTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out] dipiv_array Array of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] info_array INTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                        On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
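
After a batched factorization, each matrix reports its own INFO value, so callers typically scan the whole array rather than test a single return code. The sketch below assumes the INFO values have already been copied to a host array; info_summary and summarize_info are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical summary of a batch of INFO values. */
typedef struct {
    int n_ok;        /* entries equal to 0                    */
    int n_illegal;   /* entries < 0 (illegal argument/error)  */
    int n_singular;  /* entries > 0 (U(i,i) exactly zero)     */
    int first_bad;   /* index of first nonzero entry, or -1   */
} info_summary;

/* Classify every INFO value in a host copy of the info array. */
info_summary summarize_info(const int *info, int batchCount)
{
    info_summary s = {0, 0, 0, -1};

    assert(batchCount >= 0);
    for (int i = 0; i < batchCount; ++i) {
        if (info[i] == 0) {
            s.n_ok++;
            continue;
        }
        if (info[i] < 0) s.n_illegal++;
        else             s.n_singular++;
        if (s.first_bad < 0) s.first_bad = i;
    }
    return s;
}
```

A matrix counted under n_singular has still been factored, but using its factors in a solve would divide by zero, so such entries are usually skipped or handled separately.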

◆ magma_sgbtrs_batched()

magma_int_t magma_sgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by SGBTRF.

This is the batched version of the routine. Currently, only the no-transpose form (A * X = B) is supported.

Parameters
[in] transA magma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans is supported (A * X = B).
[in] n INTEGER The order of the matrix A. N >= 0.
[in] kl INTEGER The number of subdiagonals within the band of A. KL >= 0.
[in] ku INTEGER The number of superdiagonals within the band of A. KU >= 0.
[in] nrhs INTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in] dA_array Array of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by SGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in] ldda INTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out] dB_array Array of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in] lddb INTEGER The leading dimension of each array B. LDDB >= max(1,N).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.

◆ magma_sgetrf_batched()

magma_int_t magma_sgetrf_batched ( magma_int_t m,
magma_int_t n,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in] m INTEGER The number of rows of each matrix A. M >= 0.
[in] n INTEGER The number of columns of each matrix A. N >= 0.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each points to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out] ipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.
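
The pivot array records a sequence of row interchanges rather than a permutation vector: the interchanges are applied in order, row 1 with row IPIV(1), then row 2 with row IPIV(2), and so on. The host-side sketch below replays them on a column-major right-hand side; apply_row_interchanges is a hypothetical helper, and the assertion that IPIV(i) >= i reflects the usual LAPACK behaviour, which is assumed here.

```c
#include <assert.h>

/* Replay the recorded row interchanges on B (column-major, ldb rows of
 * storage, nrhs columns). ipiv is 1-based, as in the documentation. */
void apply_row_interchanges(int npiv, int nrhs, const int *ipiv,
                            float *B, int ldb)
{
    for (int i = 0; i < npiv; ++i) {
        int p = ipiv[i] - 1;                /* convert to 0-based row */
        assert(p >= i && p < ldb);          /* assumed: IPIV(i) >= i  */
        if (p == i)
            continue;                       /* no interchange at this step */
        for (int j = 0; j < nrhs; ++j) {    /* swap rows i and p of B */
            float t = B[i + j * ldb];
            B[i + j * ldb] = B[p + j * ldb];
            B[p + j * ldb] = t;
        }
    }
}
```

With pivots (3, 3, 3) and a single column (1, 2, 3), the first step swaps rows 1 and 3, the second swaps rows 2 and 3, and the result is (3, 1, 2); reading the pivots as a permutation vector would give a different, incorrect ordering.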

◆ magma_sgetrf_recpanel_batched()

magma_int_t magma_sgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
float ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that may rely on many assumptions.

Its documentation is not yet complete.

SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in] m INTEGER The number of rows of each matrix A. M >= 0.
[in] n INTEGER The number of columns of each matrix A. N >= 0.
[in] min_recpnb INTEGER Internal use. The recursive nb.
[in,out] dA_array Array of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each points to an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in] ai INTEGER Row offset for A.
[in] aj INTEGER Column offset for A.
[in] ldda INTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out] dipiv_array Array of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out] dpivinfo_array Array of pointers, dimension (batchCount), for internal use.
[out] info_array Array of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in] gbstep INTEGER Internal use.
[in] batchCount INTEGER The number of matrices to operate on.
[in] queue magma_queue_t Queue to execute in.

◆ magma_sgetrf_recpanel_native()

magma_int_t magma_sgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaFloat_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

SGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dAA REAL array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of A. LDDA >= max(1,M).
[out]dipivAn INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dipivinfoAn INTEGER array, for internal use.
[out]dinfoINTEGER, stored on the GPU
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]queuesArray of magma_queue_t, size 2 Queues to execute in.
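
Because the unit diagonal of L is not stored, L and U share the single output array. A minimal host-side sketch of how the two factors could be separated is shown below. It assumes the factored matrix has already been copied from the GPU into a column-major host array; the helper name unpack_lu is hypothetical and not part of MAGMA.

```c
#include <assert.h>

/* Hypothetical host-side helper (not a MAGMA routine).
 * Splits a packed m-by-n LU factor a (column-major, leading dimension lda)
 * into L (m-by-k, unit diagonal, leading dimension m) and
 * U (k-by-n, leading dimension k), where k = min(m, n). */
void unpack_lu(int m, int n, const float *a, int lda, float *l, float *u)
{
    int k = (m < n) ? m : n;
    assert(lda >= (m > 1 ? m : 1));

    /* L: strictly lower part comes from a, diagonal is implicitly 1. */
    for (int j = 0; j < k; ++j) {
        for (int i = 0; i < m; ++i) {
            if (i > j)       l[i + j * m] = a[i + j * lda];
            else if (i == j) l[i + j * m] = 1.0f;
            else             l[i + j * m] = 0.0f;
        }
    }

    /* U: upper part (including the diagonal) comes from a. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < k; ++i) {
            u[i + j * k] = (i <= j) ? a[i + j * lda] : 0.0f;
        }
    }
}
```

When m > n the result L is lower trapezoidal, and when m < n the result U is upper trapezoidal, matching the description of the factorization above.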

◆ magma_sgetrf_vbatched_max_nocheck_work()

magma_int_t magma_sgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
float ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]WORKVOID pointer A workspace of size LWORK[0] bytes.
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work in bytes.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
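
The four MAX_* arguments are reductions over the per-matrix sizes. As a rough illustration, they could be computed as follows when host copies of the size arrays are available. This is a sketch under assumptions: the struct batch_maxima and the function compute_batch_maxima are hypothetical names, plain int stands in for magma_int_t, and the arrays passed to the actual routine live on the GPU.

```c
#include <assert.h>

/* Hypothetical container for the four maxima (not a MAGMA type). */
typedef struct {
    int max_m;      /* maximum number of rows across the batch    */
    int max_n;      /* maximum number of columns across the batch */
    int max_minmn;  /* maximum of min(m[i], n[i])                 */
    int max_mxn;    /* maximum of the product m[i] * n[i]         */
} batch_maxima;

/* Hypothetical host-side helper: m and n are host arrays of length batch_count. */
batch_maxima compute_batch_maxima(const int *m, const int *n, int batch_count)
{
    batch_maxima r = {0, 0, 0, 0};
    for (int i = 0; i < batch_count; ++i) {
        assert(m[i] >= 0 && n[i] >= 0);
        int minmn = (m[i] < n[i]) ? m[i] : n[i];
        int mxn = m[i] * n[i];
        if (m[i] > r.max_m)      r.max_m = m[i];
        if (n[i] > r.max_n)      r.max_n = n[i];
        if (minmn > r.max_minmn) r.max_minmn = minmn;
        if (mxn > r.max_mxn)     r.max_mxn = mxn;
    }
    return r;
}
```

Note that max_minmn and max_mxn are not derivable from max_m and max_n alone, since the largest row count and the largest column count may belong to different matrices.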

◆ magma_sgetrf_vbatched()

magma_int_t magma_sgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
float ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
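
Since info_array holds one status code per matrix, a caller typically needs to scan it after the batched call. The sketch below classifies the codes according to the three cases listed above. It assumes info_array has been copied to the host; the struct info_summary and the function summarize_info are hypothetical names and not part of MAGMA.

```c
#include <assert.h>

/* Hypothetical summary of a batch of INFO codes (not a MAGMA type). */
typedef struct {
    int ok;            /* entries with INFO == 0                         */
    int singular;      /* entries with INFO > 0 (U(i,i) exactly zero)    */
    int illegal;       /* entries with INFO < 0 (bad argument or error)  */
    int first_failed;  /* index of the first nonzero entry, or -1        */
} info_summary;

/* Hypothetical host-side helper: info is a host copy of info_array. */
info_summary summarize_info(const int *info, int batch_count)
{
    info_summary s = {0, 0, 0, -1};
    assert(batch_count >= 0);
    for (int i = 0; i < batch_count; ++i) {
        if (info[i] == 0) {
            s.ok++;
        } else {
            if (info[i] > 0) s.singular++;
            else             s.illegal++;
            if (s.first_failed < 0) s.first_failed = i;
        }
    }
    return s;
}
```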

◆ magma_zgbsv_batched_work()

magma_int_t magma_zgbsv_batched_work ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and the solution has not been computed.
[in,out]device_workWorkspace, allocated on device memory.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
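
The lwork convention above implies a two-call pattern: first query the required size with lwork[0] < 0, then allocate and call again. The sketch below shows that pattern against a stand-in routine. Everything in it is an assumption for illustration: demo_work_routine is a hypothetical placeholder that only mimics the convention (its size formula and its error code for an undersized workspace are invented), and host malloc stands in for device memory allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for a *_work routine; not a MAGMA function.
 * The required size computed here is invented for the example. */
int demo_work_routine(int n, int batch_count, void *work, int *lwork)
{
    int required = n * batch_count * (int)sizeof(int);
    if (lwork[0] < 0) {
        lwork[0] = required;   /* workspace query: report size, do no work */
        return 0;
    }
    if (lwork[0] < required) {
        return -1;             /* assumed behavior for an undersized workspace */
    }
    memset(work, 0, (size_t)required);  /* placeholder for the real computation */
    return 0;
}

/* Two-call pattern: query, allocate, run, release. */
int run_with_workspace_query(int n, int batch_count)
{
    int lwork = -1;
    int status = demo_work_routine(n, batch_count, NULL, &lwork);
    if (status != 0) return status;

    void *work = malloc(lwork > 0 ? (size_t)lwork : 1);
    if (work == NULL) return -2;

    status = demo_work_routine(n, batch_count, work, &lwork);
    free(work);
    return status;
}
```

Separating the query from the computation lets the caller allocate the workspace once and reuse it across repeated calls with the same problem dimensions.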

◆ magma_zgbsv_batched()

magma_int_t magma_zgbsv_batched ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
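
One way to sanity-check a solution X returned for a single system is to multiply the original band matrix by X and compare with B. The sketch below computes y = A * x directly from band storage, using the mapping AB(kl+ku+1+i-j, j) = A(i,j) from the band LU routines. It is a host-side illustration only: band_matvec is a hypothetical name, C99 double complex stands in for magmaDoubleComplex, and the band array must be a copy of A saved before factorization, because the solver overwrites it.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical host-side helper (not a MAGMA routine).
 * Computes y = A * x for an n-by-n band matrix with kl subdiagonals and
 * ku superdiagonals, stored column-major in ab with leading dimension ldab,
 * where AB(kl+ku+1+i-j, j) = A(i,j) in 1-based indexing. */
void band_matvec(int n, int kl, int ku,
                 const double complex *ab, int ldab,
                 const double complex *x, double complex *y)
{
    assert(ldab >= 2 * kl + ku + 1);
    for (int i = 0; i < n; ++i) {
        y[i] = 0.0;
    }
    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;   /* first in-band row of column j */
        int ihi = (j + kl < n) ? j + kl : n;   /* last in-band row of column j  */
        for (int i = ilo; i <= ihi; ++i) {
            /* 0-based row inside AB is (kl + ku + 1 + i - j) - 1 */
            y[i - 1] += ab[(kl + ku + i - j) + (j - 1) * ldab] * x[j - 1];
        }
    }
}
```

Subtracting the corresponding column of B from y gives the residual for that right hand side.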

◆ magma_zgbtrf_batched_work()

magma_int_t magma_zgbtrf_batched_work ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a complex m-by-n band matrix AB using partial pivoting with row interchanges.

This is a batched version that factors batchCount M-by-N matrices in parallel. dAB, dipiv, and info become arrays with one entry per matrix.

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                       On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Note that this behavior is a little different from the standard LAPACK routine. Array elements marked * are not read by the routine, but may be zeroed out after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
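
The band layout can be made concrete with a small packing routine that copies the in-band entries of a dense matrix into band storage using AB(kl+ku+1+i-j, j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). The following is a host-side sketch under assumptions: band_offset and dense_to_band are hypothetical names, C99 double complex stands in for magmaDoubleComplex, and both arrays are column-major in host memory.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper (not a MAGMA routine).
 * Returns the 0-based linear offset of A(i,j) (1-based i, j) inside a band
 * array with leading dimension lddab. */
int band_offset(int kl, int ku, int lddab, int i, int j)
{
    int row = kl + ku + 1 + i - j;      /* 1-based row inside AB */
    assert(row >= 1 && row <= lddab);
    return (row - 1) + (j - 1) * lddab;
}

/* Hypothetical helper (not a MAGMA routine).
 * Copies the band of a dense m-by-n matrix a (leading dimension lda) into
 * ab. Entries of ab outside the band are left untouched. */
void dense_to_band(int m, int n, int kl, int ku,
                   const double complex *a, int lda,
                   double complex *ab, int lddab)
{
    assert(lddab >= 2 * kl + ku + 1);
    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;
        int ihi = (j + kl < m) ? j + kl : m;
        for (int i = ilo; i <= ihi; ++i) {
            ab[band_offset(kl, ku, lddab, i, j)] = a[(i - 1) + (j - 1) * lda];
        }
    }
}
```

With M = N = 6, KL = 2, KU = 1 and lddab = 6, the entry a11 lands in row 4 of column 1 and a12 in row 3 of column 2, which agrees with the on-entry layout in the example. The first KL rows are never written by the packing step; they are the fill-in space used during factorization.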

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDAB,N). On entry, the matrix A in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See above for details about the band storage.

[in]lddabINTEGER The leading dimension of each array AB. LDDAB >= (2*KL+KU+1).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no factorization is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgbtrf_batched()

magma_int_t magma_zgbtrf_batched ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N) On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl)

On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.

[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each entry is the INFO output for the corresponding matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

    On entry:                       On exit:

     *    *    *    +    +    +       *    *    *   u14  u25  u36
     *    *    +    +    +    +       *    *   u13  u24  u35  u46
     *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
    a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
    a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
    a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_zgbtrs_batched()

magma_int_t magma_zgbtrs_batched ( magma_trans_t transA,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRS solves a system of linear equations A * X = B, A**T * X = B, or A**H * X = B with a general band matrix A using the LU factorization computed by ZGBTRF.

This is the batched version of the routine. Currently, only the no-transpose case (A * X = B) is supported.

Parameters
[in]transAmagma_trans_t Specifies the form of the system of equations. Currently, only MagmaNoTrans is supported (A*X = B).
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgetrf_batched()

magma_int_t magma_zgetrf_batched ( magma_int_t m,
magma_int_t n,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
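
The fixed-size batched interface takes an array of pointers, one per matrix. When all matrices are stored back to back in one buffer, the pointer array is just a sequence of equally spaced addresses. The sketch below illustrates that layout on the host. It is an assumption-laden example: set_batch_pointers is a hypothetical name, C99 double complex stands in for magmaDoubleComplex, and the actual routine expects matrices that reside on the GPU, so this shows only the addressing arithmetic.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical host-side helper (not a MAGMA routine).
 * Fills ptrs[i] with the start of matrix i inside one contiguous buffer
 * that holds batch_count column-major matrices of ldda * n elements each. */
void set_batch_pointers(double complex **ptrs, double complex *base,
                        int ldda, int n, int batch_count)
{
    assert(ldda >= 1 && n >= 0 && batch_count >= 0);
    size_t stride = (size_t)ldda * (size_t)n;   /* elements per matrix */
    for (int i = 0; i < batch_count; ++i) {
        ptrs[i] = base + (size_t)i * stride;
    }
}
```

Using ldda rather than m in the stride keeps each matrix aligned to its padded leading dimension, consistent with the requirement LDDA >= max(1,M).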

◆ magma_zgetrf_recpanel_batched()

magma_int_t magma_zgetrf_recpanel_batched ( magma_int_t m,
magma_int_t n,
magma_int_t min_recpnb,
magmaDoubleComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t ** dipiv_array,
magma_int_t ** dpivinfo_array,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

This is an internal routine that may make many assumptions.

Documentation is not fully complete.

ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount M-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]mINTEGER The number of rows of each matrix A. M >= 0.
[in]nINTEGER The number of columns of each matrix A. N >= 0.
[in]min_recpnbINTEGER. Internal use. The recursive nb
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for A.
[in]ajINTEGER Column offset for A.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dpivinfo_arrayArray of pointers, dimension (batchCount), for internal use.
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgetrf_recpanel_native()

magma_int_t magma_zgetrf_recpanel_native ( magma_int_t m,
magma_int_t n,
magma_int_t recnb,
magmaDoubleComplex_ptr dA,
magma_int_t ldda,
magma_int_t * dipiv,
magma_int_t * dipivinfo,
magma_int_t * dinfo,
magma_int_t gbstep,
magma_event_t events[2],
magma_queue_t queue,
magma_queue_t update_queue )

This is an internal routine.

ZGETRF_PANEL computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a GPU-only routine. The host CPU is not used.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dAA COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of A. LDDA >= max(1,M).
[out]dipivAn INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dipivinfoAn INTEGER array, for internal use.
[out]dinfoINTEGER, stored on the GPU
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER internal use.
[in]queuesArray of magma_queue_t, size 2 Queues to execute in.

◆ magma_zgetrf_vbatched_max_nocheck_work()

magma_int_t magma_zgetrf_vbatched_max_nocheck_work ( magma_int_t * m,
magma_int_t * n,
magma_int_t max_m,
magma_int_t max_n,
magma_int_t max_minmn,
magma_int_t max_mxn,
magmaDoubleComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
void * work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in]MAX_MINTEGER The maximum number of rows across the batch
[in]MAX_NINTEGER The maximum number of columns across the batch
[in]MAX_MINMNINTEGER The maximum value of min(Mi, Ni) for i = 1, 2, ..., batchCount
[in]MAX_MxNINTEGER The maximum value of the product (Mi x Ni) for i = 1, 2, ..., batchCount
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a memory allocation failure.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]WORKVOID pointer A workspace of size LWORK[0] bytes.
[in,out]LWORKINTEGER pointer If lwork[0] < 0, a workspace query is assumed, and lwork[0] is overwritten by the required workspace size in bytes. Otherwise, lwork[0] is the size of work in bytes.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_zgetrf_vbatched()

magma_int_t magma_zgetrf_vbatched ( magma_int_t * m,
magma_int_t * n,
magmaDoubleComplex ** dA_array,
magma_int_t * ldda,
magma_int_t ** dipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGETRF computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is the variable-size batched version, which factors batchCount matrices of different sizes in parallel. Each matrix is assumed to have its own size and leading dimension.

Parameters
[in]MArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of rows of each matrix A. M[i] >= 0.
[in]NArray of INTEGERs on the GPU, dimension (batchCount) Each is the number of columns of each matrix A. N[i] >= 0.
[in,out]dA_arrayArray of pointers on the GPU, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA[i],N[i]). On entry, each pointer is an M[i]-by-N[i] matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaArray of INTEGERs on the GPU Each is the leading dimension of each array A. LDDA[i] >= max(1,M[i]).
[out]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M[i],N[i])) The pivot indices; for 1 <= p <= min(M[i],N[i]), row p of the matrix was interchanged with row IPIV(p).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
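
The pivot indices describe a sequence of row interchanges rather than a permutation vector. As a rough host-side illustration (plain C, not part of MAGMA; the helper name ipiv_to_perm is hypothetical), the sketch below expands one matrix's pivot array into an explicit row permutation, assuming the 1-based convention stated in the dipiv_array description and that the array has already been copied to the host.

```c
#include <assert.h>

/* Hypothetical helper (not a MAGMA routine): expand LAPACK-style pivot
 * indices into an explicit row permutation.
 * ipiv has k = min(m, n) entries, each 1-based: at step p (1-based), row p
 * was interchanged with row ipiv[p-1].
 * On return, perm[r] is the 0-based index of the original row that sits in
 * position r once all interchanges have been applied in order. */
static void ipiv_to_perm(int m, int k, const int *ipiv, int *perm)
{
    assert(k >= 0 && k <= m);
    for (int r = 0; r < m; ++r)
        perm[r] = r;                      /* start from the identity */
    for (int p = 0; p < k; ++p) {
        int q = ipiv[p] - 1;              /* convert to 0-based */
        assert(q >= p && q < m);          /* a pivot row is never above row p */
        int tmp = perm[p];
        perm[p] = perm[q];
        perm[q] = tmp;
    }
}
```

In the variable-size batched setting, each matrix i would use its own m = M[i] and k = min(M[i],N[i]).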

◆ magma_cgbsv_batched_fused_sm()

magma_int_t magma_cgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magmaFloatComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

CGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by CGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
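
The scalar constraints listed above can be collected into a pre-flight check before launching the routine. The following is only a sketch: the helper names are hypothetical, plain int stands in for magma_int_t, and the returned -i values simply follow the "i-th argument" convention using positions in the signature shown above. Whether the library reports exactly these codes is an assumption.

```c
#include <assert.h>

/* Hypothetical pre-flight check (not part of MAGMA). Returns 0 if the scalar
 * arguments satisfy the documented constraints, otherwise -i where i is the
 * 1-based position of the first offending argument in the signature
 * (n=1, kl=2, ku=3, nrhs=4, ldda=6, lddb=9, nthreads=11, ntcol=12). */
static int check_gbsv_fused_args(int n, int kl, int ku, int nrhs,
                                 int ldda, int lddb, int nthreads, int ntcol)
{
    if (n < 0)                   return -1;
    if (kl < 0)                  return -2;
    if (ku < 0)                  return -3;
    if (nrhs < 0)                return -4;
    if (ldda < 2 * kl + ku + 1)  return -6;
    if (lddb < (n > 1 ? n : 1))  return -9;
    if (nthreads < kl + 1)       return -11;
    if (ntcol < 1)               return -12;
    return 0;
}

/* Debug-build guard: stop early instead of launching with bad arguments. */
static void require_gbsv_fused_args(int n, int kl, int ku, int nrhs,
                                    int ldda, int lddb, int nthreads, int ntcol)
{
    assert(check_gbsv_fused_args(n, kl, ku, nrhs,
                                 ldda, lddb, nthreads, ntcol) == 0);
}
```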

◆ magma_cgbtrf_batched_fused_sm()

magma_int_t magma_cgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See Further Details below.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
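
To make the mapping AB(kl+ku+1+i-j,j) = A(i,j) concrete, here is a minimal host-side sketch that packs a dense column-major matrix into this band layout. The helper name pack_band is hypothetical, and C99 float complex is used as a stand-in for magmaFloatComplex; the sketch only fills the entries inside the band and leaves every other entry of AB untouched.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical host-side helper (not part of MAGMA). Packs a dense
 * column-major m-by-n matrix A (leading dimension lda) into band storage AB
 * (leading dimension ldab >= 2*kl+ku+1) using the documented mapping
 *   AB(kl+ku+1+i-j, j) = A(i,j)  for max(1,j-ku) <= i <= min(m,j+kl),
 * with 1-based i and j. */
static void pack_band(int m, int n, int kl, int ku,
                      const float complex *A, int lda,
                      float complex *AB, int ldab)
{
    assert(kl >= 0 && ku >= 0);
    assert(ldab >= 2 * kl + ku + 1);
    assert(lda >= (m > 1 ? m : 1));
    for (int j = 1; j <= n; ++j) {
        int ilo = (j - ku > 1) ? j - ku : 1;
        int ihi = (j + kl < m) ? j + kl : m;
        for (int i = ilo; i <= ihi; ++i) {
            int r = kl + ku + 1 + i - j;          /* 1-based row inside AB */
            AB[(size_t)(j - 1) * ldab + (r - 1)] =
                A[(size_t)(j - 1) * lda + (i - 1)];
        }
    }
}
```

The packed array would still have to be copied to the device and referenced from dAB_array before calling the batched routine.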

◆ magma_cgbtrf_batched_sliding_window_loopout()

magma_int_t magma_cgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See Further Details below.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory by the user.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes.
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
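
The lwork protocol described above implies a two-call pattern: query first, then allocate and call again. The sketch below illustrates that pattern with a stand-in function, mock_gbtrf_work, which is NOT a MAGMA routine; its size formula and error code are invented for illustration, plain int replaces magma_int_t, and malloc stands in for a device allocation such as magma_malloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in that follows the documented lwork protocol. NOT a MAGMA
 * function: the size formula and the -1 error code are invented. */
static int mock_gbtrf_work(int n, int kl, int ku, int batchCount,
                           void *device_work, int *lwork)
{
    int required = batchCount * (2 * kl + ku + 1) * n * (int)sizeof(float);
    if (lwork[0] < 0) {          /* workspace query: report size, do nothing */
        lwork[0] = required;
        return 0;
    }
    if (lwork[0] < required || (required > 0 && device_work == NULL))
        return -1;               /* hypothetical "workspace too small" code */
    /* ... a real routine would factor the batch here ... */
    return 0;
}

/* Two-call pattern: query, allocate, then run with the real workspace. */
static int run_with_workspace(int n, int kl, int ku, int batchCount)
{
    int lwork[1] = { -1 };
    int status = mock_gbtrf_work(n, kl, ku, batchCount, NULL, lwork);
    if (status != 0)
        return status;
    assert(lwork[0] >= 0);       /* the query must have overwritten lwork[0] */

    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL)
        return -2;
    status = mock_gbtrf_work(n, kl, ku, batchCount, work, lwork);
    free(work);
    return status;
}
```

With the actual routine, the first call would pass the same matrix arguments together with lwork[0] = -1, and the workspace would be released only after the queue has finished using it.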

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_cgbtrf_batched_sliding_window_loopin()

magma_int_t magma_cgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaFloatComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

CGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See Further Details below.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_cgetf2_nopiv_internal_batched()

magma_int_t magma_cgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
magmaFloatComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

cgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine can only deal with matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for dA_array.
[in]ajINTEGER Column offset for dA_array.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER Internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
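
As a point of reference for what a non-pivoting factorization A = L * U computes, here is a minimal host-side sketch for a single real matrix. It is not the MAGMA kernel: the function name lu_nopiv is hypothetical, double is used instead of a complex type for brevity, and, unlike the INFO description above, this sketch stops at the first exactly zero pivot rather than completing the factorization.

```c
#include <assert.h>
#include <stddef.h>

/* Host reference sketch (not MAGMA code): unblocked LU without pivoting of a
 * column-major m-by-n matrix. A is overwritten with L (unit diagonal not
 * stored) and U. Returns 0 on success, or i > 0 (1-based) if U(i,i) is
 * exactly zero; in that case the sketch stops at column i. */
static int lu_nopiv(int m, int n, double *A, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= (m > 1 ? m : 1));
    int k = (m < n) ? m : n;
    for (int j = 0; j < k; ++j) {
        double pivot = A[(size_t)j * lda + j];
        if (pivot == 0.0)
            return j + 1;
        for (int i = j + 1; i < m; ++i)
            A[(size_t)j * lda + i] /= pivot;       /* multipliers: column j of L */
        for (int c = j + 1; c < n; ++c)            /* rank-1 update of trailing block */
            for (int i = j + 1; i < m; ++i)
                A[(size_t)c * lda + i] -=
                    A[(size_t)j * lda + i] * A[(size_t)c * lda + j];
    }
    return 0;
}
```

Because no rows are interchanged, this variant is only numerically safe for matrices known not to need pivoting, for example diagonally dominant ones.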

◆ magma_cgetrf_batched_smallsq_noshfl()

magma_int_t magma_cgetrf_batched_smallsq_noshfl ( magma_int_t n,
magmaFloatComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

cgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]nINTEGER The size of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
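
For comparison with the device routine, a host reference for one small square matrix can be sketched as follows. This is an illustrative unblocked implementation, not the MAGMA kernel: the function and macro names are hypothetical, double replaces the complex type for brevity, and the pivot indices are written 1-based to match the ipiv_array description.

```c
#include <assert.h>
#include <stddef.h>

#define SMALLSQ_MAX_N 32   /* size limit stated for this routine */

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Host reference sketch: LU with partial pivoting of a column-major n-by-n
 * matrix, n <= 32. Returns 0, or i > 0 for the first exactly zero U(i,i). */
static int lu_partial_pivot_small(int n, double *A, int lda, int *ipiv)
{
    assert(n >= 0 && n <= SMALLSQ_MAX_N && lda >= (n > 1 ? n : 1));
    int info = 0;
    for (int j = 0; j < n; ++j) {
        int p = j;                            /* row of largest |A(i,j)|, i >= j */
        for (int i = j + 1; i < n; ++i)
            if (abs_d(A[(size_t)j * lda + i]) > abs_d(A[(size_t)j * lda + p]))
                p = i;
        ipiv[j] = p + 1;                      /* 1-based pivot index */
        if (A[(size_t)j * lda + p] == 0.0) {
            if (info == 0)
                info = j + 1;                 /* remember the first zero pivot */
            continue;                         /* column is zero: nothing to eliminate */
        }
        if (p != j)
            for (int c = 0; c < n; ++c) {     /* interchange rows j and p */
                double t = A[(size_t)c * lda + j];
                A[(size_t)c * lda + j] = A[(size_t)c * lda + p];
                A[(size_t)c * lda + p] = t;
            }
        for (int i = j + 1; i < n; ++i)
            A[(size_t)j * lda + i] /= A[(size_t)j * lda + j];
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < n; ++i)
                A[(size_t)c * lda + i] -=
                    A[(size_t)j * lda + i] * A[(size_t)c * lda + j];
    }
    return info;
}
```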

◆ magma_dgbsv_batched_fused_sm()

magma_int_t magma_dgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
double ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

DGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by DGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (min(M,N)) The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
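
Since info_array holds one status per matrix, a caller usually has to scan it after the batch completes. The sketch below assumes the array has already been copied to host memory; the struct and function names are hypothetical, and positive entries are interpreted with the singular-factor meaning documented for the factorization routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a batch's per-matrix status codes. */
typedef struct {
    int n_ok;          /* entries equal to 0 */
    int n_bad_arg;     /* entries < 0: illegal argument or other error */
    int n_singular;    /* entries > 0: exactly zero pivot U(i,i) */
    int first_failed;  /* batch index of the first nonzero entry, or -1 */
} batch_info_summary;

/* Scan a host copy of info_array (not a MAGMA routine). */
static batch_info_summary summarize_info(const int *info, int batchCount)
{
    assert(batchCount >= 0 && (info != NULL || batchCount == 0));
    batch_info_summary s = { 0, 0, 0, -1 };
    for (int b = 0; b < batchCount; ++b) {
        if (info[b] == 0) {
            ++s.n_ok;
            continue;
        }
        if (info[b] < 0)
            ++s.n_bad_arg;
        else
            ++s.n_singular;
        if (s.first_failed < 0)
            s.first_failed = b;
    }
    return s;
}
```

A solution in dB_array should only be trusted for batch entries whose status is 0.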

◆ magma_dgbtrf_batched_fused_sm()

magma_int_t magma_dgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See Further Details below.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread-block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_dgbtrf_batched_sliding_window_loopout()

magma_int_t magma_dgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See Further Details below.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory by the user.
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes.
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
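
The three kinds of entries in the storage example (unused *, fill-in +, and band entries of A) follow directly from the index mapping, so the "On entry" picture can be regenerated for any sizes. The sketch below is a hypothetical helper, not part of MAGMA: it classifies each position (r, j) of AB, with 1-based indices, by recovering the row i = r + j - (kl+ku+1) of A that maps there.

```c
#include <assert.h>

/* Hypothetical helper: kind of entry AB(r, j) on entry, 1-based indices.
 *   'a' : holds A(i,j) with max(1,j-ku) <= i <= min(m,j+kl)
 *   '+' : fill-in position, need not be set on entry
 *   '*' : not used by the routine */
static char band_entry_kind(int m, int n, int kl, int ku, int r, int j)
{
    assert(r >= 1 && r <= 2 * kl + ku + 1 && j >= 1 && j <= n);
    int i = r + j - (kl + ku + 1);    /* row of A (or U) mapped to AB(r, j) */
    if (i < 1 || i > m)
        return '*';                   /* falls outside the matrix */
    return (i >= j - ku) ? 'a' : '+'; /* rows above the band are fill-in */
}

/* Write the pattern of row r of AB into out, which must hold n+1 chars. */
static void band_row_pattern(int m, int n, int kl, int ku, int r, char *out)
{
    for (int j = 1; j <= n; ++j)
        out[j - 1] = band_entry_kind(m, n, kl, ku, r, j);
    out[n] = '\0';
}
```

For M = N = 6, KL = 2, KU = 1, the first row of AB consists of three unused positions followed by three fill-in positions, and the fourth row is the main diagonal of A.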

◆ magma_dgbtrf_batched_sliding_window_loopin()

magma_int_t magma_dgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
double ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

DGBTRF computes an LU factorization of a real m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku) <= i <= min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See Further Details below.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                        On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_dgetf2_nopiv_internal_batched()

magma_int_t magma_dgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
double ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

dgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine can only deal with matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for dA_array.
[in]ajINTEGER Column offset for dA_array.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER Internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

◆ magma_dgetrf_batched_smallsq_noshfl()

magma_int_t magma_dgetrf_batched_smallsq_noshfl ( magma_int_t n,
double ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

dgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]nINTEGER The size of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a DOUBLE PRECISION array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
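
To make the pivot convention concrete, here is a small host-side sketch of LU with partial pivoting for one square column-major matrix. It records pivots as 1-based row indices, matching the IPIV description above. The function name lu_partial_pivot_ref and the CPU implementation are assumptions for illustration only; the MAGMA routine runs on the GPU over a whole batch and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical reference routine (not part of MAGMA).
 * Factors one n-by-n column-major matrix in place as A = P*L*U.
 * ipiv[k] is 1-based: row k+1 was interchanged with row ipiv[k].
 * Returns 0, or i (1-based) for the first U(i,i) that is exactly zero. */
int lu_partial_pivot_ref(int n, double *A, int lda, int *ipiv)
{
    assert(n >= 0 && lda >= (n > 1 ? n : 1));
    int info = 0;
    for (int k = 0; k < n; ++k) {
        /* Largest magnitude in column k, on or below the diagonal. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (abs_val(A[i + (size_t)k * lda]) > abs_val(A[p + (size_t)k * lda]))
                p = i;
        ipiv[k] = p + 1;
        if (A[p + (size_t)k * lda] == 0.0) {
            /* Whole column is zero from the diagonal down: nothing to eliminate. */
            if (info == 0) info = k + 1;
            continue;
        }
        if (p != k) {
            /* Swap rows k and p across all columns. */
            for (int j = 0; j < n; ++j) {
                double tmp = A[k + (size_t)j * lda];
                A[k + (size_t)j * lda] = A[p + (size_t)j * lda];
                A[p + (size_t)j * lda] = tmp;
            }
        }
        for (int i = k + 1; i < n; ++i)
            A[i + (size_t)k * lda] /= A[k + (size_t)k * lda];
        for (int j = k + 1; j < n; ++j)
            for (int i = k + 1; i < n; ++i)
                A[i + (size_t)j * lda] -=
                    A[i + (size_t)k * lda] * A[k + (size_t)j * lda];
    }
    return info;
}
```

Note that the pivot array is a sequence of interchanges applied in order, not a permutation vector: an entry equal to its own position means no swap happened at that step.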

◆ magma_sgbsv_batched_fused_sm()

magma_int_t magma_sgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
float ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

SGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by SGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
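
The constraints listed for the scalar arguments can be checked on the host before launching. The sketch below is a hypothetical helper, not a MAGMA function: it encodes only the inequalities stated in the parameter list and returns -i for the i-th argument of the signature, following the INFO convention. Whether the library itself reports exactly these codes in this order is an assumption.

```c
/* Hypothetical pre-flight check (not part of MAGMA).
 * Argument positions follow the signature of the fused gbsv routine:
 * n=1, kl=2, ku=3, nrhs=4, ldda=6, lddb=9, nthreads=11, ntcol=12. */
int check_gbsv_fused_sm_args(int n, int kl, int ku, int nrhs,
                             int ldda, int lddb, int nthreads, int ntcol)
{
    if (n < 0)                   return -1;
    if (kl < 0)                  return -2;
    if (ku < 0)                  return -3;
    if (nrhs < 0)                return -4;
    if (ldda < 2 * kl + ku + 1)  return -6;   /* room for fill-in rows */
    if (lddb < (n > 1 ? n : 1))  return -9;
    if (nthreads < kl + 1)       return -11;
    if (ntcol < 1)               return -12;
    return 0;
}
```

For example, with kl = 2 and ku = 1 the smallest acceptable ldda is 6 and the smallest acceptable nthreads is 3.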

◆ magma_sgbtrf_batched_fused_sm()

magma_int_t magma_sgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a REAL m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                       On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
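
The storage rule AB(kl+ku+1+i-j,j) = A(i,j) can be turned into a short packing loop. The following host-side sketch copies the band of a dense column-major matrix into the layout expected on entry. The helper name pack_band and the dense input format are assumptions for illustration; MAGMA expects the packed arrays to reside on the GPU, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MAGMA).
 * Copies the band of a dense m-by-n column-major matrix A into band
 * storage AB using the 1-based rule
 *   AB(kl+ku+1+i-j, j) = A(i,j)  for  max(1,j-ku) <= i <= min(m,j+kl).
 * Rows 1..kl of AB (the fill-in rows) are left untouched. */
void pack_band(int m, int n, int kl, int ku,
               const float *A, int lda, float *AB, int lddab)
{
    assert(lddab >= 2 * kl + ku + 1);
    for (int j = 1; j <= n; ++j) {
        int i_lo = (j - ku > 1) ? j - ku : 1;
        int i_hi = (j + kl < m) ? j + kl : m;
        for (int i = i_lo; i <= i_hi; ++i) {
            int row = kl + ku + 1 + i - j;          /* 1-based row in AB */
            AB[(row - 1) + (size_t)(j - 1) * lddab] =
                A[(i - 1) + (size_t)(j - 1) * lda]; /* shift to 0-based */
        }
    }
}
```

For the M = N = 6, KL = 2, KU = 1 example, this places a11 in row 4 of column 1 and a12 in row 3 of column 2, matching the "On entry" diagram.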

◆ magma_sgbtrf_batched_sliding_window_loopout()

magma_int_t magma_sgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a REAL m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory by the user
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
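
The lwork convention above implies a two-call pattern: query the size with a negative lwork[0], allocate, then call again with the buffer. The sketch below shows that pattern against a stand-in function. mock_factor_work is hypothetical and is not a MAGMA routine; its size formula is invented purely so the example runs, it uses long where the real routine uses magma_int_t, and host malloc stands in for the device allocation a real caller would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in that only mimics the lwork convention.
 * The "needed" formula is made up for this example. */
static int mock_factor_work(int n, int batch, void *work, long *lwork)
{
    long needed = (long)n * batch * (long)sizeof(float);
    if (lwork[0] < 0) {
        lwork[0] = needed;   /* query: report size, touch nothing else */
        return 0;
    }
    if (lwork[0] < needed || (needed > 0 && work == NULL))
        return -1;           /* provided workspace is too small */
    /* ... a real routine would use `work` for its computation here ... */
    return 0;
}

/* Two-call pattern: query, allocate, compute, release. */
int run_with_workspace(int n, int batch)
{
    long lwork[1] = { -1 };
    int status = mock_factor_work(n, batch, NULL, lwork);
    if (status != 0) return status;
    assert(lwork[0] >= 0);   /* the query replaced the negative sentinel */

    void *work = malloc(lwork[0] > 0 ? (size_t)lwork[0] : 1);
    if (work == NULL) return -2;
    status = mock_factor_work(n, batch, work, lwork);
    free(work);
    return status;
}
```

Keeping the query and the compute call on the same function means the size calculation cannot drift out of step with the code that consumes the workspace.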

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                       On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_sgbtrf_batched_sliding_window_loopin()

magma_int_t magma_sgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
float ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

SGBTRF computes an LU factorization of a REAL m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a REAL array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                       On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
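
The three kinds of slots in the on-entry band array (matrix entries, fill-in, and unused) follow directly from the index rule, so the layout can be regenerated for any sizes. The sketch below classifies each slot and prints the pattern; for m = n = 6, kl = 2, ku = 1 it reproduces the arrangement of the example. Both function names are hypothetical helpers introduced here, not MAGMA routines.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of MAGMA).
 * Classifies slot (row, col), both 1-based, of the on-entry band array:
 *   'a' holds an entry of A, '+' is reserved for fill-in, '*' is unused. */
char band_slot_kind(int m, int kl, int ku, int row, int col)
{
    assert(row >= 1 && row <= 2 * kl + ku + 1 && col >= 1);
    /* Invert row = kl+ku+1+i-col to find the matrix row this slot maps to. */
    int i = row + col - (kl + ku + 1);
    if (i < 1 || i > m)
        return '*';
    return (row <= kl) ? '+' : 'a';
}

/* Prints one character per slot, one line per band row. */
void print_band_layout(int m, int n, int kl, int ku)
{
    for (int row = 1; row <= 2 * kl + ku + 1; ++row) {
        for (int col = 1; col <= n; ++col)
            printf(" %c", band_slot_kind(m, kl, ku, row, col));
        printf("\n");
    }
}
```

The first kl rows never hold input data, which is why the leading dimension must be at least 2*KL+KU+1 and not just KL+KU+1.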

◆ magma_sgetf2_nopiv_internal_batched()

magma_int_t magma_sgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
float ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

sgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine handles only matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for dA_array.
[in]ajINTEGER Column offset for dA_array.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER Internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
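
The ai and aj parameters are described only as row and column offsets. One plausible reading, sketched below, is that they select the top-left corner of the submatrix to factor inside each column-major array. That interpretation is an assumption, as is the helper name submatrix_entry; the sketch shows only the address arithmetic such a reading implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MAGMA).
 * Assumes (ai, aj) is the 0-based top-left corner of a submatrix inside a
 * column-major array with leading dimension ldda, and returns a pointer to
 * entry (i, j) of that submatrix, also 0-based. */
float *submatrix_entry(float *A, int ldda, int ai, int aj, int i, int j)
{
    assert(ldda >= 1 && ai >= 0 && aj >= 0 && i >= 0 && j >= 0);
    return A + (size_t)(ai + i) + (size_t)(aj + j) * (size_t)ldda;
}
```

Under this reading, a panel routine can be applied to successive diagonal blocks of one allocation by advancing ai and aj together, without building a second array of pointers.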

◆ magma_sgetrf_batched_smallsq_noshfl()

magma_int_t magma_sgetrf_batched_smallsq_noshfl ( magma_int_t n,
float ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

sgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine handles only square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]nINTEGER The size of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a REAL array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
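
Because every matrix in the batch gets its own INFO value, callers usually want a summary before using the factors. The sketch below tallies the three cases from the list above, assuming the entries have already been copied into a host array of int. The struct and function names are hypothetical, and int stands in for magma_int_t.

```c
#include <assert.h>

/* Hypothetical summary type and helper (not part of MAGMA). */
typedef struct {
    int ok;               /* entries equal to 0                          */
    int illegal_or_error; /* entries below 0                             */
    int singular;         /* entries above 0: an exactly zero U(i,i)     */
    int first_singular;   /* batch index of the first such entry, or -1  */
} info_summary;

info_summary summarize_info(const int *info, int batchCount)
{
    assert(batchCount >= 0);
    info_summary s = { 0, 0, 0, -1 };
    for (int b = 0; b < batchCount; ++b) {
        if (info[b] == 0) {
            s.ok++;
        } else if (info[b] < 0) {
            s.illegal_or_error++;
        } else {
            if (s.singular == 0)
                s.first_singular = b;
            s.singular++;
        }
    }
    return s;
}
```

A positive entry does not mean the call failed: the factors are still written, but solving with them would divide by zero, so those batch entries should be skipped or handled separately.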

◆ magma_zgbsv_batched_fused_sm()

magma_int_t magma_zgbsv_batched_fused_sm ( magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magma_int_t nrhs,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magmaDoubleComplex ** dB_array,
magma_int_t lddb,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

ZGBSV computes the solution to a system of linear equations A * X = B, where A is a band matrix of order N with KL subdiagonals and KU superdiagonals, and X and B are N-by-NRHS matrices.

The LU decomposition with partial pivoting and row interchanges is used to factor A as A = L * U, where L is a product of permutation and unit lower triangular matrices with KL subdiagonals, and U is upper triangular with KL+KU superdiagonals. The factored form of A is then used to solve the system of equations A * X = B.

This is the batched version of the routine.

Parameters
[in]nINTEGER The order of the matrix A. n >= 0.
[in]klINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]kuINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in]nrhsINTEGER The number of right hand sides, i.e., the number of columns of the matrix B. NRHS >= 0.
[in]dA_arrayArray of pointers, dimension (batchCount). Each contains the details of the LU factorization of the band matrix A, as computed by ZGBTRF. U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= (2*KL+KU+1).
[in]dipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[in,out]dB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX*16 array, dimension (LDDB,NRHS). On entry, the right hand side matrix B. On exit, the solution matrix X.
[in]lddbINTEGER The leading dimension of each array B. LDDB >= max(1, N).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
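
The nthreads and ntcol descriptions suggest a simple launch shape: each thread block handles ntcol matrices with nthreads threads apiece, so the batch needs ceil(batchCount / ntcol) blocks. The sketch below computes those two numbers. This geometry is inferred from the parameter descriptions and is an assumption, not a statement of how the kernel is actually launched; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical description of a launch (not part of MAGMA). */
typedef struct {
    int threads_per_block; /* nthreads for each of ntcol matrices */
    int num_blocks;        /* blocks needed to cover the batch    */
} launch_shape;

launch_shape fused_launch_shape(int kl, int nthreads, int ntcol, int batchCount)
{
    /* Constraints taken from the parameter list above. */
    assert(nthreads >= kl + 1 && ntcol >= 1 && batchCount >= 0);
    launch_shape s;
    s.threads_per_block = nthreads * ntcol;
    s.num_blocks = (batchCount + ntcol - 1) / ntcol;  /* ceiling division */
    return s;
}
```

Raising ntcol packs more small systems into one block, at the cost of proportionally more shared memory per block, which is the resource these fused routines are most likely to exhaust.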

◆ magma_zgbtrf_batched_fused_sm()

magma_int_t magma_zgbtrf_batched_fused_sm ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t nthreads,
magma_int_t ntcol,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]nthreadsINTEGER The number of threads assigned to a single matrix. nthreads >= (KL+1).
[in]ntcolINTEGER The number of concurrent factorizations in a thread block. ntcol >= 1.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                       On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine, but may be set to zero after completion. Elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.
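
After factorization, the same index rule locates the entries of U and the multipliers inside the band array. The sketch below reads them back from a host copy of one factored array. C99 double complex is used as a stand-in for magmaDoubleComplex, and both accessor names are hypothetical; the bounds come from the "On exit" description (U has KL+KU superdiagonals, the multipliers occupy KL subdiagonals).

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical accessors (not part of MAGMA). Indices i, j are 1-based. */

/* U(i,j) is stored for max(1, j-kl-ku) <= i <= j; it is zero elsewhere. */
double complex band_get_u(int kl, int ku, const double complex *AB,
                          int lddab, int i, int j)
{
    assert(lddab >= 2 * kl + ku + 1 && i >= 1 && j >= 1);
    if (i > j || i < j - kl - ku)
        return 0.0;
    int row = kl + ku + 1 + i - j;               /* rows 1 .. kl+ku+1 */
    return AB[(row - 1) + (size_t)(j - 1) * lddab];
}

/* Multiplier m(i,j) is stored for j < i <= j+kl; it is zero elsewhere. */
double complex band_get_multiplier(int kl, int ku, const double complex *AB,
                                   int lddab, int i, int j)
{
    assert(lddab >= 2 * kl + ku + 1 && i >= 1 && j >= 1);
    if (i <= j || i > j + kl)
        return 0.0;
    int row = kl + ku + 1 + i - j;               /* rows kl+ku+2 .. 2*kl+ku+1 */
    return AB[(row - 1) + (size_t)(j - 1) * lddab];
}
```

In the KL = 2, KU = 1 example, u14 is read from row 1 of column 4 and m21 from row 5 of column 1. Because of the row interchanges, the multipliers alone do not give L as an explicit matrix; they have to be combined with the pivot indices.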

◆ magma_zgbtrf_batched_sliding_window_loopout()

magma_int_t magma_zgbtrf_batched_sliding_window_loopout ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
void * device_work,
magma_int_t * lwork,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in,out]device_workWorkspace, allocated on device memory by the user
[in,out]lworkINTEGER pointer The size of the workspace (device_work) in bytes
  • lwork[0] < 0: a workspace query is assumed, the routine calculates the required amount of workspace and returns it in lwork. The workspace is not referenced, and no computation is performed.
  • lwork[0] >= 0: the routine assumes that the user has provided a workspace with the size in lwork.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                       On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_zgbtrf_batched_sliding_window_loopin()

magma_int_t magma_zgbtrf_batched_sliding_window_loopin ( magma_int_t m,
magma_int_t n,
magma_int_t kl,
magma_int_t ku,
magmaDoubleComplex ** dAB_array,
magma_int_t lddab,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

ZGBTRF computes an LU factorization of a COMPLEX m-by-n band matrix A using partial pivoting with row interchanges.

This is the batched version of the algorithm, which performs the factorization on a batch of matrices with the same size and lower/upper bandwidths.

This routine has shared memory requirements that may exceed the capacity of the GPU. In such a case, the routine exits immediately, returning a negative error code.

Parameters
[in]MINTEGER The number of rows of the matrix A. M >= 0.
[in]NINTEGER The number of columns of the matrix A. N >= 0.
[in]KLINTEGER The number of subdiagonals within the band of A. KL >= 0.
[in]KUINTEGER The number of superdiagonals within the band of A. KU >= 0.
[in,out]dAB_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array, dimension (LDDAB,N). On entry, the matrix AB in band storage, in rows KL+1 to 2*KL+KU+1; rows 1 to KL of the array need not be set. The j-th column of A is stored in the j-th column of the array AB as follows: AB(kl+ku+1+i-j,j) = A(i,j) for max(1,j-ku)<=i<=min(m,j+kl). On exit, details of the factorization: U is stored as an upper triangular band matrix with KL+KU superdiagonals in rows 1 to KL+KU+1, and the multipliers used during the factorization are stored in rows KL+KU+2 to 2*KL+KU+1. See below for further details.
[in]LDDABINTEGER The leading dimension of the array AB. LDDAB >= 2*KL+KU+1.
[out]dIPIV_arrayArray of pointers, dimension (batchCount). Each is an INTEGER array, dimension (min(M,N)). The pivot indices; for 1 <= i <= min(M,N), row i of the matrix was interchanged with row IPIV(i).
[out]dINFO_arrayINTEGER array, dimension (batchCount). Each is the INFO output for a given matrix.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value
  • > 0: if INFO = +i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.

Further Details

The band storage scheme is illustrated by the following example, when M = N = 6, KL = 2, KU = 1:

On entry:                       On exit:

    *    *    *    +    +    +       *    *    *   u14  u25  u36
    *    *    +    +    +    +       *    *   u13  u24  u35  u46
    *   a12  a23  a34  a45  a56      *   u12  u23  u34  u45  u56
   a11  a22  a33  a44  a55  a66     u11  u22  u33  u44  u55  u66
   a21  a32  a43  a54  a65   *      m21  m32  m43  m54  m65   *
   a31  a42  a53  a64   *    *      m31  m42  m53  m64   *    *

Array elements marked * are not used by the routine; elements marked + need not be set on entry, but are required by the routine to store elements of U because of fill-in resulting from the row interchanges.

◆ magma_zgetf2_nopiv_internal_batched()

magma_int_t magma_zgetf2_nopiv_internal_batched ( magma_int_t m,
magma_int_t n,
magmaDoubleComplex ** dA_array,
magma_int_t ai,
magma_int_t aj,
magma_int_t ldda,
magma_int_t * info_array,
magma_int_t gbstep,
magma_int_t batchCount,
magma_queue_t queue )

zgetf2_nopiv computes the non-pivoting LU factorization of an M-by-N matrix A.

This routine handles only matrices of limited width, so it is intended for internal use.

The factorization has the form A = L * U where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

This is a batched version that factors batchCount M-by-N matrices in parallel.

Parameters
[in]mINTEGER The number of rows of the matrix A. M >= 0.
[in]nINTEGER The number of columns of the matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an M-by-N matrix to be factored. On exit, the factors L and U from the factorization A = L*U; the unit diagonal elements of L are not stored.
[in]aiINTEGER Row offset for dA_array.
[in]ajINTEGER Column offset for dA_array.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,M).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]gbstepINTEGER Internal use.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
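To make the semantics of the factorization and of info_array concrete, here is a CPU reference sketch of a non-pivoting LU applied to each matrix in a batch. It is not MAGMA's implementation: it uses real `double` instead of COMPLEX_16, runs a plain loop on the host, and omits the `ai`, `aj`, and `gbstep` arguments. The function names `getf2_nopiv_ref` and `getf2_nopiv_batched_ref` are hypothetical.

```c
#include <stddef.h>

/* Reference sketch only (real double, host CPU); not MAGMA's GPU kernel.
 * Factors the m-by-n column-major matrix A (leading dimension lda) in place
 * as A = L*U without pivoting; the unit diagonal of L is not stored.
 * Returns 0 on success, -i if the i-th argument is illegal, or k > 0 if
 * U(k,k) (1-based) is the first exactly zero diagonal entry. */
int getf2_nopiv_ref(int m, int n, double *A, int lda)
{
    if (m < 0) return -1;
    if (n < 0) return -2;
    if (lda < (m > 1 ? m : 1)) return -4;

    int info = 0;
    int kmax = m < n ? m : n;
    for (int k = 0; k < kmax; ++k) {
        double pivot = A[k + (size_t)k * lda];
        if (pivot != 0.0) {
            /* Scale the column below the diagonal to form column k of L. */
            for (int i = k + 1; i < m; ++i)
                A[i + (size_t)k * lda] /= pivot;
        } else if (info == 0) {
            info = k + 1; /* record the first zero pivot, skip the scaling */
        }
        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; ++j) {
            double ukj = A[k + (size_t)j * lda];
            for (int i = k + 1; i < m; ++i)
                A[i + (size_t)j * lda] -= A[i + (size_t)k * lda] * ukj;
        }
    }
    return info;
}

/* Batched driver: one pointer and one info entry per matrix. */
void getf2_nopiv_batched_ref(int m, int n, double **A_array, int lda,
                             int *info_array, int batchCount)
{
    for (int b = 0; b < batchCount; ++b)
        info_array[b] = getf2_nopiv_ref(m, n, A_array[b], lda);
}
```

Because no rows are interchanged, a zero in the leading position is reported through info even when the matrix itself is nonsingular, which is one reason to prefer a pivoting routine unless the input is known to be safe (for example, diagonally dominant).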

◆ magma_zgetrf_batched_smallsq_noshfl()

magma_int_t magma_zgetrf_batched_smallsq_noshfl ( magma_int_t n,
magmaDoubleComplex ** dA_array,
magma_int_t ldda,
magma_int_t ** ipiv_array,
magma_int_t * info_array,
magma_int_t batchCount,
magma_queue_t queue )

zgetrf_batched_smallsq_noshfl computes the LU factorization of a square N-by-N matrix A using partial pivoting with row interchanges.

This routine can deal only with square matrices of size up to 32.

The factorization has the form A = P * L * U where P is a permutation matrix, L is lower triangular with unit diagonal elements, and U is upper triangular.

This is the right-looking Level 3 BLAS version of the algorithm.

This is a batched version that factors batchCount N-by-N matrices in parallel. dA, ipiv, and info become arrays with one entry per matrix.

Parameters
[in]nINTEGER The size of each matrix A. N >= 0.
[in,out]dA_arrayArray of pointers, dimension (batchCount). Each is a COMPLEX_16 array on the GPU, dimension (LDDA,N). On entry, each pointer is an N-by-N matrix to be factored. On exit, the factors L and U from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
[in]lddaINTEGER The leading dimension of each array A. LDDA >= max(1,N).
[out]ipiv_arrayArray of pointers, dimension (batchCount), for corresponding matrices. Each is an INTEGER array, dimension (N). The pivot indices; for 1 <= i <= N, row i of the matrix was interchanged with row IPIV(i).
[out]info_arrayArray of INTEGERs, dimension (batchCount), for corresponding matrices.
  • = 0: successful exit
  • < 0: if INFO = -i, the i-th argument had an illegal value or another error occurred, such as a failed memory allocation.
  • > 0: if INFO = i, U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations.
[in]batchCountINTEGER The number of matrices to operate on.
[in]queuemagma_queue_t Queue to execute in.
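The pivot and info conventions described above can be illustrated with a CPU reference sketch of partial-pivoting LU on a batch of small square matrices. This is not MAGMA's kernel: it uses real `double` instead of COMPLEX_16 and a sequential host loop, and the names `getrf_smallsq_ref` and `getrf_smallsq_batched_ref` are hypothetical. The size check mirrors the limit of 32 stated in the description.

```c
#include <stddef.h>

/* Reference sketch only (real double, host CPU); not MAGMA's GPU kernel.
 * Factors the n-by-n column-major matrix A in place as A = P*L*U.
 * ipiv uses the 1-based convention: row k+1 was swapped with row ipiv[k].
 * Returns 0, -i for an illegal i-th argument, or k > 0 if U(k,k) == 0. */
int getrf_smallsq_ref(int n, double *A, int lda, int *ipiv)
{
    if (n < 0 || n > 32) return -1;        /* size limit taken from the text */
    if (lda < (n > 1 ? n : 1)) return -3;

    int info = 0;
    for (int k = 0; k < n; ++k) {
        double *colk = A + (size_t)k * lda;

        /* Pick the entry of largest magnitude on or below the diagonal. */
        int p = k;
        double best = colk[k] < 0 ? -colk[k] : colk[k];
        for (int i = k + 1; i < n; ++i) {
            double v = colk[i] < 0 ? -colk[i] : colk[i];
            if (v > best) { best = v; p = i; }
        }
        ipiv[k] = p + 1;

        if (best == 0.0) {
            if (info == 0) info = k + 1;   /* record the first zero pivot */
            continue;                      /* column is zero: nothing to eliminate */
        }
        if (p != k) {                      /* interchange rows k and p */
            for (int j = 0; j < n; ++j) {
                double t = A[k + (size_t)j * lda];
                A[k + (size_t)j * lda] = A[p + (size_t)j * lda];
                A[p + (size_t)j * lda] = t;
            }
        }
        for (int i = k + 1; i < n; ++i)    /* form column k of L */
            colk[i] /= colk[k];
        for (int j = k + 1; j < n; ++j) {  /* update the trailing submatrix */
            double ukj = A[k + (size_t)j * lda];
            for (int i = k + 1; i < n; ++i)
                A[i + (size_t)j * lda] -= colk[i] * ukj;
        }
    }
    return info;
}

/* Batched driver: one matrix pointer, pivot array, and info entry per matrix. */
void getrf_smallsq_batched_ref(int n, double **A_array, int lda,
                               int **ipiv_array, int *info_array, int batchCount)
{
    for (int b = 0; b < batchCount; ++b)
        info_array[b] = getrf_smallsq_ref(n, A_array[b], lda, ipiv_array[b]);
}
```

Each matrix is factored independently, so a singular matrix in the batch only sets its own info entry and does not affect the others.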