CUBLAS SGEMM DIMENSIONS IN ROW-MAJOR ORDER FORMAT
Edit 2018.10.07
Thanks to Jeff Friedman, who found a bug in the code. The original MMWrapper worked correctly for matrices A and B only if both were square or both were rectangular. If only A was rectangular, the result was invalid. Since then, MMWrapper has been based directly on the matrixMulCUBLAS demo sample from NVIDIA. Further down the page is a simple .cpp demo that tests the functionality of the wrapper.
    #include <cublas_v2.h>
    #include <iostream>

    /*********************************************************************
     * @brief Matrix Matrix Multiply Wrapper for Row-Major Data Environment
     *        Sizes are written as [Nx, Ny] = [columns, rows]:
     *        [3,3] * [3,3] = [3,3]
     *        [4,3] * [3,4] = [3,3]
     *        [4,3] * [4,4] = [4,3]
     *********************************************************************/
    void MMWrapper(cublasHandle_t Blas, float* dev_A, float* dev_B, float* dev_C,
                   int Nx1, int Ny1, int Nx2, int Ny2)
    {
        /*
         * Nx1 ... Number of Columns of A
         * Ny1 ... Number of Rows of A
         * Nx2 ... Number of Columns of B
         * Ny2 ... Number of Rows of B (must equal Nx1)
         */
        (void)Ny2; // Not needed by the call itself

        float Alpha = 1.0f;
        float Beta  = 0.0f;

        // Sgemm computes: C = (Alpha*A) * B + (Beta*C)
        int m = Nx2;   // Number of Columns of matrix B and C.  // WIDTH B
        int n = Ny1;   // Number of Rows of matrix A and C.     // HEIGHT A
        int k = Nx1;   // Number of Columns of A.               // WIDTH A
        int lda = Nx1; // Leading Dimension of A                // WIDTH A
        int ldb = Nx2; // Leading Dimension of B                // WIDTH B
        // WIDTH C equals WIDTH B. cuBLAS requires ldc >= m, so
        // std::min(Nx1, Nx2) would be rejected whenever Nx1 < Nx2.
        int ldc = Nx2; // Leading Dimension of C                // WIDTH C

        // CUBLAS_OP_N the non-transpose operation is selected
        // CUBLAS_OP_T the transpose operation is selected
        // CUBLAS_OP_C the conjugate transpose operation is selected

        // MULTIPLY IN THE REVERSE ORDER!
        cublasStatus_t Stat = cublasSgemm(Blas, CUBLAS_OP_N, CUBLAS_OP_N,
                                          m, n, k,
                                          &Alpha, dev_B, ldb,
                                          dev_A, lda,
                                          &Beta, dev_C, ldc);

        if (Stat != CUBLAS_STATUS_SUCCESS) {
            std::cout << "Error Launching MM Multiply!" << std::endl;
        }
    }
Original Post Content
The cuBLAS library, among other NVIDIA libraries, uses the column-major format. This is a problem if you are using the row-major format in your application. I have been over this before in an older post about using cuFFT in a row-major environment. Basically, this causes a huge mess in the code, so one has to take extra care when using it. To use SGEMM and compute C = A*B, we have to reverse the multiplication and in fact compute C = B*A. The reason is that a row-major buffer read as column-major is the transpose of the matrix it holds, and (A*B)^T = B^T * A^T. Reversing the order alone wouldn't work, however, so we have to look up the sgemm documentation and adjust the parameters:
- m : Number of rows of matrix A and C. → Number of Columns of matrix B and C.
- n : Number of columns of matrix B and C. → Number of Rows of matrix A and C.
- k : Number of columns of A and rows of B. → Number of Rows of B and Columns of A.
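Why the roles swap can be written out as a short derivation. This is a sketch using the MMWrapper names (Ny = rows, Nx = columns), under the assumption that all three buffers are tightly packed row-major arrays, so cuBLAS effectively sees each one as its transpose:

```latex
C = A B \iff C^{T} = B^{T} A^{T}, \qquad
B^{T} \in \mathbb{R}^{N_{x2} \times N_{y2}}, \quad
A^{T} \in \mathbb{R}^{N_{x1} \times N_{y1}}, \quad
C^{T} \in \mathbb{R}^{N_{x2} \times N_{y1}}
\\[4pt]
m = N_{x2}, \qquad n = N_{y1}, \qquad k = N_{y2} = N_{x1}
```

Here m and n are the rows and columns of the column-major result C^T, and k is the shared inner dimension.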
The leading dimensions:
- lda : Leading dimension of the two-dimensional array used to store the matrix A. → Becomes the leading dimension of B, i.e. the number of columns of B.
- ldb : Leading dimension of the two-dimensional array used to store the matrix B. → Becomes the leading dimension of A, i.e. the number of columns of A.
- ldc : Leading dimension of the two-dimensional array used to store the matrix C. → Is still the leading dimension of C, i.e. the number of columns of C (the same as for B).
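The parameter mapping above can be collected in one small helper. This is a minimal sketch rather than part of the original wrapper: `GemmArgs` and `rowMajorGemmArgs` are hypothetical names, and it assumes tightly packed row-major matrices, so each leading dimension equals the matrix width.

```cpp
// Hypothetical helper (not a cuBLAS API): the values to hand to cublasSgemm
// when A, B and C are tightly packed row-major arrays and we want C = A * B.
struct GemmArgs {
    int m, n, k;        // sizes as cuBLAS sees them in its column-major view
    int lda, ldb, ldc;  // leading dimensions (row widths) of A, B and C
};

// A is rowsA x colsA, B is colsA x colsB, C is rowsA x colsB.
GemmArgs rowMajorGemmArgs(int rowsA, int colsA, int colsB) {
    GemmArgs g;
    g.m = colsB;    // number of columns of B and C
    g.n = rowsA;    // number of rows of A and C
    g.k = colsA;    // number of columns of A == number of rows of B
    g.lda = colsA;  // width of A
    g.ldb = colsB;  // width of B
    g.ldc = colsB;  // width of C
    return g;
}
```

For a 3 x 4 matrix A and a 4 x 2 matrix B, this gives m = 2, n = 3, k = 4, lda = 4, ldb = 2 and ldc = 2.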
So finally we can call:
- cublasSgemm( Blas, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &Alpha, dev_B, ldb, dev_A, lda, &Beta, dev_C, ldc );
- * Alpha equals 1.0f and Beta equals 0.0f.
- ** A is still A and B is still B in my notation.
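The swapped call above can be checked on the CPU without a GPU. The sketch below is an illustration only: `colMajorGemm` is a hypothetical stand-in that follows the column-major indexing convention of SGEMM with no transposes, Alpha = 1 and Beta = 0 (it is not the cuBLAS implementation), and `rowMajorMultiply` is a hypothetical helper that feeds it row-major data with B first and A second.

```cpp
#include <cstddef>
#include <vector>

// CPU stand-in for a column-major GEMM without transposes (assumption made
// for illustration). Computes C (m x n) = X (m x k) * Y (k x n), where every
// matrix is stored column-major with the given leading dimension.
void colMajorGemm(int m, int n, int k,
                  const float* X, int ldx,
                  const float* Y, int ldy,
                  float* C, int ldc) {
    for (int col = 0; col < n; ++col) {
        for (int row = 0; row < m; ++row) {
            float sum = 0.0f;
            for (int i = 0; i < k; ++i) {
                sum += X[row + i * ldx] * Y[i + col * ldy];
            }
            C[row + col * ldc] = sum;
        }
    }
}

// Row-major C = A * B through the reversed call.
// A is rowsA x colsA, B is colsA x colsB, the result is rowsA x colsB.
std::vector<float> rowMajorMultiply(const std::vector<float>& A,
                                    const std::vector<float>& B,
                                    int rowsA, int colsA, int colsB) {
    std::vector<float> C(static_cast<std::size_t>(rowsA) * colsB, 0.0f);
    int m = colsB;  // width of B and C
    int n = rowsA;  // height of A and C
    int k = colsA;  // width of A
    // B goes first with ldb = colsB, then A with lda = colsA, ldc = colsB.
    colMajorGemm(m, n, k, B.data(), colsB, A.data(), colsA, C.data(), colsB);
    return C;
}
```

With a row-major 2 x 3 matrix A = {1, 2, 3, 4, 5, 6} and a 3 x 4 matrix B = {1, ..., 12}, `rowMajorMultiply(A, B, 2, 3, 4)` yields the row-major product {38, 44, 50, 56, 83, 98, 113, 128}. This is also a case where only the width of C (4), not the smaller of the two input widths, is a valid leading dimension for C.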
If you really don't want to mess anything up with the above, just use a wrapper: