
# FORMAT

### Edit 2018.10.07

#### Thanks to Jeff Friedman, who found a bug in the code. The original MMWrapper worked correctly for matrices A and B only if both were square or both were rectangular. If only A was rectangular, the result was invalid. Since then, the MMWrapper has been based directly on the matrixMulCUBLAS demo sample from NVIDIA. Further down the page is a simple .cpp demo to test the functionality of the wrapper.

##### Original Post Content

The CuBLAS library, like other NVIDIA libraries, uses the column-major format. This is a problem if you are using the row-major format in your application. I have been over this before in an older post on using CuFFT in a row-major environment. Basically, this causes a huge mess in the code, so one has to take extra care with its usage. To use SGEMM and compute C = A*B, we have to reverse the multiplication and in fact compute C = B*A. This alone wouldn't work, however, so we have to look up the sgemm documentation and adjust the parameters:

• m : Number of rows of matrix A and C. → Number of columns of matrix B and C.
• n : Number of columns of matrix B and C. → Number of rows of matrix A and C.
• k : Number of columns of A and rows of B. → Number of rows of B and columns of A.

• lda : Leading dimension of the two-dimensional array used to store matrix A. → Becomes the leading dimension of B.
• ldb : Leading dimension of the two-dimensional array used to store matrix B. → Becomes the leading dimension of A.
• ldc : Leading dimension of the two-dimensional array used to store matrix C. → Is still the leading dimension of C.
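The remapping above can be motivated by the transpose identity. The sizes M, N and K below are my own labels, not from the sgemm documentation: assume A is M×K, B is K×N and C is M×N, all stored row-major. A row-major array read as column-major is the transpose of the original matrix, so the library effectively sees Aᵀ, Bᵀ and Cᵀ:

$$
C = A B \quad\Longleftrightarrow\quad C^{T} = (A B)^{T} = B^{T} A^{T},
\qquad
\underbrace{C^{T}}_{N \times M} = \underbrace{B^{T}}_{N \times K}\,\underbrace{A^{T}}_{K \times M}
$$

Under these assumptions, sgemm receives B as its first operand and A as its second, with m = N, n = M, k = K, and the leading dimensions N (for B), K (for A) and N (for C), which are simply the row lengths of the row-major arrays.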

So finally we can call:

• cublasSgemm( Blas, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &Alpha, dev_B, ldb, dev_A, lda, &Beta, dev_C, ldc );
• * Alpha equals 1.0f and Beta equals 0.0f.
• ** A is still A and B is still B in my notation.

If you really don't want to mess anything up with the above, just use a wrapper: