Select Page T

oday,I decided to show you how to implement a Gaussian blur and further optimize it to run more effectively using nVidia CUDA.

At first, you might have wondered how a blur really works. Its very simple, though we need to know what is a convolution, better said:  2D convolution. We will show it on an example, lets have 2 matrixes of the same dimension. A convolution is such a function, that for each element of the input data, we calcalute a new one like this (Hope self-explanatory):

This is not a problem to implement on any CPU, we only need 2 cycles to loop dimensions of the input data and another 2 cycles to calculate the 2D convolution. As we see, the output data are independent on any there, therefore this proccess can be effectively implemented using CUDA. At first, here is our basic Kernel limited only by the throughput of the GPU memory:

Now, as you can see, we get an input image represented by char values (0-255) and other info about the image, which we may need: Image width and Image height. At first step, we cretae a lot of variables, that we may use after (A good reader will notice, that we are not using all of them) … Well then we are checking, if a thread can compute a new pixel value, if so, the thread calculates a 2D convolution within the 2 loops (note that DIM is odd  :1, 3 ,5 ,7 , ….). Now back to our mission, you may wonder why there is something like DIM / 2 … thats becase if we have a pixel at [x][y] posittion, we need to calculate the convolution from point [x-DIM/2][y-DIM/2]. However there is a possibility, that these indices might be out of range of our image, so we need to check, that these new coordinates match our image dimensions (Thats the long ,,if”). Because we may otherwise encounter this situation: First pixel at posittion , begin convolution from: [0-something][0-something] … as you can see, we get a negative index, which means that the tread will try to read from a not specified location in the memory therefore resulting in an application error. Furthermore there is a question about what to do with the convolution process on borders of our input data  as there is not enough elements.  To solve this, we simply declare ,,0″ where we cant calculate the convolution. This has an impact on the borders itself, as they appear darker. 