Gaussian Blur with CUFFT

Jul 8, 2015

This post is about implementing Gaussian blur in CUDA and C#. I have chosen C# in order to make a nice UI with a few buttons :) In general, though, this post is a combination of my two previous posts.

The reason I decided to return to this problem is simple: Gaussian blur is in fact a 2D convolution, and implementing it as a direct convolution is very inefficient, especially for large kernels. So today I am going to focus on the basics of implementing blur via FFT, or CUFFT to be more specific.

Let's start from the beginning. Say you have a convolution core “k” and an input image “g”. The convolution result is given by “Res = k*g”, where “*” stands for convolution; in our case a 2D convolution, since we are working with images. The trick with the FFT (Fast Fourier Transform) is that when we apply it to both “g” and “k”, we obtain their Fourier representations G and K. Having moved into the frequency domain, we can implement the convolution simply as the elementwise product G.K. After the multiplication, we just apply the inverse FFT to the result to obtain the blurred image. The three steps are listed below, after a short note on the convolution core.

Convolution Core

If you do not know this term, just imagine a 2D matrix or array whose values represent a specific function or filter. In our case it is the 2D Gaussian (Google for “Gauss wiki”). To implement such a function, you need to know its dimensions and its sigma, which defines the width of the blob. That is why the UI has 2 inputs :)
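As a rough illustration of how the two inputs (size and sigma) define the core, here is a small CPU sketch in plain Python. It is not the code from the project; the function name `gaussian_kernel` and the choice to normalise the values so they sum to 1 are assumptions made for this example.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel (list of rows) that sums to 1."""
    center = (size - 1) / 2.0  # works for both odd and even sizes
    kernel = [
        [math.exp(-((x - center) ** 2 + (y - center) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    # Normalising (an assumption here) keeps the overall image brightness unchanged.
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

k = gaussian_kernel(5, 1.0)
```

A larger sigma spreads the weight further from the center, so the blur gets stronger; the size only has to be big enough that the values near the border are already close to zero.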


  1. G = FFT(g)
  2. K = FFT(k)
  3. Blur = IFFT(G.K)
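The three steps can be checked on a tiny example. The sketch below is only a CPU illustration in plain Python: it uses a naive DFT instead of CUFFT, the helper names (`dft2`, `blur_via_dft`, `circular_convolve`) are made up for this example, and the kernel is assumed to be already the same size as the image with its origin at index [0][0]. It compares the frequency-domain result with a direct circular convolution.

```python
import cmath

def dft2(a, inverse=False):
    """Naive 2D DFT of a list of rows; the inverse divides by rows*cols."""
    rows, cols = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = 2 * cmath.pi * (u * y / rows + v * x / cols)
                    acc += a[y][x] * cmath.exp(sign * 1j * angle)
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out

def blur_via_dft(g, k):
    """Steps 1-3: G = FFT(g), K = FFT(k), Blur = IFFT(G.K)."""
    G = dft2(g)
    K = dft2(k)
    product = [[G[u][v] * K[u][v] for v in range(len(g[0]))]
               for u in range(len(g))]
    return [[z.real for z in row] for row in dft2(product, inverse=True)]

def circular_convolve(g, k):
    """Direct circular convolution, used only to check the result."""
    rows, cols = len(g), len(g[0])
    return [[sum(g[(y - j) % rows][(x - i) % cols] * k[j][i]
                 for j in range(rows) for i in range(cols))
             for x in range(cols)] for y in range(rows)]

g = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
k = [[0.5, 0.25, 0, 0.25], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
blurred = blur_via_dft(g, k)
reference = circular_convolve(g, k)
```

Note that the product in the frequency domain corresponds to a circular (wrap-around) convolution, which is one reason the image borders need special treatment.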

Very simple. Unfortunately, there are some other problems connected with FFT and convolution. First of all, the elementwise multiplication requires G and K to be the same size. To achieve this, we simply create a 2D array of zeros the size of the image and place “k” in its center. When we apply the FFT to this resized “k”, we obtain an interpolated spectrum of “k”. If you don't know exactly what that is, don't worry, it is quite simple: it is the same result as FFT(k), but sampled on a finer discrete grid that matches the image size.
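The resizing of “k” could look roughly like this on the CPU. This is a plain Python sketch under the assumption that the kernel is simply copied into the middle of a zero-filled array; the name `pad_kernel_to_image` is hypothetical and not taken from the project.

```python
def pad_kernel_to_image(kernel, rows, cols):
    """Embed a small kernel in the center of a rows x cols array of zeros."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    if k_rows > rows or k_cols > cols:
        raise ValueError("kernel must not be larger than the image")
    top = (rows - k_rows) // 2    # first row occupied by the kernel
    left = (cols - k_cols) // 2   # first column occupied by the kernel
    padded = [[0.0] * cols for _ in range(rows)]
    for j in range(k_rows):
        for i in range(k_cols):
            padded[top + j][left + i] = kernel[j][i]
    return padded

small = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
padded = pad_kernel_to_image(small, 6, 8)
```

The padded array has the same sum as the original kernel, so the padding does not change the overall gain of the filter.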

Another problem is that we can't just compute FFT(g). We have to resize the input image as well because of the border conditions: convolution at the edges is not defined. I mentioned this problem in the previous “cuda convolution” implementation, where I used extra zeros around the image to suppress artifacts in the blur output. That was the easy way, and it works. What you can also do is mirror the image around the edges.
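To make the mirroring idea concrete, here is a small plain Python sketch. It is not the “AddBorders” function from the project; the name `mirror_pad` and the exact reflection rule (the edge pixel itself is repeated) are assumptions for illustration.

```python
def mirror_pad(image, border):
    """Extend a list-of-rows image by `border` pixels on each side by reflection."""
    rows, cols = len(image), len(image[0])
    if border > rows or border > cols:
        raise ValueError("border must not exceed the image size")

    def reflect(i, n):
        # Map an out-of-range index back inside [0, n) by mirroring at the edge.
        if i < 0:
            return -i - 1
        if i >= n:
            return 2 * n - i - 1
        return i

    return [
        [image[reflect(y, rows)][reflect(x, cols)]
         for x in range(-border, cols + border)]
        for y in range(-border, rows + border)
    ]

image = [[1, 2, 3], [4, 5, 6]]
padded = mirror_pad(image, 1)
```

Compared with zero padding, the mirrored border has values similar to the pixels next to it, so the blur does not darken the edges of the image.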

This is a more sophisticated solution, and it is the one used here. You can easily simulate all of these operations in Matlab with a few commands, and the implementation itself is quite easy. As it turned out, I spent most of the time designing the UI and trying to persuade C# to use my GaussBlur.dll library, which is quite simple in the end.

There is a main “GaussBlur” function that takes care of everything: resizing, copying data, creating the CUFFT plan, executing it, and finally cutting out the original area so that the input and output images have the same size. One thing I would like to point out is that the C# code is quite heavy and not exactly optimal, so the application spends most of its time manipulating the image itself rather than calculating the blur. The displayed time is measured over the whole GaussBlur function and thus includes all resize/mirror/copy and cut operations.

What is more interesting is that I have added 2 extra kernels for shifting the spectrum of the result image. The funny thing is that when I was simulating all of these operations in Matlab, there was no need for the “fftshift” function, which does exactly the same as “ShiftRows2D” and “ShiftCols2D” together. Perhaps some of the functions I used in Matlab were implemented with a built-in fftshift :)
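For readers who have not met fftshift before: it is a circular shift by half the size along each dimension, so it can be split into a row shift followed by a column shift. The plain Python sketch below shows the idea on the CPU. The names `shift_rows` and `shift_cols` only mirror the kernel names above; the actual CUDA kernels are not shown in this post, so this is an assumption about what they do. One plausible reason the shift is needed (again an assumption) is that a kernel placed in the center of the padded array produces an output that is circularly displaced by half the image size.

```python
def shift_rows(a):
    """Circularly shift the rows by half the height (swap top and bottom halves)."""
    half = len(a) // 2
    return a[-half:] + a[:-half]

def shift_cols(a):
    """Circularly shift every row by half the width (swap left and right halves)."""
    half = len(a[0]) // 2
    return [row[-half:] + row[:-half] for row in a]

def fftshift2d(a):
    """Row shift followed by column shift, like fftshift on a 2D array."""
    return shift_cols(shift_rows(a))

a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
shifted = fftshift2d(a)
```

For even sizes, applying the shift twice gives back the original array, which is a quick way to check an implementation.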

Never mind. If you are not sure what each of the functions does, just write a comment and I will try to explain it better. Some functions, especially “AddBorders”, are hard to understand. Also note that I had to implement the CUFFT normalization myself, because the CUFFT inverse transform does not normalize its output. Normalizing is very useful, and in fact necessary, when converting from floating-point values to unsigned char values.
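The normalization step can be sketched as follows. An unnormalized forward plus inverse transform scales every value by the number of elements, so the output has to be divided by rows*cols before it fits back into the 0..255 range. This is a plain Python illustration, not the project's kernel; the name `normalize_to_bytes` and the rounding and clamping details are assumptions.

```python
def normalize_to_bytes(raw):
    """Scale unnormalized inverse-FFT output by 1/(rows*cols) and clamp to 0..255."""
    rows, cols = len(raw), len(raw[0])
    scale = 1.0 / (rows * cols)
    return [
        [min(255, max(0, int(round(v * scale)))) for v in row]
        for row in raw
    ]

# A 2x2 example: every value is 4 times too large after the inverse transform.
raw = [[400.0, 1020.0], [-8.0, 2000.0]]
pixels = normalize_to_bytes(raw)
```

The clamping takes care of small negative values or overshoots caused by floating-point rounding, which would otherwise wrap around when stored as unsigned char.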

Which reminds me that I was lazy, so the blur is implemented only for 1-layer (grayscale) images. The “GaussBlur” function does not care about colors, however, so if you would like to extend this project to blur RGB images, you can do it in the C# part and call “GaussBlur” 3 times, once for each color channel :) Also note that all of my ECC functions are useless, because there is no console :D …
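The per-channel idea can be outlined like this. The sketch is plain Python rather than C#, and `blur_channel` is a stand-in for whatever single-channel blur is available (in the project that would be a call into “GaussBlur”); both `blur_rgb` and the stand-in `mean_blur` are hypothetical names used only for this example.

```python
def blur_rgb(pixels, blur_channel):
    """Blur an image of (r, g, b) tuples by blurring each channel separately."""
    rows, cols = len(pixels), len(pixels[0])
    # Split the image into three single-channel planes.
    channels = [
        [[px[c] for px in row] for row in pixels]
        for c in range(3)
    ]
    # Run the single-channel blur once per plane.
    blurred = [blur_channel(plane) for plane in channels]
    # Merge the three blurred planes back into (r, g, b) tuples.
    return [
        [tuple(blurred[c][y][x] for c in range(3)) for x in range(cols)]
        for y in range(rows)
    ]

def mean_blur(plane):
    """Stand-in blur for the example: replace every pixel with the plane mean."""
    count = len(plane) * len(plane[0])
    mean = sum(sum(row) for row in plane) / count
    return [[mean] * len(plane[0]) for _ in plane]

rgb = [[(0, 10, 100), (2, 30, 200)]]
result = blur_rgb(rgb, mean_blur)
```

Because a Gaussian blur treats every channel independently, blurring the three planes one after another gives the same result as a dedicated RGB implementation; it just costs three transforms instead of one.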

The release version was compiled for compute_20,sm_20 | compute_30,sm_30 | compute_50,sm_50 and should work on any NVIDIA GeForce GPU from the 400 series up. If you are having trouble, try updating your NVIDIA drivers. If you encounter a “dll missing” error, also try installing the Microsoft Visual C++ Redistributable for Visual Studio 2013. Note that the project is 64-bit only, because CUFFT has been available only as 64-bit since CUDA version 7.0.