First of all, I would like to point out,that I will not cover the mathematical background beyond the FFT (Fast Fourier Transform), which is in fact just an algorithm of fast calculation of the Discrete Fourier Transform (DFT). I will just concentrate on how the Radix based Decimation in Time (DIT) / Decimation in Frequency (DIF) methods work. At the very beginning, the first thing to note is that both DIT and DIF have the same computational demands and it is therefore irrelevant wheather we will be covering the DIF or DIT. These two methods just differ in signal flowchart. What is more important is that higher radixes reduce the number of compex operations requied for FFT computation.

We will start with butterflies. They are the most important part of the FFT. Each butterfly consists of several data inputs and outputs. The number of IO pins depends on the radix. Radix 2 Butterfly has 2 inputs and 2 outputs. Radix 4 Butterfly has 4 inputs and 4 Outputs and so on. Each Butterfly’s IO can be interpreted as a matrix (2×2,4×4,8×8 …). These IO matrices consists of Twiddle Factors – Complex roots of unity. I will start with the well known radix 2 butterfly (Image from wikipedia):


The Radix-2 butterfly formula can be reprezented as follows:

  • y0 = x0 + x1.W   |   y1 = x0 – x1.W

Where x0, x1, y0 and y1 are compex inputs/outputs and W is a so called Twiddle Factor or root of unity. The I/O between multiple radix butterflies is reprezented on the next image. Note that for R-2,the matrix corresponds to the formula above and also remember, that a Twiddle Factor reprezents rotation in the complex plane and that W(2,4) is for example identical to W(6,4)  – where the first value is the exponent of the Twiddle factor and the second is the base. See next formulas and examples for more explanations:

The output of each butterfly can be obtained by a matrix-vector multiply with an additional piecewise multiplication by twiddle factors. In other words, the matrix just reprezents the IO relationship. The results must be however multiplied by corresponding Twiddles as well. The corresponding twiddle factors for all butterflies are covered in excel table and matlab script. The only difference is that my R-2 version is an optimized DIT version with precomputed twiddles, so that during each stage, the correct twiddles are just fetched from the memory rather than beeing computed separately for each butterfly during each stage. The same can be done for all other radixes and is required for all efficient implementations.


As we can see, each butterfly (Regardless of radix) requires:

  • Correct data inputs ( Xs )
  • Corresponding Twiddle Factors ( Ws )

The Correct data indexes depend on the “signal flowcharts”, used radix and the choosen method ( DIT / DIF). In fact they can be quite simply deduced from the flowcharts, although I sometimes prefer their excel reprezentation due to their easier checking with a reference algorithm.


As you can see, the R2 flowchart is somewhat different. That is because it is a DIT flowchart unlike the next 2, that are DIF. To distinguish between them, one has to look on the first stage of the FFT. If all the butterflies are together and the data connectors are overlapping in the first stage,then it is a DIF, otherwise its DIT. In fact, if there was a DIT-DIF comparison, you would realize,that their flowcharts are just inverted. That is also why it doesnt matter whether one implements the DIF or DIT algorithm. All flowcharts (Except for R-8) should have the corresponding Twiddle factors in the charts. The R-8 just demonstrates the data indexes for each butterflies. Twiddles are separately in the excel table. I have added a “flowchart” for 256 point R-4 (Stages 1 to 4): 

One of the most important things to note is the distance between the butterfly data. For example in Radix 4 64-point DIF method,the Distance is 16 in the first stage,4 in the second stage and 1 in the last stage. The same applies for all DIF radixes, except the distance is divided by the radix itself (In 64-point DIF R8,the distance is 8 in the first stage and 1 in the second stage). The DIT algorithm unlike the DIF increases this distance (For radix 2 DIT, the distance is 1 2 4 8 16 … ). All Twiddles are in case stored in the excel tables and or in the matlab scripts. The same mechanism shown in Radix 4 formula applies for all other radixes (16,32 …), so creating a radix 16 algorithm is quite easy once we have the R4/R8 algorithm available.

Furthermore, for correct FFT to work ,it is necessary to reverse data in a correct order. The operation is called bitreverse in radix 2 algorithms and digit-reverse in higher radixes (4,8,16 …). The digitreverse can be theoretically achieved by dividing and bitreversing the data,although a direct (Not efficient and in fact extremely slow option is also available):

In = 177;
Scount = 0;
for i = 1:fix(log10(In))
    Scount = Scount + fix(In*10^(-i))*10^(fix(log10(In))-i);
Out = In*10^(fix(log10(In))) – 99*( Scount)

The bitreversal / digitreversal is the only problem of the FFT since it cannot be executed in-place. All butterflies on the other hand can be executed in-place (Input indexes of all butterflies are the same as output indexes). In the end,I just have to apologise for not using the same methods (eg. only DIT or DIF), but as I have mentioned before, it doesnt matter and the only reason I have added R2 DIT is because I used it my bachellor’s work and I have already found a corresponding indexing function for the twiddle factors. I plan on adding the twiddle indexing funtion for other radixes in the future.

Feel free to download, test and comment. I may have some errors somewhere (Hopefully not in the  Matlab Scripts since it looks like they are working – each one was tested with a reference Matlab build-in FFT routine). I try to keep things simple and commented. When looking inside the excel tables – bear in mind that I do prefer indexing with butterflies, so all the numbers in the Stage columns usually correspond to butterflies numbers. Also bear in mind that matlab indexes starts from 1, but because I am mostly programming in C/CUDA I tend to use zero indexes :)