Everybody tells you that filtering signals (real or complex) is simple. In fact, they are right; the problems usually come with designing a good filter, choosing its architecture (FIR/IIR), and picking the best implementation of the filtering for the application. Mathematically, filtering a signal is a convolution between the input data x(t) and the filter impulse response h(t) in continuous time. It turns out that using the highest filter order possible is not always a good idea. For example, say you are processing a real data signal and want to filter it using a sharp filter of order 1000, and imagine your signal has a sample rate of 1 kHz. The filter then spans 1000 samples, i.e. one full second of data, so the specified filter makes you process the data with a delay of up to 1 s. In other words, suppose the filter sits in some car's driving feedback loop: if something goes wrong with the car, the loop will react with a one-second delay, which may already be too late!

When it comes to real-time processing, there are also other problems, which programmers usually ignore completely. Because convolution is typically implemented via the FFT, it is tempting to process all the data at once. This approach is suitable for post-processing only, because in real time you cannot wait for "ALL" the samples to arrive: all of them will likely never really arrive, and if they do, something is terribly wrong with the universe. Furthermore, the longer you wait, the higher the processing delay becomes. Let's take an ordinary phone as an example. If someone had decided to wait, for example, 5 seconds before processing, coding, and transmitting the data, the telephone would not be so popular nowadays. The practical answer to this problem is continuous data processing, which is usually implemented in "reasonably" long batches.

There are actually two common methods in use: OLAP (Overlap and Add) and OLAS (Overlap and Save). Both of these algorithms use the FFT internally, and both are actually very similar; the core of each lies in FFT convolution. Note, however, that .* is element-wise multiplication, which means both sequences have to be the same length. Also note that the filter response usually remains constant over time, i.e. every data batch is filtered with the same filter. As a result, the Fourier image of the impulse response (also known as the frequency response) may be precomputed in advance, so that only two FFTs are required to compute each filtering result:

Result = ifft(fft(Data).*fft(FilterResponse))
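The same one-liner can be sketched in NumPy. One detail matters: to obtain the linear convolution, both sequences must be zero-padded to the common length N + L − 1 before the element-wise multiply (the values of x and h below are arbitrary illustration data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # example data, N = 4
h = np.array([1.0, 1.0, 1.0])       # example filter response, L = 3
M = len(x) + len(h) - 1             # linear convolution length: N + L - 1

# Zero-pad both sequences to length M, then multiply the spectra.
# H could be precomputed once, since the filter does not change.
X = np.fft.fft(x, M)
H = np.fft.fft(h, M)
y = np.real(np.fft.ifft(X * H))

print(np.allclose(y, np.convolve(x, h)))  # True: matches direct linear convolution
```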

What is important here is to note how the convolution works. If we take N = 10 signal samples and our filter response is L = 5 samples long, how many samples does the convolution result hold? The answer is N + L − 1, due to the overlaps of the impulse response with the data vector. This is a common mistake, as one cannot simply compute the real (linear) convolution as we know it in only N samples. Well… to be honest, you can, and in fact computing the convolution result in N samples is the core of the OLAS method. This technique is called cyclic (circular) convolution. By making the result vector shorter than necessary we create aliases, and because the output of the FFT is periodic, these aliases will in fact produce the correct filtering result, due to the aliasing of the time data with the impulse response. Cyclic convolution is, however, not as easy to understand as linear convolution. Therefore, for most people it is better to start with the Overlap and Add method, which uses proper zero padding and produces a signal with artifacts in the overlap areas, just as one would expect. The best visualization can be done if we let both sequences be square waves. In this case, the filtering result is a "triangular-shape" curve where the sequences overlap, and constant otherwise:
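The aliasing can be checked numerically. In this NumPy sketch (the sequence values are arbitrary), computing the IFFT product in only N samples yields the linear convolution with its last L − 1 tail samples wrapped around and added onto the head:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # N = 6
h = np.array([1.0, 0.5, 0.25])                # L = 3

# Cyclic convolution: the spectra have only N points, so the result has N samples
yc = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))

# Equivalent view: linear convolution (N + L - 1 samples) with the
# last L - 1 samples aliased (wrapped and added) onto the first L - 1
yl = np.convolve(x, h)
wrapped = yl[:len(x)].copy()
wrapped[:len(h) - 1] += yl[len(x):]

print(np.allclose(yc, wrapped))  # True
```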

Here we can note that the correct output of the convolution is in the middle, and that N (the batch length) may be chosen arbitrarily (preferably so that N + L − 1 = 2^x). After computing a batch, the output is stored in a vector and the next piece of data can be processed. Once this is done, here comes the "ADD" operation. As anybody would imagine, simply concatenating these responses one after another would not only create signal artifacts arising from the convolution, but would also produce N + L − 1 samples rather than N samples per batch. Therefore, the batch results are joined where the convolution artifacts are: e.g. the falling piece of the first batch result is added to the rising piece of the second result, and so on. I have added a visualization of the OLAP method (with the impulse response being a real FIR implementation impulse response):
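The whole OLAP procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not a tuned implementation: the batch length N is a hypothetical choice (not rounded to a power of two here), and the FIR filter h is just an example moving average:

```python
import numpy as np

def olap_filter(x, h, N=256):
    """Overlap-and-add FFT filtering sketch; N is the batch length."""
    L = len(h)
    M = N + L - 1                       # each batch yields N + L - 1 samples
    H = np.fft.fft(h, M)                # frequency response, precomputed once
    y = np.zeros(len(x) + L - 1)
    for start in range(0, len(x), N):
        batch = x[start:start + N]      # the last batch may be shorter
        nb = len(batch)
        yb = np.real(np.fft.ifft(np.fft.fft(batch, M) * H))
        # "ADD": the falling tail of this batch overlaps the rising
        # head of the next one, so the pieces are summed, not concatenated
        y[start:start + nb + L - 1] += yb[:nb + L - 1]
    return y

x = np.random.randn(1000)
h = np.ones(16) / 16.0                  # example moving-average FIR
print(np.allclose(olap_filter(x, h), np.convolve(x, h)))  # True
```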

The overlap and save method, although it uses the cyclic convolution, is easier to implement, since there is no addition at the batch output and the data sequence length may be chosen arbitrarily large. The usable length of the data is, however, always smaller and equals N − L + 1 (the same as in the OLAP implementation). The impulse response is only padded with zeros to match the data length, which, by the way, is equivalent to interpolating the impulse response spectrum. At each batch output, the first L − 1 samples (corrupted by the circular aliasing) are discarded, and only the last N − L + 1 samples are concatenated to the final filtered output vector. Unlike the overlap and add method, the overlap and save method overlaps the input data by L − 1 points, so the first L − 1 points of the second batch are the same as the last L − 1 points of the first batch.
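The OLAS procedure can be sketched in NumPy in the same spirit as before; again a minimal illustration with a hypothetical FFT length N, not a production implementation. Note how the input batches overlap and how the first L − 1 aliased output samples of each batch are thrown away:

```python
import numpy as np

def olas_filter(x, h, N=256):
    """Overlap-and-save FFT filtering sketch; N is the FFT (batch) length."""
    L = len(h)
    V = N - L + 1                        # usable (valid) samples per batch
    H = np.fft.fft(h, N)                 # impulse response zero-padded to N
    # prepend L - 1 zeros so the first batch's discarded samples carry no data
    xp = np.concatenate([np.zeros(L - 1), x])
    out = []
    for start in range(0, len(x), V):    # batches advance by V, overlapping by L - 1
        batch = xp[start:start + N]
        if len(batch) < N:               # zero-pad the last partial batch
            batch = np.concatenate([batch, np.zeros(N - len(batch))])
        yb = np.real(np.fft.ifft(np.fft.fft(batch) * H))
        out.append(yb[L - 1:])           # discard the first L - 1 aliased samples
    return np.concatenate(out)[:len(x)]

x = np.random.randn(1000)
h = np.ones(16) / 16.0                   # example moving-average FIR
print(np.allclose(olas_filter(x, h), np.convolve(x, h)[:len(x)]))  # True
```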