Quite recently, I was benchmarking one of my favorite algorithms, trying to optimize it for speed. I found three facts quite interesting:

First of all, thank god for the beautiful and well-done work on Microsoft Visual Studio 2017. I mention this almost every time, because the way everything works as intended is really amazing. I have hardly ever had problems with any Visual Studio release, and when I did, it wasn’t Microsoft’s fault but the fault of some other company’s product. Anyway, the fact is that performance profiling in VS17 is quite easy and, what is more, it does its job really well.

The second fact is that I found the compiler optimizations so massive that I could hardly believe it. VS17 exposes the optimization level among the project’s C/C++ properties, with options that include:

  1. /Od – Optimization Disabled
  2. /O1 – Minimize Size
  3. /O2 – Maximize Speed
  4. /Ox – Full Optimization

When I switched from /Od to /Ox, I saw a huge performance improvement of about 10 times! They must have done their job really well to deliver such amazing results. The bottleneck of my algorithm was its use of the std::complex class. I thought that, since it is a standard library class, it should be well optimized, so I used it rather than a custom complex-number structure. Well, it turned out to be the reason the code was running so slowly. Fortunately, the /Ox optimization helped a lot, and I finally decided to stay with std::complex rather than rewrite everything.

Now the last thing: one of the problems I encountered was the modulus operation. The standard library (header <cmath>) has the built-in function std::fmod, which usually works like a charm. Because my algorithm calls this function quite frequently, the performance profiler showed me that almost 13% of all processor time was spent in it! Horrible. Fortunately, my algorithm doesn’t need the floating-point version, only an integer version of this function. And because I have had serious problems in some cases with the C++ operator “%”, I decided to write my own implementation of fmod for integer arithmetic. This is in fact trivial:

int Imod(int X, int Y) {
       return (X - Y * (X / Y)); // Integer division! Y must be non-zero.
}
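
As a quick sanity check (a sketch, not a proof), the snippet below compares Imod with the built-in “%” over a small range of signed values. Since C++11 the language defines integer division as truncating toward zero and guarantees that (a / b) * b + a % b equals a whenever the quotient is representable, so the two are expected to agree. The helper name imod_matches_builtin and the assert on the divisor are illustrative additions, not part of the original code.

```cpp
#include <cassert>

// Hand-written integer modulus from the post: X - Y * (X / Y).
// The assert on Y is an illustrative addition.
int Imod(int X, int Y) {
    assert(Y != 0);          // division by zero is undefined behaviour
    return X - Y * (X / Y);  // integer division truncates toward zero
}

// Hypothetical helper: returns true if Imod and the built-in % agree
// for every pair (x, y) in [-limit, limit] x [-limit, limit], y != 0.
bool imod_matches_builtin(int limit) {
    for (int x = -limit; x <= limit; ++x) {
        for (int y = -limit; y <= limit; ++y) {
            if (y == 0) continue;  // skip the undefined case
            if (Imod(x, y) != x % y) return false;
        }
    }
    return true;
}
```

Calling imod_matches_builtin(1000) checks about four million pairs. Both versions share the same preconditions: Y must not be zero, and X = INT_MIN with Y = -1 overflows in either case.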

And here are the results for computing 1024 * 1024 * 128 (about 134 million) modulo operations with each variant:

  • std::fmod: 6333 ms
  • Imod: 233 ms
  • “%”: 176 ms
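
A timing loop for a comparison like this can be sketched as follows. This is a minimal harness, not the original benchmark: the names BenchResult, run_bench and the bench_* wrappers, the input pattern, and the checksum (used so the optimizer cannot discard the loop) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>

// Integer modulus from the post.
int Imod(int X, int Y) {
    return X - Y * (X / Y);
}

// Hypothetical result type: elapsed time plus a checksum of all results.
struct BenchResult {
    double ms;              // wall-clock time in milliseconds
    std::int64_t checksum;  // sum of results; keeps the loop observable
};

// Runs op(x, y) `iterations` times on a simple deterministic input
// pattern (x in 0..1048575, y in 1..1000) and measures the elapsed time.
template <typename Op>
BenchResult run_bench(std::int64_t iterations, Op op) {
    assert(iterations > 0);
    std::int64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        const int x = static_cast<int>(i & 0xFFFFF);
        const int y = static_cast<int>(i % 1000) + 1;  // never zero
        checksum += op(x, y);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return BenchResult{elapsed.count(), checksum};
}

// Floating-point modulus, converted back to int.
BenchResult bench_fmod(std::int64_t n) {
    return run_bench(n, [](int x, int y) {
        return static_cast<int>(
            std::fmod(static_cast<double>(x), static_cast<double>(y)));
    });
}

// Hand-written integer modulus.
BenchResult bench_imod(std::int64_t n) {
    return run_bench(n, [](int x, int y) { return Imod(x, y); });
}

// Built-in % operator.
BenchResult bench_percent(std::int64_t n) {
    return run_bench(n, [](int x, int y) { return x % y; });
}
```

Passing n = 1024LL * 1024 * 128 mirrors the operation count above. The absolute times depend on the machine and on the optimization level, so they will not necessarily reproduce the figures listed; the checksums of the three variants should match, which doubles as a correctness check.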

To be honest, I was at first tempted to use “%”, but I do not trust it anymore, and in fact I hardly ever compute the modulus that many times. Even so, the performance is close (within about 60 ms over the whole run). The benefit of the Imod function, however, is that I know exactly what it is doing, and I can be 100% sure that it will always compute the result correctly. This persuaded me to continue using my own Imod function, which is a bit slower than “%” but roughly 27 times faster than std::fmod!