The **Finite Differences** method is actually not the most commonly used method overall (due to its limitations and computational requirements), but it serves as an easy-to-understand way of solving basic tasks such as finding the distribution of the field in a well-defined space, for example in a waveguide. The method uses Laplace’s equation, deduced from the third Maxwell’s equation when d/dt = 0 (static field); it is a special case of Poisson’s equation without any free charges in space. The derivation is depicted below:
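In case the figure with the derivation does not display, a standard sketch of the chain of steps (written here from the curl-free static field and Gauss’s law) reads:

```latex
\nabla \times \vec{E} = -\frac{\partial \vec{B}}{\partial t} = 0
  \;\Rightarrow\; \vec{E} = -\nabla \varphi \\
\nabla \cdot \vec{E} = \frac{\rho}{\varepsilon}
  \;\Rightarrow\; \nabla^2 \varphi = -\frac{\rho}{\varepsilon}
  \quad \text{(Poisson)} \\
\rho = 0
  \;\Rightarrow\; \nabla^2 \varphi = 0
  \quad \text{(Laplace)}
```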

The idea behind this method is quite simple. The region of interest is discretized into a (usually) rectangular grid of points with a spacing of “h”. Inside the grid, we define areas with a known voltage (e.g., a microstrip at location [x, y] with U = 1 V). These are known grid points and are usually referred to as “Dirichlet conditions“. For the rest of the discretized area, we would like to calculate the distribution of the field (the electric potential φ). There are a few methods we can actually use to calculate it. The problem can even be solved directly via matrix inversion, but due to the additional complexity it is usually solved iteratively.

Starting from Laplace’s equation, we are just one step away from the formula used in the FD method: we approximate the differential operator in 2D space (although this can easily be extended to 3D problems as well) via a Taylor expansion. Please note that I have added the green diagonal points, which are at a distance of sqrt(2)·h but are not required for the deduction of the formula. Their purpose is to test their influence on the convergence of the methods used later.
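The Taylor-expansion argument for the standard 4-neighbour (5-point) stencil can be written out as follows:

```latex
\varphi(x \pm h, y) = \varphi \pm h\,\frac{\partial\varphi}{\partial x}
  + \frac{h^2}{2}\,\frac{\partial^2\varphi}{\partial x^2}
  \pm \mathcal{O}(h^3) \\
\Rightarrow\;
\frac{\partial^2\varphi}{\partial x^2} \approx
  \frac{\varphi(x+h,y) - 2\,\varphi(x,y) + \varphi(x-h,y)}{h^2} \\
\nabla^2\varphi = 0 \;\Rightarrow\;
\varphi_{i,j} = \tfrac{1}{4}\left(
  \varphi_{i+1,j} + \varphi_{i-1,j} + \varphi_{i,j+1} + \varphi_{i,j-1}\right)
```

Each unknown node is simply the average of its four neighbours; the diagonal (8-grid) variant adds the four corner points with a smaller weight.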

The whole problem therefore shrinks to updating the nodes until the difference between iterations is small enough for us to say the result is accurate enough for our needs. The only problem comes after realizing that these grids are usually large and their convergence is slow, simply because information propagates one cell per iteration: if there are 2000 points in the X direction, it takes 2000 iterations before the points in one corner have any influence on the potential on the other side of the grid.

There are 3 commonly used iterative methods for solving this problem: the Jacobi iteration, the Gauss-Seidel method and the Successive Over-Relaxation (SOR) method. The Jacobi method is the easiest one, with the slowest convergence. It calculates the residuum for each point of the grid and updates the grid only after all points have been calculated. It therefore also requires twice as much memory, which may be a problem in some cases. The Gauss-Seidel method, on the other hand, doesn’t require additional memory, as it updates the grid points on the fly (e.g., the point at x = 3 uses the value of the point at x = 2 directly after the latter has been computed). According to Computational Electromagnetics, it is however necessary that the grid is updated this way, with increasing i and j (the x and y dimensions). This is not a problem for a CPU-based implementation, but it greatly decreases performance for any CUDA-accelerated routine due to the additional synchronization requirements. In fact, rewriting this method for CUDA would require us to launch one kernel per row of the grid instead of one kernel per the whole grid. Based on my experience, the performance loss could be up to 10x, but it will depend on the grid size.
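As an illustration of the difference between the two schemes, here is a small NumPy toy (a sketch only, not the mex implementation described later; the grid, voltages and names are made up for the example):

```python
import numpy as np

def jacobi_step(phi, mask):
    """One Jacobi sweep: every node becomes the average of its 4 neighbours,
    computed from the *previous* iterate (hence the second buffer).
    `mask` marks the Dirichlet nodes that are kept fixed."""
    new = phi.copy()
    new[1:-1, 1:-1] = 0.25 * (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                              phi[1:-1, 2:] + phi[1:-1, :-2])
    new[mask] = phi[mask]          # restore Dirichlet (known-voltage) nodes
    return new

def gauss_seidel_step(phi, mask):
    """One Gauss-Seidel sweep: updates in place with increasing i and j,
    so freshly computed neighbour values are reused immediately."""
    ny, nx = phi.shape
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            if not mask[i, j]:
                phi[i, j] = 0.25 * (phi[i + 1, j] + phi[i - 1, j] +
                                    phi[i, j + 1] + phi[i, j - 1])
    return phi

# Toy grid: left border held at 1 V, all other borders at 0 V.
phi = np.zeros((16, 16))
phi[:, 0] = 1.0
mask = np.zeros_like(phi, dtype=bool)
mask[:, 0] = mask[:, -1] = mask[0, :] = mask[-1, :] = True

for _ in range(200):
    phi = jacobi_step(phi, mask)   # potential decays away from the 1 V edge
```

Both sweeps converge to the same field; they differ only in memory use and in how easily they map onto the GPU.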

The last method is a modification of the Gauss-Seidel method with a parameter called the “super-relaxation coefficient ω”, which typically takes values in the range from **1** to **2**. Setting ω = 1 is equivalent to the Gauss-Seidel method, and ω ≥ 2 leads to divergence. The goal of this parameter is to reach the solution faster by applying “more” of the calculated residuum (correction) to the next iteration of each point. Note that this parameter is meant to be applied to the Gauss-Seidel method. I have reprogrammed the algorithm in CUDA and applied the SOR coefficient to the “Jacobi iteration” instead. The result is that when the coefficient is < 1, the algorithm is convergent (the closer to zero, the slower the convergence), but above 1 it starts to diverge. Here is the only plus for the 8-grid option: when the coefficient is approximately 1.3, the 4-grid option diverges while the 8-grid variant still converges and requires fewer iterations to finish. When the coefficient reaches sqrt(2), even the 8-grid algorithm diverges. Overall, the benefit of using the 8-grid is negligible.
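A minimal sketch of the SOR update (again a NumPy toy with made-up grid and values; ω = 1.7 below is just a demo setting, not a recommendation):

```python
import numpy as np

def sor_step(phi, mask, omega):
    """One SOR sweep: a Gauss-Seidel update where omega scales the applied
    correction (residuum). omega = 1 recovers plain Gauss-Seidel."""
    ny, nx = phi.shape
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            if not mask[i, j]:
                avg = 0.25 * (phi[i + 1, j] + phi[i - 1, j] +
                              phi[i, j + 1] + phi[i, j - 1])
                phi[i, j] += omega * (avg - phi[i, j])   # over-relaxed update
    return phi

def solve(omega, n=16, tol=1e-5, max_iter=2000):
    """Count sweeps until the largest per-node update falls below `tol`."""
    phi = np.zeros((n, n))
    phi[:, 0] = 1.0                          # 1 V on the left border
    mask = np.zeros_like(phi, dtype=bool)
    mask[:, 0] = mask[:, -1] = mask[0, :] = mask[-1, :] = True
    for k in range(1, max_iter + 1):
        old = phi.copy()
        sor_step(phi, mask, omega)
        if np.abs(phi - old).max() < tol:
            return k
    return max_iter
```

On this toy grid, `solve(1.7)` needs noticeably fewer sweeps than `solve(1.0)` (plain Gauss-Seidel), which is exactly the point of the over-relaxation.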

Implementation of the method in Matlab is simple but painful, as calculating any larger grid is time-consuming. It is, however, easily transferable to the parallel environment of the GPU, as each cell of an iteration may be computed independently. The code could be further optimized and sped up, firstly by removing the convergence criterion (basically a maximum search over a vector, an operation that can be parallelized but whose performance lies somewhere “between CPU and GPU”) and simply running for a given number of iterations. Another source of speed improvement could be removing the buffer copy (memory synchronization) and switching the computation direction between the 2 buffers instead. The project is a mex file and the routine is callable directly from Matlab to test various settings. It is also theoretically possible to remove the initialization checks for the CUDA drivers and to perform the ColumnMajor↔RowMajor conversions inside CUDA as well. Nevertheless, the CUDA-accelerated version seems to be approximately 20x faster, although I haven’t done any proper benchmarks. The provided code is very simple: there is a fixed voltage on the borders of the grid and 2 microstrip lines inside, of which one has a positive voltage and the other one zero. Both lines “levitate” in the air because I have simplified the project; originally, there would have been some kind of substrate with a selectable permittivity.
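The buffer-swap optimization mentioned above can be sketched as follows (a hypothetical NumPy analogue of the CUDA code, shown next to the copy-based variant it replaces):

```python
import numpy as np

def jacobi_copy(phi0, mask, n_iter):
    """Reference Jacobi: copies the whole grid every iteration."""
    phi = phi0.copy()
    for _ in range(n_iter):
        new = phi.copy()                      # per-iteration buffer copy
        new[1:-1, 1:-1] = 0.25 * (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                                  phi[1:-1, 2:] + phi[1:-1, :-2])
        new[mask] = phi[mask]
        phi = new
    return phi

def jacobi_pingpong(phi0, mask, n_iter):
    """Same scheme, but the two buffers swap roles each iteration
    ("ping-pong"), so no copy / memory synchronization is needed."""
    src, dst = phi0.copy(), phi0.copy()
    for _ in range(n_iter):
        dst[1:-1, 1:-1] = 0.25 * (src[2:, 1:-1] + src[:-2, 1:-1] +
                                  src[1:-1, 2:] + src[1:-1, :-2])
        dst[mask] = src[mask]                 # keep Dirichlet nodes fixed
        src, dst = dst, src                   # swap instead of copying
    return src
```

Both functions produce identical results; the ping-pong version just avoids one full-grid copy per iteration, which is the saving the mex routine’s `MemSync` option is aimed at.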

**Recommended parameter modification for basic simulation:**

```matlab
S.Thresh      = 1e-4; % Threshold for error
S.MaxIter     = 200;  % Maximum number of iterations
S.Omega       = 1.4;  % "SuperRelaxation" coefficient
S.PointsPerMm = 20;   % Number of points per mm
S.UseCPU      = 0;    % 1 -> CPU | 0 -> CUDA
S.MemSync     = 1;    % Memory barrier synchronization
```