{"id":3014,"date":"2011-06-27T15:24:08","date_gmt":"2011-06-27T15:24:08","guid":{"rendered":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/?p=3014"},"modified":"2011-08-02T23:58:03","modified_gmt":"2011-08-02T23:58:03","slug":"a-highly-scalable-parallel-boundary-element-method-for-capacitance-extraction-2","status":"publish","type":"post","link":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/a-highly-scalable-parallel-boundary-element-method-for-capacitance-extraction-2\/","title":{"rendered":"A Highly Scalable Parallel Boundary Element Method for Capacitance Extraction"},"content":{"rendered":"

<p>Standard boundary element methods (BEMs) consist of an embarrassingly parallel system setup step and a linear system solve of time complexity O(N<sup>3<\/sup>) that cannot be parallelized efficiently. When piecewise constant (PWC) basis functions are used to represent the solution, the solve dominates the overall computation time (usually more than 90%) and limits how well standard BEMs scale with the number of parallel computing nodes. For capacitance extraction problems, traditional acceleration techniques, such as the multipole expansion <sup>[1]<\/sup> and the pre-corrected FFT methods <sup>[2]<\/sup>, can reduce the time complexity of the solve to O(N log N). However, available parallel implementations of these two techniques show that their speedup saturates quickly as nodes are added: their parallel efficiency drops to between 40% and 60% at just 8 nodes <sup>[3]<\/sup> <sup>[4]<\/sup>.<\/p>\n

<p>These methods scale poorly in parallel because their underlying solution representation, PWC basis functions, is inefficient for representing charge distributions and therefore produces a large linear system. Solving such a large system dominates the overall computation and drastically degrades the parallel efficiency. To circumvent the bottleneck of solving a large system in parallel, we employ our recently developed instantiable basis functions, which are 30 times more compact than PWC basis functions for the same capacitance accuracy <sup>[5]<\/sup>. Accordingly, the share of the total time spent solving the system falls from the original 90% to less than 5%, while the embarrassingly parallel part becomes dominant (growing from 10% of the total time to more than 95%). In addition, we develop four integration techniques that further accelerate the system matrix filling process by 86%. In our demonstrated examples, our new algorithm, implemented in C++ with MPI parallelization <sup>[6]<\/sup>, is 6 times faster than FastCap <sup>[1]<\/sup> in a single-core environment and achieves 90% parallel efficiency on a 2-CPU, 10-core distributed-memory system.<\/p>\n\n\t\t