{"id":3014,"date":"2011-06-27T15:24:08","date_gmt":"2011-06-27T15:24:08","guid":{"rendered":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/?p=3014"},"modified":"2011-08-02T23:58:03","modified_gmt":"2011-08-02T23:58:03","slug":"a-highly-scalable-parallel-boundary-element-method-for-capacitance-extraction-2","status":"publish","type":"post","link":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/a-highly-scalable-parallel-boundary-element-method-for-capacitance-extraction-2\/","title":{"rendered":"A Highly Scalable Parallel Boundary Element Method for Capacitance Extraction"},"content":{"rendered":"<div class=\"pf-content\"><p>Standard boundary element methods (BEMs) involve both an embarrassingly parallelizable system setup step and a linear system solving step of time complexity O(N<sup>3<\/sup>) that cannot be parallelized efficiently. When piecewise constant (PWC) basis functions are adopted to represent solutions, the system solving step dominates the overall computation time (usually more than 90%) and limits the scalability of standard BEMs with the number of parallel computing nodes. For capacitance extraction problems, traditional acceleration techniques, such as the multipole expansion<sup> [<a href=\"#footnote_0_3014\" id=\"identifier_0_3014\" class=\"footnote-link footnote-identifier-link\" title=\"K. Nabors and J. White, &ldquo;FastCap: A multipole accelerated 3-D capacitance extraction program,&rdquo; IEEE Transactions on Computer-Aided Design, vol. 10, no. 10, pp. 1447-1459, Nov. 1991.\">1<\/a>] <\/sup> and the pre-corrected FFT methods<sup> [<a href=\"#footnote_1_3014\" id=\"identifier_1_3014\" class=\"footnote-link footnote-identifier-link\" title=\"J. R. Phillips and J. K. White, &ldquo;A precorrected-FFT method for electrostatic analysis of complicated 3-D structures,&rdquo; IEEE Transactions on Computer-Aided Design, vol. 16, no. 10, pp. 1059-1072, Oct. 1997.\">2<\/a>] <\/sup>, can reduce the solving time complexity to O(N log N). 
However, available parallelization implementations of these two techniques showed that their parallel acceleration saturates quickly with the number of parallel nodes: their parallel efficiency drops to 40% to 60% at just 8 nodes<sup> [<a href=\"#footnote_2_3014\" id=\"identifier_2_3014\" class=\"footnote-link footnote-identifier-link\" title=\"Y. Yuan and P. Banerjee, &ldquo;A parallel implementation of a fast multipole-based 3-d capacitance extraction program on distributed memory multicomputers,&rdquo; Journal of Parallel and Distributed Computing, vol. 61, no. 12, pp. 1751&ndash;1774, Dec. 2001.\">3<\/a>] <\/sup><sup> [<a href=\"#footnote_3_3014\" id=\"identifier_3_3014\" class=\"footnote-link footnote-identifier-link\" title=\"N. R. Aluru, V. B. Nadkarni, and J. White, &ldquo;A parallel precorrected FFT based capacitance extraction program for signal integrity analysis,&rdquo; Proc. 33rd annual Design Automation Conference, 1996, pp. 363&ndash;366.\">4<\/a>] <\/sup>.<\/p>\n<p>The aforementioned methods suffer from poor parallel scalability because their underlying solution representation, PWC basis functions, is inefficient for representing charge distribution, resulting in a large linear system. Solving such a large system dominates the overall computation and drastically degrades the parallel efficiency. To circumvent the bottleneck of solving a large system in parallel, we employ our recently developed instantiable basis functions, which are 30 times more compact than PWC basis functions for the same capacitance accuracy<sup> [<a href=\"#footnote_4_3014\" id=\"identifier_4_3014\" class=\"footnote-link footnote-identifier-link\" title=\"Y.-C. Hsiao, T. El-Moselhy, and L. Daniel, &ldquo;Efficient capacitance solver for 3d interconnect based on template-instantiated basis functions,&rdquo; IEEE 18th Conference on Electrical Performance of Electronic Packaging and Systems, 2009, pp. 179&ndash;182.\">5<\/a>] <\/sup>. 
Accordingly, the computation for solving a system is reduced from the original 90% of the total time to less than 5%, while the embarrassingly parallelizable part is now dominant (growing from 10% of the total time to more than 95%). In addition, we develop four integration techniques to further accelerate the system matrix filling process by 86%. In our demonstrated examples, our new algorithm is 6 times faster than FastCap<sup> [<a href=\"#footnote_0_3014\" id=\"identifier_5_3014\" class=\"footnote-link footnote-identifier-link\" title=\"K. Nabors and J. White, &ldquo;FastCap: A multipole accelerated 3-D capacitance extraction program,&rdquo; IEEE Transactions on Computer-Aided Design, vol. 10, no. 10, pp. 1447-1459, Nov. 1991.\">1<\/a>] <\/sup> in a single-core environment and achieves 90% parallel efficiency on a 2-cpu-10-core distributed memory system implemented in C++ with MPI parallelization<sup> [<a href=\"#footnote_5_3014\" id=\"identifier_6_3014\" class=\"footnote-link footnote-identifier-link\" title=\"Y.-C. Hsiao and L. Daniel, &ldquo;A highly scalable parallel boundary element method for capacitance extraction,&rdquo; Proc. 48th Annual Design Automation Conference, 2011, pp. 
552&ndash;557.\">6<\/a>] <\/sup>.<\/p>\n\n\t\t<style type=\"text\/css\">\n\t\t\t#gallery-1 {\n\t\t\t\tmargin: auto;\n\t\t\t}\n\t\t\t#gallery-1 .gallery-item {\n\t\t\t\tfloat: left;\n\t\t\t\tmargin-top: 10px;\n\t\t\t\ttext-align: center;\n\t\t\t\twidth: 50%;\n\t\t\t}\n\t\t\t#gallery-1 img {\n\t\t\t\tborder: 2px solid #cfcfcf;\n\t\t\t}\n\t\t\t#gallery-1 .gallery-caption {\n\t\t\t\tmargin-left: 0;\n\t\t\t}\n\t\t\t\/* see gallery_shortcode() in wp-includes\/media.php *\/\n\t\t<\/style>\n\t\t<div id='gallery-1' class='gallery galleryid-3014 gallery-columns-2 gallery-size-medium'><dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon landscape'>\n\t\t\t\t<a href='https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-content\/blogs.dir\/10\/files\/2011\/06\/hsiao_boundary_01.png' rel=\"lightbox[3014]\"><img width=\"300\" height=\"225\" src=\"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-content\/blogs.dir\/10\/files\/2011\/06\/hsiao_boundary_01.png\" class=\"attachment-medium size-medium\" alt=\"Figure 1\" loading=\"lazy\" aria-describedby=\"gallery-1-3015\" \/><\/a>\n\t\t\t<\/dt>\n\t\t\t\t<dd class='wp-caption-text gallery-caption' id='gallery-1-3015'>\n\t\t\t\tFigure 1: The charge distribution solutions represented by (a) 572 PWC basis functions and (b) 17 instantiable basis functions, respectively. 
The capacitance errors for both cases are 2% with respect to a reference capacitance value extracted by the standard BEM with fine discretization.\n\t\t\t\t<\/dd><\/dl><dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon landscape'>\n\t\t\t\t<a href='https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-content\/blogs.dir\/10\/files\/2011\/06\/hsiao_boundary_021.png' rel=\"lightbox[3014]\"><img width=\"300\" height=\"225\" src=\"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-content\/blogs.dir\/10\/files\/2011\/06\/hsiao_boundary_021-300x225.png\" class=\"attachment-medium size-medium\" alt=\"Figure 2\" loading=\"lazy\" aria-describedby=\"gallery-1-4314\" srcset=\"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-content\/blogs.dir\/10\/files\/2011\/06\/hsiao_boundary_021-300x225.png 300w, https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-content\/blogs.dir\/10\/files\/2011\/06\/hsiao_boundary_021.png 800w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>\n\t\t\t<\/dt>\n\t\t\t\t<dd class='wp-caption-text gallery-caption' id='gallery-1-4314'>\n\t\t\t\tFigure 2: (a) An industry-provided interconnect example. Our integration techniques accelerate the system matrix filling process by 86%. The overall algorithm is 6 times faster than FastCap<sup> [<a href=\"#footnote_0_3014\" id=\"identifier_7_3014\" class=\"footnote-link footnote-identifier-link\" title=\"K. Nabors and J. White, &ldquo;FastCap: A multipole accelerated 3-D capacitance extraction program,&rdquo; IEEE Transactions on Computer-Aided Design, vol. 10, no. 10, pp. 1447-1459, Nov. 1991.\">1<\/a>] <\/sup> for single-threaded execution at the same accuracy (a 2.8% error w.r.t. a reference value). (b) A 24&#215;24 bus example and its parallel efficiency comparison in (c) with the algorithms in<sup> [<a href=\"#footnote_2_3014\" id=\"identifier_8_3014\" class=\"footnote-link footnote-identifier-link\" title=\"Y. Yuan and P. 
Banerjee, &ldquo;A parallel implementation of a fast multipole-based 3-d capacitance extraction program on distributed memory multicomputers,&rdquo; Journal of Parallel and Distributed Computing, vol. 61, no. 12, pp. 1751&ndash;1774, Dec. 2001.\">3<\/a>] <\/sup> and<sup> [<a href=\"#footnote_3_3014\" id=\"identifier_9_3014\" class=\"footnote-link footnote-identifier-link\" title=\"N. R. Aluru, V. B. Nadkarni, and J. White, &ldquo;A parallel precorrected FFT based capacitance extraction program for signal integrity analysis,&rdquo; Proc. 33rd annual Design Automation Conference, 1996, pp. 363&ndash;366.\">4<\/a>] <\/sup>.\n\t\t\t\t<\/dd><\/dl><br style=\"clear: both\" \/>\n\t\t<\/div>\n\n<\/div><ol class=\"footnotes\"><li id=\"footnote_0_3014\" class=\"footnote\">K. Nabors and J. White, \u201cFastCap: A multipole accelerated 3-D capacitance extraction program,\u201d <em>IEEE Transactions on Computer-Aided Design<\/em>, vol. 10, no. 10, pp. 1447-1459, Nov. 1991. [<a href=\"#identifier_0_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>] [<a href=\"#identifier_5_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>] [<a href=\"#identifier_7_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li><li id=\"footnote_1_3014\" class=\"footnote\">J. R. Phillips and J. K. White, \u201cA precorrected-FFT method for electrostatic analysis of complicated 3-D structures,\u201d <em>IEEE Transactions on Computer-Aided Design<\/em>, vol. 16, no. 10, pp. 1059-1072, Oct. 1997. [<a href=\"#identifier_1_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li><li id=\"footnote_2_3014\" class=\"footnote\">Y. Yuan and P. Banerjee, \u201cA parallel implementation of a fast multipole-based 3-d capacitance extraction program on distributed memory multicomputers,\u201d <em>Journal of Parallel and Distributed Computing<\/em>, vol. 61, no. 12, pp. 1751\u20131774, Dec. 2001. 
[<a href=\"#identifier_2_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>] [<a href=\"#identifier_8_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li><li id=\"footnote_3_3014\" class=\"footnote\">N. R. Aluru, V. B. Nadkarni, and J. White, \u201cA parallel precorrected FFT based capacitance extraction program for signal integrity analysis,\u201d <em>Proc. 33<sup>rd<\/sup> annual Design Automation Conference<\/em>,<em> <\/em>1996, pp. 363\u2013366. [<a href=\"#identifier_3_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>] [<a href=\"#identifier_9_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li><li id=\"footnote_4_3014\" class=\"footnote\">Y.-C. Hsiao, T. El-Moselhy, and L. Daniel, \u201cEfficient capacitance solver for 3d interconnect based on template-instantiated basis functions,\u201d <em>IEEE 18th Conference on Electrical Performance of Electronic Packaging and Systems, <\/em>2009, pp. 179\u2013182. [<a href=\"#identifier_4_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li><li id=\"footnote_5_3014\" class=\"footnote\">Y.-C. Hsiao and L. Daniel, \u201cA highly scalable parallel boundary element method for capacitance extraction,\u201d <em>Proc. 48<sup>th<\/sup> Annual Design Automation Conference, <\/em>2011, pp. 552\u2013557. 
[<a href=\"#identifier_6_3014\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>Standard boundary element methods (BEMs) involve both an embarrassingly parallelizable system setup step and a linear system solving step of&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26],"tags":[48,4039],"_links":{"self":[{"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/posts\/3014"}],"collection":[{"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/comments?post=3014"}],"version-history":[{"count":9,"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/posts\/3014\/revisions"}],"predecessor-version":[{"id":4311,"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/posts\/3014\/revisions\/4311"}],"wp:attachment":[{"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/media?parent=3014"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/categories?post=3014"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mtlsites.mit.edu\/annual_reports\/2011\/wp-json\/wp\/v2\/tags?post=3014"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}