The relentless push for high-performance and low-power integrated circuits has been met by aggressive technology scaling, which enabled the integration of a vast number of devices on the same die but brought new problems and challenges to the surface. The on-chip power delivery network (power grid) constitutes a vital subsystem of modern nanometer-scale ICs, since it affects in a critical way the performance and correct operation of the devices. In order to determine the quality of the supply voltage delivered to the devices, the designer has to perform static and dynamic simulation of the electrical circuit modeling the power grid. This has become a very challenging problem for contemporary ICs, since power grids encountered in these circuits are extremely large (comprising several thousands or millions of nodes) and very difficult to simulate efficiently (especially over multiple time-steps).
Static (DC) or transient simulation refers to the process of computing the response of an electrical circuit to a constant or time-varying stimulus. Since a power delivery network can be generally modeled as a linear RLC circuit, the process of DC or transient simulation of large-scale power grids amounts to solving very large (and sparse) linear systems of equations. Direct methods (based on matrix factorization) have been widely used in the past for solving the resulting linear systems, mainly because of their robustness in most types of problems. They also have the property of reusability of factorization results in transient simulation with a fixed time-step. Unfortunately, these methods do not scale well with the dimension of the linear system, and become prohibitively expensive for circuits beyond a few thousand elements, in both execution time and memory requirements. In addition, a fixed time-step is almost never used in practice because it becomes very inefficient to constantly simulate during long intervals of low activity. All practical implementations of integration techniques for ordinary differential equations (ODEs) employ a variable or adaptive time-step mechanism. In those cases, the reusability of matrix factorization in direct methods ceases to exist.
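The loss of factorization reuse under a variable time-step can be illustrated with a small sketch. It assumes a backward-Euler discretization, in which each step solves (G + C/h) v_new = (C/h) v_old + b, so the system matrix depends on the step size h. The RC chain, the element values, and the helper names (`rc_chain`, `transient`) are hypothetical and chosen only for illustration; a dense Cholesky factorization stands in for a production sparse direct solver.

```python
import math

def cholesky(A):
    """Dense Cholesky factor L (lower triangular) such that A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_with_factor(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def rc_chain(n, g, c):
    """Hypothetical RC chain: node 0 is tied to the supply through a
    conductance g, neighbouring nodes are joined by g, and every node
    has a capacitance c to ground."""
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        G[i][i] = 2 * g if i < n - 1 else g
        if i + 1 < n:
            G[i][i + 1] = G[i + 1][i] = -g
    return G, [c] * n

def transient(G, C, b, v0, steps):
    """Backward Euler over the given step sizes. The matrix G + C/h is
    refactored only when h differs from the previous step."""
    n = len(v0)
    v, L, last_h, factorizations = list(v0), None, None, 0
    for h in steps:
        if h != last_h:
            A = [[G[i][j] + (C[i] / h if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
            L, last_h = cholesky(A), h
            factorizations += 1
        rhs = [C[i] / h * v[i] + b[i] for i in range(n)]
        v = solve_with_factor(L, rhs)
    return v, factorizations

# Assumed example values: 5 nodes, 1 V supply, 1 mA load at every node.
n, g, c, vdd, i_load = 5, 1.0, 1e-3, 1.0, 1e-3
G, C = rc_chain(n, g, c)
b = [-i_load] * n
b[0] += g * vdd
v0 = [vdd] * n

v_fixed, fact_fixed = transient(G, C, b, v0, [1e-3] * 200)
v_var, fact_var = transient(G, C, b, v0, [1e-3, 2e-3, 4e-3, 8e-3])
```

With the fixed step the matrix is factored once and reused for all 200 steps, whereas the four distinct step sizes in the second run force four factorizations, one per step.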
Iterative methods involve only inner products and matrix-vector products, and constitute a better alternative for large sparse linear systems in many respects, being more efficient in both computation and memory. This holds even more so for modern non-stationary iterative methods which fall under the broad class of ‘Krylov-subspace’ methods. See e.g. Y. Saad, “Iterative Methods for Sparse Linear Systems”, Chapter 6, which is incorporated herein by reference in its entirety. Iterative methods also possess a kind of reusability property of their own for transient simulation, in that the solution at the last time-step provides an excellent initial guess for the next time-step, thus making a properly implemented iterative method converge in a fairly small number of iterations. In fact, this property also holds in the case of a variable time-step, since the quality of the last solution as initial guess for the next solution is not affected. The above features make iterative methods much more suitable for DC and variable time-step transient analysis of large-scale linear circuits such as power distribution networks.
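The warm-start property described above can be sketched as follows. The example assumes a plain (unpreconditioned) conjugate gradient solver and a backward-Euler system (G + (c/h) I) v = (c/h) v_prev + b for a hypothetical RC chain; all element values and function names are illustrative assumptions. The second time-step deliberately uses a different step size than the first, and is solved once from a zero initial guess and once from the previous solution.

```python
def chain_matvec(v, g, shunt):
    """y = (G + shunt*I) v for an RC chain: node 0 is tied to the supply
    through g, neighbours are joined by g, and shunt = c/h comes from
    backward Euler. Only matrix-vector products are needed, so the
    matrix is never formed explicitly."""
    n = len(v)
    y = [0.0] * n
    for i in range(n):
        diag = (2 * g if i < n - 1 else g) + shunt
        y[i] = diag * v[i]
        if i > 0:
            y[i] -= g * v[i - 1]
        if i < n - 1:
            y[i] -= g * v[i + 1]
    return y

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cg(matvec, b, x0, tol=1e-8, max_iter=1000):
    """Conjugate gradient from initial guess x0. Stops when the residual
    norm drops below tol * ||b||. Returns (solution, iterations)."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    p = list(r)
    rr = dot(r, r)
    stop = tol ** 2 * dot(b, b)
    it = 0
    while rr > stop and it < max_iter:
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        it += 1
    return x, it

# Assumed example values: 50 nodes, 1 V supply, 10 uA load at every node.
n, g, c, vdd, i_load = 50, 1.0, 1e-3, 1.0, 1e-5
b = [-i_load] * n
b[0] += g * vdd

def solve_step(v_prev, h, x0):
    """One backward-Euler step of size h, started from initial guess x0."""
    shunt = c / h
    rhs = [shunt * vp + bi for vp, bi in zip(v_prev, b)]
    return cg(lambda v: chain_matvec(v, g, shunt), rhs, x0)

v0 = [vdd] * n
v1, _ = solve_step(v0, 1e-3, [0.0] * n)

# Second step with a different step size: cold start versus warm start.
v2_cold, cold_iters = solve_step(v1, 2e-3, [0.0] * n)
v2_warm, warm_iters = solve_step(v1, 2e-3, v1)
```

Both runs reach the same solution to within the tolerance, but the warm-started run begins with a much smaller residual and therefore needs fewer iterations, even though the step size (and hence the matrix) changed between steps.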
The main problem of iterative methods is their unpredictable rate of convergence, which depends greatly on the properties (specifically the condition number) of the circuit matrix. A preconditioning mechanism, which transforms the linear system into one with more favorable properties, is essential to guarantee fast and robust convergence. However, the ideal preconditioner (one that approximates the circuit matrix well and is inexpensive to construct and apply) differs according to each particular problem and each different type of circuit matrix. That is why iterative methods have not reached the maturity of direct methods and have not yet gained widespread acceptance in linear circuit simulation. Although general-purpose preconditioners (such as incomplete factorizations or sparse approximate inverses) have been developed, they are not tuned to any particular simulation problem and cannot improve convergence by as much as specially-tailored preconditioners.
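The effect of preconditioning on the convergence rate can be illustrated with a minimal sketch of the preconditioned conjugate gradient method. The diagonal (Jacobi) preconditioner used here is only the simplest general-purpose stand-in and is not one of the specially-tailored preconditioners discussed in this document. The test matrix A = D^(1/2) T D^(1/2), with T = tridiag(-1, 3, -1) and diagonal scale factors D spanning four decades, is a hypothetical, deliberately badly scaled symmetric positive definite system.

```python
import math

def matvec(d, v):
    """y = A v with A = D^(1/2) T D^(1/2) and T = tridiag(-1, 3, -1)."""
    n = len(v)
    y = [3.0 * d[i] * v[i] for i in range(n)]
    for i in range(n - 1):
        w = math.sqrt(d[i] * d[i + 1])
        y[i] -= w * v[i + 1]
        y[i + 1] -= w * v[i]
    return y

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(apply_A, apply_Minv, b, tol=1e-8, max_iter=5000):
    """Preconditioned conjugate gradient from a zero initial guess.
    apply_Minv(r) applies the inverse of the preconditioner M to r.
    Stops when ||r|| <= tol * ||b||. Returns (solution, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_Minv(r)
    p = list(z)
    rz = dot(r, z)
    stop = tol ** 2 * dot(b, b)
    it = 0
    while dot(r, r) > stop and it < max_iter:
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_Minv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
        it += 1
    return x, it

# Assumed example: 100 unknowns, scale factors from 1 to 1e4.
n = 100
d = [10 ** (4 * i / (n - 1)) for i in range(n)]
x_true = [1.0] * n
b = matvec(d, x_true)

apply_A = lambda v: matvec(d, v)
identity = lambda r: list(r)                                  # no preconditioning
jacobi = lambda r: [ri / (3.0 * di) for ri, di in zip(r, d)]  # M = diag(A)

x_cg, cg_iters = pcg(apply_A, identity, b)
x_pcg, pcg_iters = pcg(apply_A, jacobi, b)
```

For this particular matrix the diagonal scaling removes the bad conditioning entirely, since the preconditioned system behaves like T/3, whose condition number is below 5, so the preconditioned run needs far fewer iterations. For a general circuit matrix a diagonal preconditioner is usually much less effective, which is the motivation for problem-specific preconditioners.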
Another aspect of circuit simulation that has become very important recently is to uncover hidden opportunities for parallelism in its intermediate steps. This is essential for harnessing the potential of contemporary parallel architectures, such as multi-core processors and graphics processing units (GPUs). GPUs, in particular, are massively parallel architectures whose computational power is about 1580 GFLOPS (in 2012), greater by an order of magnitude than that of multi-core processors, and as a result they appear as a platform of choice for the efficient execution of computationally-intensive tasks. However, there has been little systematic research on the development of parallel simulation algorithms, and more specifically algorithms for power grid analysis that can be mapped onto massively parallel architectures. This can be attributed in part to the difficulty of parallelizing the direct linear solution methods that have been mostly employed thus far.
By contrast, Krylov-subspace iterative methods offer ample possibilities for parallelism, and these have been explored sufficiently well. However, the construction and application of the preconditioner is a very delicate part of parallelizing an iterative method because it is completely application-dependent (and traditional general-purpose preconditioners have very little room for parallelism).
The growing need to simulate large power grids with small memory footprint and efficient parallel execution has led many researchers to deviate from the standard practice of direct factorization methods and present more suitable iterative methods. Several papers have proposed iterative solvers for efficient simulation of power delivery networks, including T.-H. Chen and C. C.-P. Chen, “Efficient Large-Scale Power Grid Analysis Based on Preconditioned Krylov-Subspace Iterative Methods”, ACM/IEEE Design Automation Conf., 2001; and J. Shi, Y. Cai, S. X.-D. Tan, J. Fan, and X. Hong, “Pattern-Based Iterative Method for Extreme Large Power/Ground Analysis”, in IEEE Trans. Computer-Aided Design, 26(4):680-692, 2007, each incorporated herein by reference in its entirety. Power grid analysis was first formulated as a symmetric positive definite system to be solved by the preconditioned conjugate gradient (PCG) method in Chen et al. (cited above), but the preconditioner used was a general-purpose one (inefficient for specialized applications) known as incomplete Cholesky.
A different pattern-based preconditioner was proposed in Shi et al. (cited above), but it is very simple and heuristic and does not appear to reduce the number of iterations significantly. The idea of multi-grid techniques for solving partial differential equations has been proposed for power grid analysis in the past in several papers, including J. Kozhaya, S. Nassif, and F. Najm, “A Multigrid-Like Technique for Power Grid Analysis”, IEEE Trans. Computer-Aided Design, 21(10):1148-1160, 2002; and C. Zhuo, J. Hu, M. Zhao, and K. Chen, “Power Grid Analysis and Optimization Using Algebraic Multigrid”, in IEEE Trans. Computer-Aided Design, 27(4):738-751, 2008, each incorporated herein by reference in their entirety.
More recently, parallel computing architectures have been utilized to accelerate power grid analysis in several papers, including K. Sun, Q. Zhou, K. Mohanram, and D. C. Sorensen, “Parallel Domain Decomposition for Simulation of Large-Scale Power Grids”, ACM/IEEE Design Automation Conf., 2007; J. Shi, Y. Cai, W. Hou, L. Ma, S. X.-D. Tan, P.-H. Ho, and X. Wang, “GPU friendly Fast Poisson Solver for Structured Power Grid Network Analysis”, ACM/IEEE Design Automation Conf., 2009; and Z. Feng and Z. Zeng, “Parallel Multigrid Preconditioning on Graphics Processing Units (GPUs) for Robust Power Grid Analysis”, ACM/IEEE Design Automation Conf., 2010, each incorporated herein by reference in their entirety.
The authors of Feng et al. (cited above) propose multi-grid as a solution method for power grid analysis and use multi-core and massively parallel single-instruction multiple-thread (SIMT) platforms to tackle it, while the authors of Shi et al. (cited above) formulate the traditional linear system as a special two-dimensional Poisson equation and solve it using analytical expressions based on the FFT algorithm, with GPUs being used to further speed up the algorithm. However, both approaches only solve very regular grid structures with specialized techniques, which can limit their effectiveness for the irregular power delivery networks that are found in late design stages.
Preconditioning has been studied as a method for efficiently tackling the electrical and thermal analysis of large-scale and irregular power grid designs in several papers in the past, including Z. Feng, Z. Zeng, and P. Li, “Parallel On-Chip Power Distribution Network Analysis on Multi-Core-Multi-GPU Platforms”, IEEE Trans. VLSI Syst., 19(10):1823-1836, 2011; J. Yang, Y. Cai, Q. Zhou, and J. Shi, “Fast Poisson Solver Preconditioned Method for Robust Power Grid Analysis”, IEEE/ACM Int. Conf. on Computer-Aided Design, 2011; and H. Qian and S. Sapatnekar, “Fast Poisson Solvers for Thermal Analysis”, IEEE/ACM Int. Conf. Computer-Aided Design, 2010, each incorporated herein by reference in its entirety. In Feng et al. (cited above), the preconditioning has been carried out by multigrid techniques. However, when used as a preconditioner for iterative methods, multigrid is not very efficient because it is itself an iterative method, and it also solves the system only approximately, which can hinder the convergence of PCG. Moreover, some operations of multigrid are not always well-defined (e.g. mapping by interpolation from coarser to finer grids and vice versa, and correction of solutions), and the construction of approximate matrices for all coarser grids is an expensive setup phase which has to be repeated every time the system is reconstructed after a time-step change during transient analysis.