In Electronic Design Automation (EDA), software is developed that integrated circuit designers can use to explore and verify their designs. As the semiconductor industry enters the nanometer era, designers need to use EDA tools to check nanometer effects in the circuit to be manufactured. Without design verification, the manufactured integrated circuits have a high chance of malfunction, which leads to expensive re-spins and re-designs.
Circuit simulation is the most accurate way of verifying whether a circuit design works or not. FIG. 1 shows a flow diagram 50 illustrating typical transformations performed in manufacturing a circuit from the time an initial design is realized. Block 52 represents an initial circuit design. This design is broken down into a series of connected device elements 54. Each device element in the design is modeled with accurate analytical models that have been verified by foundries and fabricators. With device elements represented by analytical models, the voltage and current values can be simulated over a period of time by a circuit simulator, represented by dashed block 55. The circuit simulator 55 comprises a computer system programmed to perform circuit simulation operations on data. In circuit simulation, the voltage and current values at all the nodes in the circuit can be obtained by solving a system of differential or algebraic equations (DAE) represented in block 56 of flow diagram 50.
The DAE can be discretized using a finite-difference method and a nonlinear iterative method such as the Newton-Raphson method is used to solve the equations in an iterative process. In each iteration, the nonlinear equations are linearized around a previously obtained solution to produce the linearized system of equations represented in block 58. Then, the linear system needs to be solved.
Matrix solving techniques have widespread use in many scientific and engineering areas. In EDA, matrix solving techniques are essential in areas such as circuit simulation for solving the linear system of equations.
Circuit equations have the following form:
                                                        ⅆ                              Q                ⁡                                  (                  v                  )                                                                    ⅆ              t                                +                      i            ⁡                          (              v              )                                +                      u            0                          =        0                            (                  Eq          .                                          ⁢          1                )            
where v is the vector of voltages at all nodes to be simulated in the circuit, Q(v) is the charge at these nodes, i(v) is the current at these nodes, and u0 is the sources in the circuit.
In solving the above equations, the differentiation operator is first approximated with a finite-difference method. For illustration purpose, the backward Euler method will be described. However, other methods can be applied the same way if other finite-difference methods are used. The discretized system of equations has the following form:
                                                                                                                                    Q                                                                                    t                                              m                        +                        1                                                                              -                  Q                                                                            t                m                                                                    t                                  m                  +                  1                                            -                              t                m                                              +                      i            ⁡                          (              v              )                                +                      u            0                          =        0                            (                  Eq          .                                          ⁢          2                )            
where the solution at time step tm is assumed to be known, and the solution at time tm+1 is to be obtained. The Newton-Raphson iterative method can then be applied to solve the nonlinear discretized equations. In the iterative process, an initial guess v0 is given. Then, a sequence of solutions v1, v2, . . . are obtained that converge at the solution of the nonlinear equations as shown in the graph shown in FIG. 2. In every iteration, the nonlinear equations are linearized around the known solution, vn, obtained in the previous iteration. Then, the following linear system of equations is solved to obtain Δv:
                                          J            ⁢                                                  ⁢            Δ            ⁢                                                  ⁢            v                    =          r                ⁢                                  ⁢                  where          ⁢                      :                                              (                  Eq          .                                          ⁢          3                )                                                      Δ            ⁢                                                  ⁢            v                    =                                    v                              n                +                1                                      -                          v              n                                      ,                            (                              Eq            .                                                  ⁢            3                    ⁢                                          ⁢          a                )                                          J          =                                                    1                                                      t                                          m                      +                      1                                                        -                                      t                    m                                                              ⁢                                                ∂                  Q                                                  ∂                  v                                                      +                                          ∂                i                                            ∂                v                                                    ,        and                            (                              Eq            .                                                  ⁢            3                    ⁢                                          ⁢          b                )                                r        =                  -                                    (                                                                                                                                                                                                      Q                              ⁡                                                              (                                                                  v                                  n                                                                )                                                                                                                                                                      t                                                          m                              +                              1                                                                                                      -                                                  Q                          ⁡                                                      (                                                          v                              n                                                        )                                                                                                                                                            t                      m                                                                                                  t                                              m                        +                        1                                                              -                                          t                      m                                                                      +                                  i                  ⁡                                      (                                          v                      n                                        )                                                  +                                  u                  0                                            )                        .                                              (                              Eq            .                                                  ⁢            3                    ⁢                                          ⁢          c                )            
After Δv is obtained, the updated solution vn+1=vn+Δv is obtained. The process is continued until|Δv|<tol  (Eq. 4)
where tol is some small error tolerance. When the condition of Equation 4 is satisfied, the solution is considered to have converged as shown in FIG. 2, giving the solution at tm+1:v|tm+1=vn+1.  (Eq. 5)
In traditional Spice circuit simulation, the error checking is performed for all node voltages and |Δv| is the maximum absolute value of all entries in the vector Δv, i.e. solution change at all nodes. When equation 4 is satisfied, then the procedure advances one time step and returns to block 56 of flowchart 50 to solve the next system of nonlinear equations corresponding to the next moment in time. In this manner, transient fluctuations in the modeled circuit can be simulated.
The voltage values generated at each time are analyzed to determine if the circuit simulation matches the expected operation of the circuit. For example, the voltage values may be compared with logical analysis of the circuit design. For another example, the voltage and current values can be used to perform power or timing analysis of the circuit design. If, at operation 68, the simulation fails to match the expected operation, then the design is modified to correct the deficiencies, and procedure returns to operation 52 with a new circuit design. However, if the voltages compare favorably with expected operations, then the circuit design is prototyped, i.e., a manufactured circuit 70 is produced.
The major part in solving the linear system is solving the circuit Jacobian matrix matrix, which includes an LU decomposition step and a forward and backward substitution step in a traditional direct approach. Fast and accurate matrix solution is very important in fast and accurate circuit simulation.
Matrix solving techniques can be divided into two categories. One category is called direct matrix solving method. The other category is called iterative matrix solving method.
Consider the following linear system in circuit simulation:JΔv=r  (Eq. 6)
where J is the circuit Jacobian matrix, r is the residual vector, and Δv is the vector of node voltage solution update.
Direct matrix solver does an LU decomposition of matrix J first:J=LU,  (Eq. 7)
where L is a lower triangular matrix and U is a upper triangular matrix. Then we can solve:L·y=r  (Eq. 8)andU·Δv=y  (Eq. 9)
to obtain the solution, Δv, to the linear system. In the following description, x is used to represent Δv for convenience:x≡Δv.  (Eq. 10)
Iterative matrix solver tries to obtain or approach the solution with a number of iterations. With an initial guess, x0, iterative methods approach the solution x in the following process:xn+1=Bxn,  (Eq. 11)
where n≧0 and B is a preconditioned matrix for the iterative solution. For an iterative solution to be efficient, this process should be able to approach the solution with acceptable accuracy in a relatively small number of iterations and the computation of Bxn should be fast. The Krylov-subspace iterative method has good convergence properties and is able to approach circuit solution reasonably fast. The purpose of preconditioning is to make matrix B close to the identity matrix to speed up the iterative process.
In circuit simulation, the size of the linear system can be very big and the pattern of the entries in the matrix can be sparse. Hence, standard circuit simulators all adopt some sparse matrix representation of matrix A. In a sparse representation of a matrix, it is common to only store the entries of the matrix that are non-zero. The non-zero pattern of a sparse matrix is important for the efficiency of LU decomposition. This is because the operation can create non-zero entries in L and U while entries of J in the same places are zero. These entries are called “fill-ins.” To reduce the number of fill-ins, one can reorder the linear system. The reordering process is a permutation of vectors x and b in the linear system. It leads to a permuted linear system:Jrx′=r′,  (Eq. 12)
where Jr is a row and column permutation/reordering of J, r′ is the row permutation of r, and x′ is the column permutation of x. For convenience of further illustration, the subscript and superscripts may be dropped in this description. The purpose of the reordering is to minimize the number of fill-ins created during the LU decomposition of the matrix, which will lead to faster simulation.
In typical parallel circuit simulation, this type of reordering is performed for an additional purpose. One wants to reorder the circuit matrix into the so-called double-bordered system shown in FIG. 3.
In performing the LU decomposition of the reordered matrix Jr, some block operations can be performed in parallel, i.e. the LU decomposition of matrix blocks A1, A2, . . . , Am−1. The bottleneck of the parallel computation is to perform the LU decomposition of the last block. Note that the last block could become dense during the LU decomposition process because of the contributions from the coupling blocks, i.e. the C blocks and the D blocks.
The domain-based multi-level recursive block incomplete LU (ILU) method (BILUTM) proposed by Y. Saad et al. is different from our method. BILUTM was proposed as a general matrix solution. One approach of that method is to apply Krylov-subspace method at all levels of BILUTM, which was reported in the publications for the BILUTM method. At the l-th level of the procedure, the block factorization of the matrix is computed approximately (superscript corresponding to level numbers):
                              A          l                =                              (                                                                                B                    l                                                                                        F                    l                                                                                                                    E                    l                                                                                        C                    l                                                                        )                    ≈                                    (                                                                                          L                      l                                                                            0                                                                                                                                                                E                          l                                                ⁡                                                  (                                                      U                            l                                                    )                                                                                            -                        1                                                                                                  I                                                              )                        ×                          (                                                                                          U                      l                                                                                                                                                    (                                                      L                            l                                                    )                                                                          -                          1                                                                    ⁢                                              F                        l                                                                                                                                  0                                                                              A                                              l                        +                        1                                                                                                        )                                                          (                  Eq          .                                          ⁢          12.1                )            
where Bl≈LlUl and Al+1≈Cl−(El(Ul)−1)((Ll)−1Fl).
For circuit simulation, the above approach leads to the application of the Krylov-subspace iterative method at the top circuit matrix level, which can be very inefficient due to the nature of circuit simulation matrices, which have special characteristics. More specifically, circuit simulations matrices may be very large yet sparsely populated. Applying Krylov-subspace method at the top level means that the Krylov-subspace vectors are of the same size of the top-level matrix. These vectors will consume a lot of computer memory and are slow to calculate. Another possible modified use of the BILUM method is to apply Krylov-subspace method at a particular level of BILUTM, i.e. a particular reduced block. However, the reduced block could become very dense without applying approximation technique while one reduces the top-level matrix to the reduced block.
In the approach by Saad et al., incomplete LU decomposition (ILU) is stored while the approximation error is discarded, which is fine for using ILU decomposition as the preconditioning matrix. Hence, in every iteration, one needs to calculate the product of the original matrix and a vector and the product of the preconditioning matrix and a vector.
For the reasons described above, a new hybrid direct and iterative approach is needed to solve the bottleneck problem caused by the LU decomposition when using parallel computation, when circuit simulator 55 (FIG. 1) may be a computer system having a plurality of processor cores or processor units that respond to logic causing the computer system to perform the function of the circuit simulation. The logic may be implemented as software, hardware, or a combination of software and hardware.