To process large designs quickly, electronic design automation (“EDA”) tools typically consider a variety of options. These options include (1) using more efficient algorithms to do the computations, (2) using more efficient data organization and commands to minimize loads and saves, (3) using multiple CPUs, and (4) taking advantage of the hierarchical nature of the data.
Data organization can entail organizing the data for the fastest access (e.g., using search trees instead of lists), minimizing access to disk, representing the data using the fewest bytes possible even at the cost of some decoding, and minimizing repeated visits to the same data. These considerations are all a consequence of the system memory architecture, where even fetching values from memory into registers is slow compared to the speed of the CPU. Some of these guidelines may be at odds with other requirements (e.g., using 64-bit pointers to gain capacity, or 64-bit integers to avoid overflow and underflow problems in arithmetic, both increase the volume of data).
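As a minimal sketch of the first point (search structures instead of lists), the example below indexes a set of shape coordinates once and then answers range queries with binary search rather than a linear scan. The names and data are purely illustrative, not from any real EDA tool.

```python
# Sketch: organize coordinates for fast lookup. A sorted list with binary
# search (a stand-in for a search tree) turns each range query from an
# O(n) scan into an O(log n) lookup.
import bisect

def build_index(x_coords):
    """Sort once up front so that later queries are cheap."""
    return sorted(x_coords)

def count_in_range(index, lo, hi):
    """Count coordinates in the half-open range [lo, hi) with two searches."""
    return bisect.bisect_left(index, hi) - bisect.bisect_left(index, lo)

idx = build_index([30, 5, 12, 40, 7, 12])
print(count_in_range(idx, 5, 13))  # counts 5, 7, 12, 12 -> 4
```

The one-time sort cost is amortized over many queries, which matches the guideline of minimizing repeated visits to the same data.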
There are two main variants of parallel computing: threaded computing on an SMP machine and distributed computing on a LAN. A program designed for distributed computing can operate on an SMP machine, but a threaded program will not operate across a LAN. Communication costs between different parts of the computation are vastly higher for distributed computation, as is the cost of launching a new process (compared with the cost of launching a thread). However, certain problems are common to both, such as indeterminism and load balancing.
The load-balancing problem is that of keeping all the available processors busy, as concurrently and as evenly as possible. Ideally, every processor is 100% busy until the job completes, and all finish at the same time. On an SMP machine, this involves assigning each processor the same amount of computation. In a distributed environment, it involves assigning each processor variable amounts of computation that happen to consume the same amount of real time (since the processors will not in general be the same speed). Practical solutions will depend on the breakdown of the tasks, with some factors being the number of tasks per processor (few vs. many) and the relative lengths of the tasks (all roughly the same duration vs. wildly varying durations).
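One common approach to the load-balancing problem described above is dynamic scheduling from a shared work queue: a worker that finishes a short task immediately pulls another, so varying task lengths tend to even out. The sketch below simulates this with threads and abstract task "costs"; the names and numbers are illustrative only.

```python
# Sketch of dynamic load balancing: workers pull tasks of varying cost
# from a shared queue, so no worker idles while work remains. The "work"
# is simulated by accumulating each task's cost.
import queue
import threading

def run_balanced(task_costs, num_workers):
    work = queue.Queue()
    for cost in task_costs:
        work.put(cost)
    totals = [0] * num_workers  # simulated busy time per worker

    def worker(i):
        while True:
            try:
                cost = work.get_nowait()
            except queue.Empty:
                return  # no work left; this worker finishes
            totals[i] += cost  # stand-in for actually doing the task

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals

totals = run_balanced([5, 1, 1, 1, 1, 1], 2)
print(sum(totals))  # every unit of work was assigned to exactly one worker
```

With a static split (one worker gets the 5-unit task plus nothing else, the other gets the five 1-unit tasks), both finish together only by luck; the queue lets the split adjust at run time.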
Hierarchical processing of large designs is a key method of improving performance and capacity. In some sense, use of hierarchy is an artifact of the high price of computation. If checking and flattening algorithms were fast enough, and if memory were cheap enough, EDA tools would simply flatten everything and run the algorithms directly. That would avoid all the complexity involved in handling the hierarchy and would allow the tools to process the data in the form closest to the actual chip to be fabricated. The main flaw in the theory (for a world with sufficiently fast processors) has to do with error reporting, since an error in a leaf cell should be reported once against the cell, rather than many times at each instance. The main flaw in practice is that tools cannot flatten and run fast enough. Even postulating that EDA tools would treat memory cells and arrays specially, it is unlikely flat processing would be good enough.
A clean hierarchy is one where no instances overlap with each other and no geometry from a parent of an instance modifies the netlist or devices of a child. Ideally, very little geometry from a parent would interact with a child at all. These properties would make it relatively easy to process each cell once and then later just 'patch' the results for the instances. In practice, overlapping cell placements are very common. Geometry causing changes to devices or netlists inside child cells is also relatively common.
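The first condition for a clean hierarchy (no overlapping instances) can be tested directly from the instance bounding boxes. The sketch below is a hypothetical, simplified check; real tools would use a spatial index rather than a pairwise loop, and would also examine actual geometry rather than bounding boxes.

```python
# Sketch: test whether a placement is "clean" in the sense that no two
# instance bounding boxes overlap. Boxes are (x0, y0, x1, y1) tuples.
def boxes_overlap(a, b):
    """True if the interiors of two axis-aligned boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def placement_is_clean(instances):
    """O(n^2) pairwise check; a real tool would use a spatial index."""
    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            if boxes_overlap(instances[i], instances[j]):
                return False
    return True

print(placement_is_clean([(0, 0, 10, 10), (20, 0, 30, 10)]))  # True: disjoint
print(placement_is_clean([(0, 0, 10, 10), (5, 5, 15, 15)]))   # False: overlap
```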
Several prior EDA tools perform hierarchical operations in a bottom-up fashion. Two prior bottom-up hierarchical operations are the shadow method and the grabbing method. The first-order pseudo-code for both these operations is the same. This pseudo-code is as follows:
    (1) load data in hierarchical form
    (2) let N be the maximum depth of any cell in the hierarchy
    (3) foreach command in the rules file, do
    (4)     foreach depth from N down to 0
    (5)         foreach cell at this depth in parallel, do
    (6)             execute the current command on the current cell
As indicated in this pseudo-code, the shadow and grabbing methods both process each command on each cell of the hierarchical data structure in a bottom-up fashion. In other words, both these methods process each command first for each cell at the highest depth level, then for each cell at the next-highest depth level, and so on until reaching the root cell.
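The bottom-up loop above can be sketched as follows. Cells are grouped by depth, and each command is run on every cell at the deepest level first. All names here (the depth map, `execute_command`) are illustrative stand-ins, not the interface of any real tool, and the "in parallel" step is shown sequentially for clarity.

```python
# Sketch of the first-order bottom-up loop: group cells by depth, then run
# each command over the levels from the maximum depth down to the root.
def cells_by_depth(depths):
    """Map depth -> list of cell names, given a {cell: depth} mapping."""
    groups = {}
    for cell, d in depths.items():
        groups.setdefault(d, []).append(cell)
    return groups

def run_bottom_up(depths, commands, execute_command):
    groups = cells_by_depth(depths)
    n = max(groups)                             # maximum depth of any cell
    order = []
    for command in commands:                    # foreach command in rules file
        for depth in range(n, -1, -1):          # foreach depth from N down to 0
            for cell in groups.get(depth, []):  # conceptually in parallel
                execute_command(command, cell)
                order.append((command, cell))
    return order

depths = {"top": 0, "mid": 1, "leaf": 2}
order = run_bottom_up(depths, ["width_check"], lambda cmd, cell: None)
print(order)  # leaf first, then mid, then top
```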
Tools that use the shadow method write flat processing code independently of any hierarchy code and independently of any parallel processing code. The hierarchy manager has the responsibility for both hierarchy and parallelization. Specifically, the shadow method handles hierarchical interactions as follows. Prior to the main loop above at line (3), “shadows” (i.e., geometries close enough to instances to interact with them) are projected down from parents into children, and the data intersecting the shadows in the child is separated from the rest of the data of the child. Next, at line (6), that data is ignored, and a new step is added in which that data is passed up to the parent and distributed around each instance as if it had been part of the parent. Hence, more-detailed pseudo-code for the shadow method is as follows:
    (1) load data in hierarchical form
    (2) foreach cell in top-down order, do
    (3)     separate cell's shadowed region from cell's clear region
    (4)     propagate parent shadows plus cell's shadows into cell's children
    (5) let N be the maximum depth of any cell in the hierarchy
    (6) foreach command in the rules file, do
    (7)     foreach depth from N down to 0
    (8)         foreach cell at this depth in parallel, do
    (9)             instantiate all this cell's children's shadowed regions as part of this cell's clear region
    (10)            execute the current command on the clear region of the current cell
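The separation step (splitting a child's data into a shadowed region and a clear region) can be sketched as below. Shapes and shadows are modeled as axis-aligned boxes purely for illustration; real shadow computation accounts for actual interaction distances and geometry.

```python
# Sketch of the shadow separation step: a child's shapes are split into
# "shadowed" (intersecting a parent shadow, so promoted to the parent at
# each instance) and "clear" (processed once, inside the child).
def intersects(a, b):
    """True if two axis-aligned boxes (x0, y0, x1, y1) overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def separate(shapes, shadows):
    shadowed, clear = [], []
    for s in shapes:
        if any(intersects(s, sh) for sh in shadows):
            shadowed.append(s)  # will be handled in the parent's context
        else:
            clear.append(s)     # handled once, in the child
    return shadowed, clear

shapes = [(0, 0, 2, 2), (10, 10, 12, 12)]
shadows = [(1, 1, 5, 5)]        # parent geometry projected into the child
print(separate(shapes, shadows))  # first shape shadowed, second clear
```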
Under the shadow method, data shadowed at any instance will be promoted into the sites of every instance. In addition, rather than doing the straightforward depth-level loops in parallel, the hierarchy can be treated as a dependency graph, with each parent started as soon as its own children finish, instead of waiting for every cell at that level. Under this approach, multiple passes are made through the whole dataset. Also, the decision about which data to promote to the parent can take into account both the shadow and the particular command being run.
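The dependency-graph variant mentioned above can be sketched with a simple ready-queue: a cell becomes ready once all of its children have been processed, regardless of what is happening elsewhere at its depth level. The cell names and the `children` mapping here are illustrative only.

```python
# Sketch: treat the hierarchy as a dependency graph. A parent is dispatched
# as soon as its own children finish, rather than waiting for the whole
# depth level to complete.
from collections import deque

def dependency_order(children):
    """children maps each cell to its list of child cells."""
    pending = {cell: len(kids) for cell, kids in children.items()}
    parents = {}
    for cell, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(cell)
    ready = deque(cell for cell, n in pending.items() if n == 0)  # leaves
    order = []
    while ready:
        cell = ready.popleft()        # in a real tool, handed to a worker
        order.append(cell)
        for parent in parents.get(cell, []):
            pending[parent] -= 1      # one fewer unfinished child
            if pending[parent] == 0:
                ready.append(parent)  # all children done; parent is ready
    return order

children = {"top": ["a", "b"], "a": ["leaf"], "b": [], "leaf": []}
print(dependency_order(children))  # leaves first, "top" strictly last
```

Note that "b" can be processed before "leaf" finishes, which a strict level-by-level schedule would not allow.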
The grabbing method differs from the shadow method in the way it processes interaction regions. In the grabbing method, the entire cell is checked, but any errors are deferred instead of output. Parents reach down into their children and pull out the data that interacts with them, subtracting any false errors previously generated in the child, which the parent's context may clear. The following is the detailed pseudo-code for the grabbing method:
    (1) load data in hierarchical form
    (2) let N be the maximum depth of any cell in the hierarchy
    (3) foreach command in the rules file, do
    (4)     foreach depth from N down to 0
    (5)         foreach cell at this depth in parallel, do
    (6)             identify in each child the data shadowed by this cell
    (7)             instantiate that data in this cell
    (8)             execute the current command on this cell
    (9)             fix up any false errors from the child
In the grabbing method, data shadowed at any instance is promoted only into the locations where it is relevant. The observation about parallelization that was made for the shadow method can also be made for the grabbing method. Also, in the grabbing method, multiple passes are made through the whole dataset. In addition, the decision about which data to grab into the parent can be made by the command.
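The grab-and-fixup step (lines (6)-(9) above) can be sketched as follows. The parent pulls out of a child only the shapes inside its own interaction region, and any error the child had recorded on a grabbed shape is set aside for re-checking in the parent's context. Shapes, errors, and the region are modeled as boxes purely for illustration.

```python
# Sketch of the grabbing step: the parent pulls child data inside its own
# interaction region and defers the child's errors on that data, since the
# parent's context may show them to be false.
def intersects(a, b):
    """True if two axis-aligned boxes (x0, y0, x1, y1) overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def grab(child_shapes, child_errors, region):
    grabbed = [s for s in child_shapes if intersects(s, region)]
    # Errors on grabbed shapes may be false: re-check them in the parent.
    deferred = [e for e in child_errors if e in grabbed]
    kept = [e for e in child_errors if e not in grabbed]
    return grabbed, deferred, kept

shapes = [(0, 0, 2, 2), (10, 10, 12, 12)]
errors = [(0, 0, 2, 2)]          # the child flagged this shape as an error
grabbed, deferred, kept = grab(shapes, errors, (1, 1, 5, 5))
print(grabbed, deferred, kept)   # first shape grabbed, its error deferred
```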
It is difficult to estimate on theoretical grounds whether grabbing should be faster or slower than the shadow method. The searching required for the actual grabbing might need to be recursive, implying duplicated effort. Storage of false errors for later fixup adds complexity. In theory, the grabbing method can make more intelligent decisions about which data get instantiated in the parents, because it effectively pulls in data in its own shadow rather than from the union of every shadow. This saves work. Both methods can use the command to guide their choice of data to grab or promote. This can theoretically allow each command to operate on a smaller dataset than if it picked up all the data needed by any command in the session.
Both the grabbing and shadow methods visit the dataset repeatedly, contrary to the performance recommendations above. It is probably true that cells closer to the root contain more data, on average, than cells closer to the bottom. So both methods are likely scheduling larger processing tasks at the end rather than at the beginning, contrary to the parallelization recommendation. This issue of scheduling large items last is aggravated by the migration of data from smaller cells into larger cells, which magnifies the difference.