1. Field of the Invention
The present invention relates in general to buffer circuits, and more specifically to a novel adjustable buffer configuration.
2. Description of the Related Art
Integrated circuits (large scale, very large scale, etc.) including system-on-chip (SOC) configurations employ one or more master or primary clock signals to synchronize sub-circuits in the system or on an integrated circuit (IC) or chip. Where multiple clock signals are used, they are often related to each other, such as a higher frequency master clock and several lower frequency clocks (e.g., half-frequency clock, quarter-frequency clock, etc.). The chip employs a clock distribution system to distribute each primary clock signal from one or more root nodes to circuit destination nodes distributed on the chip. It is desired to distribute the clock signals in such a manner that the applicable clock transitions (i.e., rising edges and/or falling edges) at each of the destination nodes occur simultaneously to ensure proper synchronous operation. Since the clock distribution system is a physical system with unavoidable variations and physical limitations, however, the clock transitions arrive at the destination nodes at slightly different times, and these differences are called clock skew. A primary goal of the clock distribution system is to reduce skew to within an acceptable level to ensure proper operation. The amount of allowable skew, however, is reduced as the frequency of one or more clock signals is increased.
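The notion of skew described above can be illustrated with a short sketch. The node names, the arrival times, and the rule that the skew budget is a fixed fraction of the clock period are all assumptions made for illustration only; they are not taken from any particular design.

```python
def clock_skew(arrival_times_ps):
    """Return the spread between the latest and earliest clock arrival (ps)."""
    if not arrival_times_ps:
        raise ValueError("need at least one destination node")
    times = arrival_times_ps.values()
    return max(times) - min(times)


def skew_budget_ps(clock_hz, fraction=0.05):
    """Hypothetical budget: allowable skew as a fixed fraction of the period.

    The 5% default is an illustrative assumption, not a design rule.
    """
    period_ps = 1e12 / clock_hz
    return fraction * period_ps


# Hypothetical arrival times of one clock edge at three destination nodes.
arrivals = {"node_a": 412.0, "node_b": 418.5, "node_c": 409.0}
skew = clock_skew(arrivals)

# Under this assumed rule, doubling the frequency halves the budget.
budget_1ghz = skew_budget_ps(1e9)
budget_2ghz = skew_budget_ps(2e9)
```

Under these assumptions the same physical skew consumes a larger share of the budget at the higher frequency, which is the effect described in the last sentence of the paragraph above.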
Several clock distribution methods are known for minimizing skew in the system. One method employs “H-trees” in which a parent clock provided to a common node or root node is distributed via conductive traces to four different end points, each end point being equidistant from the common root node and located within a corresponding one of four quadrants surrounding the root node. Each of the four end points of the primary H-tree formation defines a subsequent “child” root node for a smaller H-tree formation defining four new equidistant downstream end point nodes in corresponding sub-quadrants for each child root node. In this manner, the child H-trees become progressively smaller as the overall H-tree fans out across the circuit. The H-tree technique is an iterative process in which the primary clock is distributed to all applicable destination clock nodes sourced from that primary clock signal. Buffers are inserted along the H-tree routing path depending upon the wire lengths and loading requirements. H-trees are balanced by construction and thus achieve a very good balance within a single tree formation. The H-tree process, however, is a manual process which requires a relatively large number of man-hours to complete. Moreover, H-trees are not optimal for multiple tree formations or embedded sub-blocks with their own internal trees. Examples of embedded sub-blocks include processor blocks, digital signal processing (DSP) blocks, memory array blocks, etc. Such sub-blocks are often pre-designed within a CMOS library or the like and are placed at selected locations on the chip before the clock distribution system is defined. The H-tree formation is symmetrical by design but cannot be routed over the embedded sub-block structures, since such structures are generally relatively dense and do not provide sufficient room for H-tree buffers.
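The recursive structure of an H-tree can be sketched as follows. This is a minimal geometric illustration only: the function name, the coordinate system, the choice to halve the span at each level, and the use of Manhattan wire length are assumptions, and buffer insertion and loading are ignored.

```python
def h_tree_leaves(x, y, half_span, levels, wire_so_far=0.0):
    """Return (x, y, wire_length) for every leaf of an idealized H-tree.

    Each root fans out to four end points, one per quadrant, all at the
    same Manhattan distance (2 * half_span) from the root. Each end point
    then becomes the root of a smaller H-tree with half the span.
    """
    if levels == 0:
        return [(x, y, wire_so_far)]
    leaves = []
    step = 2 * half_span  # horizontal run plus vertical run to an end point
    for dx in (-half_span, half_span):
        for dy in (-half_span, half_span):
            leaves.extend(
                h_tree_leaves(x + dx, y + dy, half_span / 2,
                              levels - 1, wire_so_far + step)
            )
    return leaves


# Two levels of fan-out from a root at the origin: 4 * 4 = 16 leaves.
leaves = h_tree_leaves(0.0, 0.0, half_span=4.0, levels=2)
wire_lengths = {length for _, _, length in leaves}
```

In this idealized model every leaf sees the same total wire length, which is the sense in which the structure is balanced by construction. The sketch has no notion of blocked regions, so it also shows why a pre-placed sub-block sitting on one of the computed end points is a problem for the method.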
Another clock distribution method is known as clock tree synthesis or CTS. CTS is an automated process performed by a computer-aided design (CAD) system or the like in which a computer compiles one or more clock trees for the chip. The CTS method is automated and thus provides a clock distribution solution more quickly and potentially at reduced cost as compared to the H-tree technique. The CTS method is more suitable when the system includes multiple clocks and embedded sub-blocks. The conventional CTS method was, however, less accurate than the H-tree structure and the resulting compiled tree structures were more difficult to adjust or “tweak” to minimize skew. The compiled tree structures employed multiple buffer types with different timing and drive capabilities. In the conventional CTS process, the buffers were not adjustable, so that if a different delay was necessary, the computer selected a different non-adjustable buffer. The branches of any given tree were not symmetrical since each branch was individually optimized and routed, which resulted in significant variations in tree fan-out structures from one branch to the next. In particular, the number of buffers and the wire lengths varied from one branch to another of a given tree. Although an initial CTS tree structure was optimized for certain process (P), voltage (V) and temperature (T) conditions, because of the significant variation from one branch to another, the overall tree was not optimal for different PVT points. Thus, timing variations occurred within each tree due to variations in process, voltage and/or temperature.
Although the conventional CTS method attempted to optimize each tree (even if for a given PVT point), the timing variations between each compiled tree structure also had to be minimized. In one conventional method, an adjustable delay buffer was inserted at the root of each and every compiled tree including the slowest tree. The minimum delay for each adjustable delay buffer was significantly greater than the adjustable delay range of the buffer, so that an adjustable delay buffer had to be inserted at the root of every tree including the slowest tree to enable minimizing skew of all of the trees. The delay in front of the slowest tree was set to its minimal adjustment setting, and the remaining adjustable delays of the faster trees were further adjusted to slow down each faster tree to match the slowest tree. Using this solution to balance multiple trees incurred an undesired and non-trivial delay across the entire system. Adjustable delay buffers have also been provided at the very ends or “leaves” of each tree, as an alternative or in addition to delay buffers at the tree roots. Yet this method consumed valuable real estate since a rather large number of variable buffers were needed including one for each leaf even if the leaf buffers were smaller than the root buffers. The leaf buffers, which were usually smaller than the root-based adjustable buffers, provided only a limited adjustable delay range.
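The root-buffer balancing scheme described above can be sketched numerically. The tree names, delay values, and buffer parameters below are hypothetical, and the model assumes each adjustable buffer contributes its minimum delay plus a setting within its adjustable range.

```python
def balance_with_root_buffers(tree_delays_ps, buf_min_ps, buf_range_ps):
    """Pick a root-buffer delay for each tree so that all trees match.

    Every tree, including the slowest, receives a buffer. The slowest tree
    gets the minimum delay; faster trees get extra delay to catch up.
    Returns (per-tree buffer delay, common root-to-leaf latency).
    """
    slowest = max(tree_delays_ps.values())
    settings = {}
    for name, delay in tree_delays_ps.items():
        extra = slowest - delay  # amount by which this tree must be slowed
        if extra > buf_range_ps:
            raise ValueError(f"{name} needs {extra} ps, beyond buffer range")
        settings[name] = buf_min_ps + extra
    return settings, slowest + buf_min_ps


# Hypothetical insertion delays of three compiled trees, in picoseconds.
trees = {"cpu": 900, "dsp": 860, "mem": 880}
settings, latency = balance_with_root_buffers(trees, buf_min_ps=200,
                                              buf_range_ps=60)
# Penalty paid by the whole system relative to the unbuffered slowest tree.
penalty = latency - max(trees.values())
```

With these assumed numbers the trees are matched, but the common latency exceeds that of the unbuffered slowest tree by the full minimum buffer delay, which is larger than the adjustable range itself. That is the system-wide delay penalty the paragraph above refers to.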
It is desired to provide a clock distribution system and method as automated as possible, that tracks PVT variations, and that enables intra-tree and inter-tree adjustment without inserting delay into the slowest tree.