Pratt et al., entitled ‘CLOCKING SCHEME FOR ASIC’ Ser. No. 09/879,065 filed Jun. 13, 2001, U.S. Pat. No. 6,552,590, assigned commonly herewith and incorporated by reference herein.
Hughes et al., entitled ‘DATA BUS SYSTEM INCLUDING POSTED READS AND WRITES’ Ser. No. 09/893,658 filed of even date herewith, assigned commonly herewith, and incorporated by reference herein.
This invention relates to the design and layout of data processing systems, and particularly to network communication devices, such as switches and routers, which require a multiplicity of functional blocks, hereinafter called ‘cores’, which are pre-designed or independently designed to perform specific tasks. The invention more particularly relates to facilitating the layout of such circuits in a single application specific integrated circuit, so as to provide a ‘system on a chip’. More particularly the invention relates to the simplification of such layout by allowing aggregation of data buses.
The automated design and layout of integrated circuits employing libraries of circuit cells or blocks is now commonplace owing to the unfeasibility of designing systems of enormous complexity by hand. Techniques for this purpose have developed over the last decade or so from comparatively simple rule based methods for the design of combinatorial circuits to present day placement and routing techniques wherein libraries of complex functional blocks or ‘cores’ can be used in conjunction with sophisticated layout tools to design a system with a given functionality and performance. Even so, the task of design and testing is still particularly lengthy and expensive.
Among the difficulties which stand in the way of efficient design of systems on a chip are the different interface styles or configurations of cores, the general problems of achieving an efficient layout, the difficulty of achieving layouts which minimise power consumption, and the difficulty of using the available area on the silicon chip efficiently.
A main feature in the achievement of an efficient layout employing a library of cores is the architecture of the bus system by means of which data is to be transferred from core to core.
An important characteristic of the current design is that most and preferably all data transfers between cores are conducted by way of memory, which may be on-chip memory, such as a scratch pad, or may be off-chip memory such as flash memory or dynamic random access memory. A concomitant of this approach is that data buses from the cores need to be aggregated together. Traditional approaches to aggregation and arbitration between contentious requirements for the same bandwidth on a bus have been based on the transfer of data from all the relevant cores at a common rate. One aspect of the present invention is the ability to aggregate data occurring at different rates from different cores. This requires the inclusion of buffering in arbiters and also possibly ‘wrappers’ which are provided for individual cores if necessary so that they are compatible with the rest of the bus architecture. At each arbitration point, relevant cores are allocated enough bandwidth to allow them to transfer data to or from multiple memories at the design rate of the individual cores. Data at such arbitration points is aggregated from all connected cores and is dispatched towards memory, or higher arbiters, typically at an increased rate such that all lower cores never encounter an overrun or underrun situation, or alternatively with a rate lower than the sum of data rates of aggregated paths, with handshaking limiting the flow rate, and arbitration mechanisms enabling the desired throughput particular to each path.
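The aggregation described above may be illustrated by a minimal software sketch of a buffered arbitration point. It is offered only as an illustration under stated assumptions and not as the implementation of the invention: the names `Arbiter`, `push`, `grant` and `needs_handshaking` are hypothetical, and the round-robin grant policy is an assumption, since the text does not specify a particular arbitration mechanism.

```python
from collections import deque

def needs_handshaking(core_rates, upstream_rate):
    """True when the upstream path is slower than the sum of the aggregated
    core rates, so that flow must be limited by handshaking."""
    return upstream_rate < sum(core_rates)

class Arbiter:
    """Hypothetical arbitration point with one buffer (FIFO) per connected core."""

    def __init__(self, n_cores):
        self.fifos = [deque() for _ in range(n_cores)]
        self.next_core = 0  # core to be considered first on the next grant

    def push(self, core, word):
        # A core deposits a word at its own rate; the buffer absorbs rate differences.
        self.fifos[core].append(word)

    def grant(self):
        """Dispatch one buffered word upstream (assumed round-robin policy).
        Returns (core, word), or None if every buffer is empty."""
        n = len(self.fifos)
        for offset in range(n):
            core = (self.next_core + offset) % n
            if self.fifos[core]:
                self.next_core = (core + 1) % n
                return core, self.fifos[core].popleft()
        return None
```

In this sketch a fast core and a slow core may deposit words at different rates; provided the upstream rate is at least the sum of the core rates (`needs_handshaking` returns `False`), the buffers drain without overrun.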
A further aspect of the invention is to provide for automatic bus width alignment at arbitration points. Most cores output data whose width is (for example) either 32 bits or a multiple or sub-multiple thereof. At arbitration points data from cores narrower than a predetermined width, e.g. 32 bits, is packed into 32 bit words, unused bytes being marked as invalid by accompanying enable flags. Such 32 bit words make their way up the arbitration hierarchy until they reach their appropriate target destination (typically memory), where the data may be unpacked. This unpacking may include discarding any invalid padding bytes included by the source if data paths at a destination point, or exit point from an aggregation element, are narrower than those earlier in the aggregation chain.
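The packing and unpacking described above may be sketched as follows. This is a minimal illustration only: the function names `pack_words` and `unpack_words` are hypothetical, the representation of the enable flags as a bit mask (bit k set meaning byte k is valid) is an assumption, and zero is chosen arbitrarily as the padding value.

```python
from typing import List, Tuple

WORD_BYTES = 4  # 32-bit word, the example width given in the text

def pack_words(data: bytes) -> List[Tuple[bytes, int]]:
    """Pack a byte stream from a narrow core into 4-byte words.
    Each word is accompanied by an enable mask; unused bytes are padded
    with zero and left unmarked (invalid) in the mask."""
    words = []
    for i in range(0, len(data), WORD_BYTES):
        chunk = data[i:i + WORD_BYTES]
        enables = (1 << len(chunk)) - 1          # one enable bit per valid byte
        padded = chunk + bytes(WORD_BYTES - len(chunk))
        words.append((padded, enables))
    return words

def unpack_words(words: List[Tuple[bytes, int]]) -> bytes:
    """Recover the original bytes at the destination, discarding any
    padding bytes whose enable bit is not set."""
    out = bytearray()
    for word, enables in words:
        for k in range(WORD_BYTES):
            if enables & (1 << k):
                out.append(word[k])
    return bytes(out)
```

For example, six bytes from an 8-bit core would occupy two words in this sketch, the second carrying two valid bytes and two invalid padding bytes that are dropped on unpacking.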