1. Field of the Invention
This invention relates to central processing units and to input/output interfaces for central processing units.
2. Description of Related Art
Most conventional central processing units (CPUs) have some on-chip input/output (I/O) support logic but rely on one or more external components to complete an I/O system for complex bus interfaces. Multi-chip I/O systems have many drawbacks. They are more expensive because of the number of separate components and the increased board area needed to connect the components. The larger number of board-level components also lowers system reliability and increases latency because signals shuttle between components, possibly passing through several components in series.
Despite the drawbacks, multi-chip systems are employed because of the complexity of industry-standard I/O bus protocols such as PCI, VESA, and ISA bus protocols. Such industry-standard busses typically have their own clock signals, which are independent of (i.e., asynchronous to) the main processor clock signal. Forcing the CPU clock and an I/O system clock to be synchronous is undesirable because most CPUs run at much higher clock frequencies than an I/O system. In theory, the CPU clock frequency could be an integer multiple of the I/O clock frequency to keep the two clocks synchronous; but in practice, the extremely tight phase tolerances required for the CPU to maintain internal timing margins are difficult to maintain when clock signal multipliers are employed. Additionally, requiring the CPU clock frequency to be a fixed multiple of the I/O clock frequency makes a system inflexible and may prevent the CPU from taking advantage of a faster process. For example, if the CPU clock frequency is anchored to the I/O clock by an integer-multiple constraint, the CPU frequency cannot be increased when a faster process becomes available unless the I/O clock frequency is also increased to maintain the integer multiple. The I/O clock frequency may be fixed for other devices on the I/O bus, so that the processor cannot take advantage of the faster process.
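The integer-multiple constraint described above can be illustrated with a short calculation. The following is a minimal sketch only. The 33 MHz bus clock and the 100 MHz and 120 MHz process limits are hypothetical figures chosen for illustration, and the function name is not taken from any real system. The sketch assumes the multiplier may change between designs but must remain a whole number.

```python
def max_synchronous_cpu_mhz(io_clock_mhz: int, process_limit_mhz: int) -> int:
    """Highest CPU frequency that is a whole multiple of the I/O clock
    and does not exceed what the fabrication process supports."""
    multiplier = process_limit_mhz // io_clock_mhz  # largest whole multiplier that fits
    return multiplier * io_clock_mhz


IO_CLOCK_MHZ = 33  # hypothetical bus clock, fixed by other devices on the bus

# Hypothetical process limits before and after a process improvement.
old_cpu_mhz = max_synchronous_cpu_mhz(IO_CLOCK_MHZ, 100)
new_cpu_mhz = max_synchronous_cpu_mhz(IO_CLOCK_MHZ, 120)
```

Under these assumed figures, both process limits fall between the third and fourth multiples of the bus clock, so the synchronous CPU frequency is the same in both cases and the faster process yields no gain.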
An alternative to keeping the CPU clock and the I/O system clock synchronous is allowing the two clocks to be asynchronous, but a reliable single-chip CPU having multiple asynchronous clock domains is difficult to produce. Typically, a large number of signals transferred between clock domains must be synchronized, which multiplies the chances of metastability failures. The resulting statistical mean-time-to-failure (MTTF) can be too low, which makes such CPUs unsuitable for most applications.
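The effect of the number of synchronized signals on MTTF can be estimated with the exponential metastability model commonly used for synchronizers, MTTF = exp(t_r / tau) / (T_w * f_clk * f_data). The following is a sketch under that model, not a description of any particular circuit. All parameter values are illustrative assumptions, and the function names are hypothetical.

```python
import math


def synchronizer_mttf(resolve_time_s, tau_s, window_s, clock_hz, data_hz):
    """MTTF in seconds of one synchronizer under the exponential model:
    MTTF = exp(t_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(resolve_time_s / tau_s) / (window_s * clock_hz * data_hz)


def system_mttf(per_signal_mttfs):
    """Failure rates of independent synchronizers add, so the system MTTF
    is the reciprocal of the summed per-signal failure rates."""
    return 1.0 / sum(1.0 / mttf for mttf in per_signal_mttfs)


# Illustrative (assumed) device and clock parameters for a single signal.
one_signal = synchronizer_mttf(
    resolve_time_s=5e-9,  # time allowed for a metastable state to resolve
    tau_s=0.2e-9,         # resolution time constant of the flip-flop
    window_s=0.1e-9,      # metastability window
    clock_hz=100e6,       # receiving-domain clock frequency
    data_hz=10e6,         # rate of transitions on the crossing signal
)

# One hundred identical crossing signals, each with its own synchronizer.
hundred_signals = system_mttf([one_signal] * 100)
```

With identical, independent synchronizers the system MTTF is the single-signal MTTF divided by the number of signals, which is why reducing the number of signals that cross between clock domains directly improves reliability.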
A cache-coherent I/O system is integrated in a single-chip central processing unit (CPU) which uses an operating protocol and circuits such as synchronizers and data buffers to operate the I/O system in a clock domain that is completely separate from (i.e., asynchronous to) a clock domain containing a main CPU clock. In accordance with one embodiment of the invention, the integrated I/O system contains a bus control/protocol unit and an I/O memory management unit which operate in an I/O clock domain; and a processing core for the CPU operates in a CPU clock domain. Synchronizers and data buffers respectively provide control and data communications between the two clock domains. The I/O system achieves a very low latency for data transfers to and from the processing core because the I/O system and processing core are in a single monolithic integrated circuit, which eliminates shuttling control signals on and off chip. Additionally, throughput is high because the processing unit and the I/O system both have direct access to a data buffer between them, so that no delays are incurred for the complex communication mechanisms which are commonly employed between a CPU and an external I/O chip-set.
In accordance with one aspect of the invention, a data buffer between the processing unit and the I/O system is large enough to store one or more cache lines from a cache memory attached to the processing core, and the I/O system provides for DMA operations between external devices and memory attached to the processing unit. During DMA operations, the buffer can be filled with a full cache line which is manipulated in a cache-coherent manner. When a DMA operation requires data transfer from the memory attached to the processing unit, the I/O system checks addresses for data already stored in the data buffer; and if the data buffer contains valid data corresponding to the request, the I/O system transfers the data from the data buffer to the requesting device without further intervention from the processing unit. The data buffer thus serves two purposes: buffering data between clock domains and acting as a small cache for DMA operations. The processing unit invalidates data in the data buffer if necessary to maintain data consistency.
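The DMA read behavior described above can be modeled in software as follows. This is a behavioral sketch only, not the disclosed circuit. The class name, method names, the 32-byte line size, and the single-line buffer capacity are all assumptions made for illustration, and the sketch handles only reads that fall within one cache line.

```python
LINE_BYTES = 32  # assumed cache-line size; the text does not specify one


class DmaLineBuffer:
    """Hypothetical one-line data buffer shared by the processing unit and I/O system."""

    def __init__(self, memory: bytearray):
        self.memory = memory   # stands in for memory attached to the processing unit
        self.valid = False     # whether the buffered line may be used
        self.line_addr = None  # base address of the buffered line
        self.data = b""
        self.cpu_fills = 0     # number of requests that needed the processing unit

    @staticmethod
    def _line_base(addr: int) -> int:
        return addr - (addr % LINE_BYTES)

    def dma_read(self, addr: int, length: int) -> bytes:
        """Serve a DMA read that lies within a single cache line."""
        base = self._line_base(addr)
        if not (self.valid and self.line_addr == base):
            # Miss: the processing unit supplies a full cache line to the buffer.
            self.data = bytes(self.memory[base:base + LINE_BYTES])
            self.line_addr = base
            self.valid = True
            self.cpu_fills += 1
        # On a hit, the data comes from the buffer with no processing-unit involvement.
        offset = addr - base
        return self.data[offset:offset + length]

    def cpu_write(self, addr: int, payload: bytes) -> None:
        """Processing-unit write; invalidates the buffered line if the write overlaps it."""
        self.memory[addr:addr + len(payload)] = payload
        if self.valid:
            overlaps = (addr < self.line_addr + LINE_BYTES
                        and addr + len(payload) > self.line_addr)
            if overlaps:
                self.valid = False


memory = bytearray(range(64))
buffer = DmaLineBuffer(memory)
first = buffer.dma_read(4, 4)    # miss: the line at address 0 is fetched
second = buffer.dma_read(8, 4)   # hit: served from the buffer
fills_after_hit = buffer.cpu_fills
buffer.cpu_write(8, b"\xff")     # processing unit invalidates the stale line
third = buffer.dma_read(8, 1)    # miss again: the updated data is fetched
```

In this sketch the second read completes without involving the processing unit, while the write forces a refill so that the requesting device never receives stale data.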