This invention relates to digital electronic system architectures and circuits therefor, particularly to such architectures and circuits for use in applications requiring high performance memory access and data transfer.
Modern digital electronic systems are called upon to provide ever higher system performance, including higher speed data throughput, higher data bandwidth and lower system latencies. Higher system performance is driven by new applications, as well as advances in current applications. For example, the implementation of high definition television ("HDTV") depends critically on increasing digital system performance so as to achieve fundamental improvements in the quality of the large picture size of HDTV, relative to the current television standard. At the same time, advances in personal computers also require increases in system performance to accommodate developments such as parallel, superscalar and other advanced processing techniques.
Increases in system performance ideally keep pace with increases in the performance of components employed in the systems so as to take full advantage of the components' capabilities. In practice, however, system performance lags component performance, being burdened by adherence to conventional architectures in the design of digital electronic systems.
Conventional system architectures generally combine a microprocessor, a main memory, and one or more other system components, such as other microprocessors and input/output devices. These architectures generally rely on a separate data communication mechanism that interconnects, and communicates data among, the system components. In particular, these architectures provide for interconnecting components through the data communication mechanism so as to share the main memory among each of the microprocessor and selected other system components.
These conventional architectures typically implement the data communication mechanism using either a conventional multi-drop data bus or a multi-port hardware switch. In multi-drop bus implementations, data communication is time-multiplexed among the system components coupled to the bus. In multi-port hardware switch implementations, each of the system components is coupled respectively to one of the switch ports, and data communication between any two components is routed through the switch. In addition, these architectures typically implement main memory using a plurality of conventional discrete dynamic random access memory ("DRAM") devices, together with associated access circuitry.
These conventional architectures, while suitable for many applications, tend to be inadequate for high performance applications. In particular, conventional architectures are inadequate for applications requiring one or more of high system throughput, high system bandwidth or low system latencies. Conventional architectures have nevertheless been employed. To do so, the architectures' performance shortfalls have typically been addressed using custom engineering solutions that adhere to the fundamental confines of the architecture. For example, to provide enhanced video capabilities, personal computers have employed a video controller connected to the microprocessor through a multi-drop data bus, while using a bank of memory separate from main memory, this memory bank being dedicated to video and typically implemented using video random access memory ("VRAM") devices.
These custom engineering solutions have significant limitations, including that they inherently address only the performance of individual components or features within the system, rather than the performance of the system as a whole. Accordingly, these solutions generally improve overall system performance to only a limited degree, if at all. Moreover, these solutions become increasingly difficult to implement as performance demands increase, and that difficulty raises implementation expense. Accordingly, conventional architectures are increasingly inadequate, if viable at all, for high performance applications. The architectures' performance shortfalls grow more acute, while the architecture-bound solutions suffer from ever greater limitations.
Conventional architectures' performance shortfalls stem, in particular, from constraints on the cooperation of system components. In turn, that cooperation depends in large part on data communication and main memory sharing among system components.
The implementation of the data communication mechanism is particularly associated with conventional architectures' performance shortfalls. When the architectures' data communication mechanism is implemented using a conventional multi-drop data bus, for example, system performance is limited to the bandwidth and throughput of the bus. Bus bandwidth and throughput are subject both to the loading associated with interfacing the bus to system components and to the bus's physical characteristics, e.g., the length of the bus lines. In addition, because buses time-multiplex data communications, system performance is limited by associated latency in access to system data communications, a limitation that compounds with increases in either or both the number of components seeking to communicate and the size of each communication. In practice, system performance degrades as communications between any two components are impaired for any reason.
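The compounding of bus latency described above can be illustrated with a minimal sketch. The sketch assumes a simple round-robin arbitration in which every other component transfers one block before the waiting component gains the bus. The function name, the arbitration scheme, and the numeric values are hypothetical and are chosen only for illustration; they are not taken from any particular bus.

```python
def worst_case_wait_us(num_components, transfer_bytes, bus_bytes_per_us):
    """Worst-case wait, in microseconds, before a component gains the bus.

    Assumption (hypothetical model): round-robin arbitration in which each
    of the other components transfers one block of transfer_bytes first.
    """
    if num_components < 1:
        raise ValueError("at least one component is required")
    if bus_bytes_per_us <= 0:
        raise ValueError("bus bandwidth must be positive")
    # Time the bus is occupied by a single block transfer.
    transfer_time_us = transfer_bytes / bus_bytes_per_us
    # Every other component may go first in the worst case.
    return (num_components - 1) * transfer_time_us


if __name__ == "__main__":
    # Hypothetical bus moving 32 bytes per microsecond.
    for components in (2, 4, 8):
        for block in (64, 256):
            wait = worst_case_wait_us(components, block, 32.0)
            print(components, "components,", block, "byte blocks:", wait, "us")
```

Under these assumptions the wait grows linearly with the number of contending components and with the size of each communication, so increasing both at once multiplies the delay.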
Implementing the architectures' data communication mechanism using a conventional multi-port hardware switch, rather than a multi-drop bus, can increase system performance. The increase results from the switch's typically higher throughput and bandwidth. However, these switches tend not only to be expensive, but also to introduce other significant problems in system performance. For example, the switches are not well suited either for networks and other applications requiring data communications in variable block sizes, or for HDTV and other applications requiring random accessibility of data in high speed operations. In addition, these switches typically do not provide for communication of control signals among components. Accordingly, these switches undesirably preclude each component's monitoring, e.g., "snooping", of the other components' memory activities, snooping generally being important to memory protection and cache coherency. Moreover, these switches also tend to substantially preclude the communication of data from one component to a plurality of other components, e.g., multi-cast data communications.
While conventional architectures' performance shortfalls are associated with the implementation of the data communication mechanism, the shortfalls are also associated with implementing a shared main memory. Reliance on conventional discrete DRAM devices to implement main memory significantly limits system performance, for example, as to system bandwidth and throughput. Conventional discrete DRAM devices have bandwidths that are significantly less than those of current microprocessors, as well as those of increasing numbers of other high performance components.
Several approaches have been taken toward improving main memory performance. One approach is to replace conventional discrete DRAM devices with conventional discrete static random access memory ("SRAM") devices in implementing main memory, so as to take advantage of SRAM devices' substantially higher bandwidths. However, using these SRAM devices generally introduces undesirable costs. These SRAM devices are approximately four times more expensive per unit memory size than the DRAM devices, and memory size generally is large and is likely to grow; e.g., full feature HDTV sets are expected to require at least 32 megabytes, while next generation personal computers generally are expected to require at least 16 megabytes. Accordingly, the cost of implementing main memory using conventional discrete SRAM devices is antithetical to the economics of main memory implementation.
Other conventional approaches to improving main memory performance focus on improving the bandwidth and throughput of discrete DRAM devices. These approaches include incorporating SRAM memory as cache in discrete DRAM devices; bundling memory in proprietary subsystems having internal data bussing, caching and protocols; employing multiple internal memory arrays; and employing alternate input/output modes. While each of these approaches tends to achieve some improvement in the performance of DRAM devices, each also tends to be subject to undesirable limitations. First, incorporating cache in the DRAM devices improves performance only to the extent cache hits occur with substantial regularity. However, cache hits tend to vanish under various circumstances, particularly in applications having main memory rapidly accessed by several components. Second, having multiple internal memory arrays tends to improve performance only if successive memory accesses address different arrays. In addition, to accommodate successive accesses of a single array, additional circuitry must be provided that compensates for the associated timing differences in the device's output of data. Third, alternate output modes, which include page mode, static column mode, and nibble mode, allow faster access to data by outputting the data in bursts, but generally at the undesirable expense of reducing random accessibility; that is, the modes at best provide random access only within the burst.
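The first two limitations above can be illustrated with a minimal sketch. The sketch assumes the usual weighted-average model of access time for a cached device, and assumes that addresses are interleaved across internal arrays by their low-order bits. The function names, the interleaving scheme, and the timing values are hypothetical and are used only for illustration.

```python
def effective_access_ns(hit_rate, cache_ns, dram_ns):
    """Average access time of a DRAM device with on-chip cache.

    Assumption (hypothetical model): a hit costs cache_ns and a miss
    costs dram_ns, weighted by the fraction of accesses that hit.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns


def count_same_array_accesses(addresses, num_arrays):
    """Count successive access pairs that address the same internal array.

    Assumption (hypothetical model): an address maps to an array by
    address modulo num_arrays, i.e. low-order interleaving.
    """
    arrays = [address % num_arrays for address in addresses]
    return sum(1 for prev, cur in zip(arrays, arrays[1:]) if prev == cur)


if __name__ == "__main__":
    # As cache hits vanish, access time falls back to the DRAM time.
    for hit_rate in (1.0, 0.5, 0.0):
        print(hit_rate, effective_access_ns(hit_rate, 10.0, 60.0))
    # Sequential addresses spread across four arrays; a stride of four
    # addresses the same array on every access.
    print(count_same_array_accesses([0, 1, 2, 3], 4))
    print(count_same_array_accesses([0, 4, 8, 12], 4))
```

In this model the cache yields no benefit once the hit rate reaches zero, and the multiple arrays yield no benefit for an access pattern whose successive addresses all map to one array.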
The above, as well as other, conventional approaches to improving main memory performance also have the significant limitation of being directed narrowly at improving the memory's bandwidth and throughput. In doing so, the conventional approaches generally seek specifically to close the bandwidth gaps between main memory and microprocessors. Accordingly, the conventional approaches are not directed at improving cooperation among the system components so as to improve system performance. In particular, these approaches are not directed at improving communication of data among the system components or specifically at improving the sharing of main memory among a plurality of system components, all of which components may have bandwidths comparable to high performance microprocessors.
Accordingly, there is a need for an improved digital electronic system architecture and, in particular, an architecture that permits implementation of high performance digital electronic systems by improving data communication and main memory sharing among the system components. There is also a need for an improved memory circuit and, particularly, for a memory circuit that permits implementation of high performance digital electronic systems.