The performance of a complex digital processing system is determined by the performance of both its processing unit and its storage unit. The performance of the processing unit is generally measured by the number of instructions executed per unit time. The performance of the storage unit is measured by the number of bytes supplied or stored per unit time. The two are closely coupled: unless the rate at which raw data is supplied and the rate at which processed data is stored match the processing speed of the processor, the processing unit will be stalled and idle most of the time.
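The coupling described above can be illustrated with a minimal sketch. It assumes a simplified model in which every instruction moves a fixed number of bytes to or from storage. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def sustained_instruction_rate(peak_ips, bytes_per_instruction, storage_bytes_per_s):
    """Instructions per second the processor can sustain.

    Assumed model: each instruction needs `bytes_per_instruction` bytes of
    storage traffic, so the storage unit caps the instruction rate at
    storage_bytes_per_s / bytes_per_instruction.
    """
    if bytes_per_instruction <= 0:
        return peak_ips  # no storage traffic, so the processor is never stalled
    storage_limited_ips = storage_bytes_per_s / bytes_per_instruction
    return min(peak_ips, storage_limited_ips)


def utilization(peak_ips, bytes_per_instruction, storage_bytes_per_s):
    """Fraction of time the processing unit does useful work (0.0 to 1.0)."""
    sustained = sustained_instruction_rate(
        peak_ips, bytes_per_instruction, storage_bytes_per_s
    )
    return sustained / peak_ips


# Hypothetical processor: 1e9 instructions/s peak, 4 bytes of traffic each.
print(utilization(1e9, 4, 1e9))  # storage supplies 1e9 bytes/s -> 0.25
print(utilization(1e9, 4, 8e9))  # storage supplies 8e9 bytes/s -> 1.0
```

In the first case the processing unit is idle three quarters of the time because the storage unit cannot keep up. In the second case the storage data rate exceeds what the processor can consume, so the processor runs at its peak rate.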
There are several levels of storage units in the system. Broadly speaking, they fall into two major categories: cache and memory. Cache is further divided into L1, L2 and L3 caches. SRAM is used for cache, and DRAM is used for memory. The CPU (central processing unit) accesses this hierarchy in the sequence L1-L2-L3-memory. Along this sequence, the density, the total number of bytes per level, and the access cycle time of the storage array increase, while the power and cost per bit generally decrease.
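The effect of the L1-L2-L3-memory access sequence on the time seen by the CPU can be sketched as follows. This is a simplified model that assumes each level is searched only after the previous one misses. The cycle counts and hit rates are hypothetical placeholders, not measurements of any real system.

```python
def average_access_cycles(levels):
    """Expected cycles per access for a hierarchy searched in order.

    `levels` is a list of (name, access_cycles, hit_rate) tuples in search
    order. The last level (main memory) is assumed to always hit.
    """
    expected = 0.0
    reach_probability = 1.0  # probability that the access gets this far
    for _name, access_cycles, hit_rate in levels:
        expected += reach_probability * access_cycles
        reach_probability *= 1.0 - hit_rate  # only misses go to the next level
    return expected


# Hypothetical hierarchy: (name, access time in cycles, hit rate).
hierarchy = [
    ("L1", 4, 0.9),
    ("L2", 12, 0.8),
    ("L3", 40, 0.5),
    ("memory", 200, 1.0),
]

print(average_access_cycles(hierarchy))  # approximately 8 cycles
```

Under these assumed numbers the small, fast levels absorb most accesses, so the average stays close to the L1 access time even though main memory is fifty times slower.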
From generation to generation of semiconductor technology, the access time of SRAM has followed the performance trend of logic circuits, whereas the access time of DRAM has remained relatively flat. Several techniques are generally used to improve the performance (data rate) of a storage unit. These techniques include access interleaving, simultaneous access of multiple independent banks for multiple bits, and rapid sequential scan-out of the data on a data bus. Such techniques are especially important for the main memory in the storage unit of a complex system.
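Access interleaving across independent banks can be illustrated with a small timing sketch. It assumes low-order interleaving (consecutive word addresses map to consecutive banks), one access issued per bus cycle at most, and a bank that stays busy for a fixed number of cycles after accepting an access. The function names and parameters are hypothetical.

```python
def bank_of(address, num_banks):
    """Low-order interleaving: consecutive addresses rotate through the banks."""
    return address % num_banks


def cycles_for_sequential_read(num_words, num_banks, bank_cycle_time):
    """Cycles needed to read `num_words` consecutive words.

    Assumed model: the bus can start at most one access per cycle, and a
    bank cannot accept a new access until `bank_cycle_time` cycles after
    the start of its previous one.
    """
    bank_free_at = [0] * num_banks  # cycle at which each bank becomes idle
    next_issue = 0                  # earliest cycle the bus can issue again
    finish = 0
    for address in range(num_words):
        bank = bank_of(address, num_banks)
        start = max(next_issue, bank_free_at[bank])  # wait for a busy bank
        bank_free_at[bank] = start + bank_cycle_time
        finish = max(finish, bank_free_at[bank])
        next_issue = start + 1
    return finish


# Hypothetical array with a 4-cycle bank cycle time, reading 16 words.
print(cycles_for_sequential_read(16, 1, 4))  # single bank     -> 64
print(cycles_for_sequential_read(16, 4, 4))  # four-way banked -> 19
```

With a single bank every access waits for the full cycle time of the previous one. With four banks the accesses overlap, so after the first word one word is delivered per bus cycle, even though the cycle time of each individual bank is unchanged.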