One of the fundamental operations of a data processing system is a memory read operation. In a memory read operation, a data requester identifies a portion of data by an index, or an "address," and supplies the address to a memory system. The memory system then forwards an associated portion of data to the requester over one or more machine cycles. Initially, memory read operations were relatively simple operations. For instance, in the first generation of personal computers, the only data requesters were central processing units, the only memory systems were external banks of random access memory ("RAM") cells, and the only unit of data transferred was a byte (8 bits). A RAM circuit could forward the single byte of data in a single machine cycle over a then-typical eight-bit bus. Historically, the performance of each new data processing system has eclipsed the performance of its predecessor. Some of the most visible improvements between successive systems directly affect the complexity and scope of a memory read operation. Many of these improvements are especially significant to data processors that are integrated onto one or a few integrated circuits.
Some of the improvements to data processing systems that affect memory read operations are multiple execution units, hierarchical memory systems and multi-processor architectures. Architectures incorporating multiple execution units typically execute two or more instructions simultaneously. These concurrent instructions may be slightly staggered in time with respect to each other, as in pipelining schemes, may be aligned in time, as in the case of superscalar data processors, or both. Regardless, multiple execution units create multiple data requesters that may simultaneously require data. Typically, multiple execution units request data from a small, high speed memory cache. A high speed memory cache is part of a two-level hierarchical memory system, in which the cache is complemented with a large, slower block of external RAM. Together, the cache and external block of RAM provide fast, efficient memory accesses. Multi-processor architectures implement schemes in which multiple processors may require data from a single block of external memory or in which one of the processors may require data within the memory cache of another processor. In all these scenarios, data read operations must account for multiple requesters requiring data at, perhaps, the same time.
Two known improvements to the original data read operation are the data burst and critical word first protocols. These protocols recognize that data read operations are time consuming and that memory accesses often occur to the same general area of memory during a small interval of time. This latter observation is called "locality."
According to a burst operation, several data read operations occur together as a group over several clock cycles although the operations are addressed with a single index. Initially, a requester may only require an amount of data equal to or less than the bandwidth of a data bus. However, the associated memory system forwards more data to the requester than the bus bandwidth allows in a single clock cycle. For instance, a sixty-four bit data processor may have a bus bandwidth of 128 bits. An associated memory system may forward a total of 512 bits to a requester over four clock cycles in a burst operation. In this case, the memory system forwards 128 bits, or one quad-word, during each clock cycle. Typically, the memory system forwards the four quad-words beginning at the address specified by X . . . XX000000 (most significant bit to least significant bit), where X means either 0 or 1 as specified by the requester's address. The six low-order zeros align the burst to a 64-byte (512-bit) boundary. One of the underlying assumptions of a burst operation is that there is some likelihood that the requester will request some of the data adjacent to the addressed byte at a subsequent time. If the requester does require some of the adjacent data at a later time, then the requester will already have the data and will not have to occupy the data bus.
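The address arithmetic in the burst example can be sketched in a few lines of Python. This is only an illustration under the figures given above (a 128-bit bus and a four-cycle, 512-bit burst); the names `BUS_WIDTH_BYTES`, `BURST_BEATS`, `BURST_BYTES` and `burst_addresses` are hypothetical and are not taken from any particular memory system.

```python
# Illustrative sketch only: constants follow the example in the text,
# and all names here are hypothetical.
BUS_WIDTH_BYTES = 16                          # 128-bit bus: one quad-word per clock cycle
BURST_BEATS = 4                               # four clock cycles per burst
BURST_BYTES = BUS_WIDTH_BYTES * BURST_BEATS   # 64 bytes = 512 bits


def burst_addresses(addr):
    """Return the byte addresses of the four quad-words in the burst
    that contains `addr`, in ascending order."""
    # Clearing the six low-order bits yields the X...XX000000 base address.
    base = addr & ~(BURST_BYTES - 1)
    return [base + beat * BUS_WIDTH_BYTES for beat in range(BURST_BEATS)]


if __name__ == "__main__":
    # A request for the byte at 0x1234 falls in the burst starting at 0x1200.
    print([hex(a) for a in burst_addresses(0x1234)])
```

In this sketch a request for any byte between 0x1200 and 0x123F produces the same four transfers, which is what lets a later request to a neighboring byte be satisfied without using the bus again.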
A critical word first protocol is a refinement of the burst protocol described above. In the example above, a critical word first protocol requires that a memory system forward a particular one of the four quad-words first. The other three quad-words follow the critical word. The first quad-word, or "critical word," is selected because it contains a particular data byte, half-word, word, etc. that is immediately needed by the relevant requester. A memory system can satisfy the critical word first protocol by first forwarding the quad-word indexed by the address X . . . XXXX0000 (most significant bit to least significant bit), where X means either 0 or 1 as specified by the requester's address. Here only the four low-order bits are cleared, so the address identifies the 16-byte quad-word that contains the requested data.
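A minimal Python sketch of the critical word first ordering follows, using the same example figures (16-byte quad-words, four per burst). The text specifies only that the critical quad-word is forwarded first; the wrap-around order used here for the remaining three quad-words is an assumption made for illustration, and the names `QUAD_WORD_BYTES`, `BURST_BEATS`, `BURST_BYTES` and `critical_word_first` are hypothetical.

```python
# Illustrative sketch only: all names are hypothetical, and the order of
# the three non-critical quad-words (wrap-around) is an assumption.
QUAD_WORD_BYTES = 16                          # 128 bits per quad-word
BURST_BEATS = 4                               # four quad-words per burst
BURST_BYTES = QUAD_WORD_BYTES * BURST_BEATS   # 64 bytes = 512 bits


def critical_word_first(addr):
    """Return the quad-word addresses of the burst containing `addr`,
    with the quad-word that holds `addr` forwarded first."""
    base = addr & ~(BURST_BYTES - 1)          # X...XX000000: start of the burst
    critical = addr & ~(QUAD_WORD_BYTES - 1)  # X...XXXX0000: the critical word
    first = (critical - base) // QUAD_WORD_BYTES
    # Assumed ordering: continue upward from the critical word and wrap
    # around to the start of the burst.
    return [base + ((first + i) % BURST_BEATS) * QUAD_WORD_BYTES
            for i in range(BURST_BEATS)]


if __name__ == "__main__":
    # The byte at 0x1218 lies in the second quad-word of its burst.
    print([hex(a) for a in critical_word_first(0x1218)])
```

Whatever order is chosen for the trailing quad-words, the requester receives the data it is waiting on in the first clock cycle of the burst rather than possibly the last.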
Known protocols have not kept pace with improvements in data processing architecture. For instance, the two protocols described above are designed primarily to increase the efficiency of read operations that occur serially. Neither protocol addresses simultaneous data requests.