The present invention relates to prefetch buffer memory and, in particular, to controlling the prefetching of data so as to avoid unnecessarily prefetching data that is later flushed.
A prefetch buffer is a small, high-speed memory device that is used to store data so that the data is immediately available to downstream processing units. Without a prefetch buffer, the processing unit must access data directly from slower bulk memory. Because today's microprocessors operate much faster than the associated bulk memory can be accessed, a processing unit that accesses data directly from bulk memory may be starved, i.e., the processing unit remains idle while the data is obtained.
A prefetch buffer is a small but fast memory device placed between the bulk memory and the processing unit. Data is prefetched and held in the prefetch buffer until needed by the processing unit. Because the prefetch buffer is fast, the processing unit can quickly access data without having to wait for the data to be directly accessed from the slower bulk memory. Thus, prefetch buffers reduce the latency time of the memory system.
FIG. 1 is a schematic diagram showing a conventional prefetch buffer architecture 10, including a prefetch buffer 12, a decode unit 14, a prefetch control unit 16, and a global bus interface 18, which is connected to the global bus 20. Also attached to the global bus 20 are a memory gateway 22 and a bulk memory 24, which may be an off-chip secondary cache of static random-access memory (SRAM) or an even larger main memory of dynamic random-access memory (DRAM).
The prefetch control unit 16 controls when data is prefetched from memory 24. Prefetch control unit 16 signals global bus interface 18 to prefetch data at a particular address in memory 24. The data is retrieved from memory 24 and is stored in prefetch buffer 12 via memory gateway 22, global bus 20 and global bus interface 18. Prefetch buffer 12 stores eight lines of data, and each line contains eight consecutively addressed 32-bit words.
Decode unit 14 retrieves a line of data from prefetch buffer 12, decodes the line of data, and transmits the decoded signal to downstream processing units (not shown). The line of data retrieved by decode unit 14 is the line of data that has been stored in prefetch buffer 12 the longest.
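The first-in, first-out behavior described above can be illustrated with a minimal software sketch. The class name `PrefetchBuffer` and its method names are hypothetical and do not appear in the original description; only the dimensions (eight lines of eight 32-bit words) and the oldest-line-first retrieval order are taken from the text.

```python
from collections import deque

LINES_PER_BUFFER = 8   # from the text: eight lines of data
WORDS_PER_LINE = 8     # from the text: eight words per line
BITS_PER_WORD = 32     # from the text: 32-bit words


class PrefetchBuffer:
    """Hypothetical FIFO model of prefetch buffer 12 (names are assumptions)."""

    def __init__(self):
        self._lines = deque()

    def is_full(self):
        return len(self._lines) >= LINES_PER_BUFFER

    def store_line(self, words):
        """Store one prefetched line of contiguously addressed words."""
        words = list(words)
        if len(words) != WORDS_PER_LINE:
            raise ValueError("a line holds exactly eight words")
        if self.is_full():
            raise OverflowError("prefetch buffer is full")
        self._lines.append(words)

    def next_line(self):
        """Return the line that has been stored the longest."""
        return self._lines.popleft()

    def __len__(self):
        return len(self._lines)


# Total capacity implied by the stated dimensions, in bytes.
capacity_bytes = LINES_PER_BUFFER * WORDS_PER_LINE * BITS_PER_WORD // 8
```

With these dimensions the buffer holds 256 bytes, and `next_line()` always hands the decode unit the oldest stored line.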
FIG. 2A illustrates a conventional data word 30 of 32 bits, which includes a 16-bit command section 32 and a 16-bit parameter section 34. The command section 32, for example, supplies information regarding what is to be done with the data in the parameter section 34. Sometimes the parameter section 34 may be unused, for example, where a “halt” command is present in command section 32.
Additional data words containing only parameters, i.e., no command section, may be associated with a data word containing a command section. As shown in FIG. 2B, two data words 40 and 46 are associated with each other. Similar to data word 30, shown in FIG. 2A, data word 40 contains a command section 42 and a parameter section 44, which may be unused, i.e., some commands necessarily have parameters stored in section 44, while other commands do not use section 44. Data word 46 contains additional parameter data associated with the command found in command section 42. It should be understood that several additional data words containing parameter data may be associated with data word 40.
As discussed above, prefetch buffer 12 stores eight lines of data, with each line containing eight data words of 32 bits each. By way of example, a data word containing a command section, such as word 40, may be stored in a single data line along with seven associated data words with parameters for the command, such as word 46. Alternatively, a line of data may contain multiple words with command sections, along with their associated parameter-containing words. Each word stored in prefetch buffer 12 is contiguously addressed with the previous word.
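The word layout of FIG. 2A can be sketched as a simple bit-field split. The text does not state which half of the word holds the command section; the sketch below assumes, purely for illustration, that the command occupies the upper 16 bits and the parameter the lower 16 bits. The function name `split_word` is likewise hypothetical.

```python
def split_word(word):
    """Split a 32-bit data word into (command, parameter) 16-bit sections.

    Assumption (not stated in the text): the command section is the
    upper half of the word and the parameter section is the lower half.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit unsigned value")
    command = (word >> 16) & 0xFFFF
    parameter = word & 0xFFFF
    return command, parameter


command, parameter = split_word(0x12345678)
```

Under this assumed layout, the word `0x12345678` yields command `0x1234` and parameter `0x5678`.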
Generally, prefetch control unit 16 independently prefetches data. However, occasionally decode unit 14 prompts prefetch control unit 16 to retrieve data from a different address in memory, when it is necessary to retrieve a data word or series of words that are not contiguously addressed with the preceding data words.
Decode unit 14 determines that it is necessary to prefetch from a new address in memory when the command section of a data word is encoded with a “continue” command and the address of the data to be obtained is encoded in an associated word. FIG. 2C illustrates a data word 50 having a “continue” command in the command section 52 and an associated word 56 having the address of the next data word to be retrieved. As shown in FIG. 2C, when a word contains a “continue” command, the parameter section 54 of the word is unused.
A “continue” command, when received and decoded by decode unit 14, indicates that subsequent data words (not the associated address-containing word) that may be stored in prefetch buffer 12 should not be used, but that data from another address in memory 24 is to be prefetched. When decode unit 14 receives a “continue” command, decode unit 14 communicates to prefetch control unit 16 that a “continue” command was received and provides prefetch control unit 16 with the new address.
Prefetch control unit 16 stops the prefetching operation, invalidates or “flushes” the contents of prefetch buffer 12 and begins prefetching from the new address. Once the data in prefetch buffer 12 is flushed, the prefetched data from the new address is stored in the now empty lines in prefetch buffer 12. Thus, prefetch buffer 12 will store consecutively addressed words starting at the new “continue” address. Because decode unit 14 receives the data line that has been in prefetch buffer 12 the longest, by the time decode unit 14 receives and decodes a “continue” command, up to seven lines of data in prefetch buffer 12 may be full. Thus, seven lines of prefetched data in prefetch buffer 12 may be discarded when a “continue” command is decoded by decode unit 14.
Consequently, a large amount of data may be unnecessarily prefetched from memory 24 via global bus 20 and stored in prefetch buffer 12 only to be discarded later when a “continue” command is decoded. The prefetching of unnecessary data that is later discarded is a waste of valuable bandwidth of the global bus 20 and of memory 24.
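The cost of this conventional flush can be made concrete with a short sketch. The function name `flush_on_continue`, the placeholder line contents, and the example address `0x4000` are hypothetical; the buffer dimensions and the worst case of seven discarded lines come from the description above.

```python
from collections import deque

LINES_PER_BUFFER = 8   # from the text
WORDS_PER_LINE = 8     # from the text
BYTES_PER_WORD = 4     # 32-bit words


def flush_on_continue(buffer, continue_address):
    """Hypothetical model of the conventional reaction to a decoded continue.

    Everything still held in the buffer is invalidated, and prefetching
    restarts at continue_address. Returns the new prefetch address, the
    number of discarded lines, and the bytes fetched in vain.
    """
    discarded_lines = len(buffer)
    buffer.clear()
    wasted_bytes = discarded_lines * WORDS_PER_LINE * BYTES_PER_WORD
    return continue_address, discarded_lines, wasted_bytes


# Worst case: the buffer is full when the decode unit takes the oldest
# line and finds a continue command in it.
buffer = deque(f"line {i}" for i in range(LINES_PER_BUFFER))
oldest = buffer.popleft()                    # line handed to the decode unit
result = flush_on_continue(buffer, 0x4000)   # 0x4000 is an arbitrary example
```

In this worst case, seven lines, or 224 bytes of bus traffic, are fetched and then thrown away.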
Moreover, after the data in prefetch buffer 12 is flushed, the data from the new “continue” address must be fetched, stored in prefetch buffer 12, and then decoded by decode unit 14 before it is supplied to downstream processing units. This entails time during which the downstream processing units are not receiving data. Thus, the downstream processing units may be starved and required to remain idle while the appropriate data is fetched from memory 24.
A prefetch buffer architecture includes a prefetch buffer, which stores contiguously addressed data words prefetched from a memory, and associated control. A continue detect unit is disposed between the memory and the prefetch buffer and is used to examine each data word or a line of data words as it is being written into the prefetch buffer to determine if a “continue” command is likely to be present. If the potential presence of a “continue” command is detected, the prefetching of contiguously addressed data is suspended. The data word or the line of words is stored in the prefetch buffer until called by a decode unit. The decode unit decodes the data word having the “continue” command, and the associated “continue” address, and issues a command to the prefetch control unit to resume prefetching at the “continue” address. Thus, little or no data stored in the prefetch buffer needs to be flushed at a later time. Because little or no unnecessary data is prefetched, bandwidth of the global bus is advantageously saved.
In one embodiment, the continue detect unit includes a comparator circuit or a parallel series of comparator circuits that examine each data word for a predetermined bit pattern, with which every data word containing a “continue” command is encoded. Each comparator circuit is connected to an OR logic gate, which produces a continue detect signal to the prefetch control unit indicating when one of the data words is likely to contain the “continue” command. Because it is possible for the “continue” command to be present in the last word in a line of data, while the associated “continue” address is present in the first word in the next line of data (which has not been prefetched), a delay circuit is coupled to the comparator circuit that receives the last word in the line of data. Thus, the next line of data containing the “continue” address will be prefetched prior to suspension of the prefetching operation. The use of comparator circuits is a fast and inexpensive method of probabilistic continue detection. While an actual decoder may be used in place of the comparator circuits and would be accurate, i.e., not probabilistic, the commands have a variable number of parameters, so such a decoder would be complex and expensive.
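The comparator, OR gate, and delay arrangement can be modeled in software as a rough sketch. The bit pattern `CONTINUE_PATTERN`, the mask `PATTERN_MASK`, the class name, and the choice to model the delay as exactly one line time are all assumptions made for illustration; the text specifies only that a predetermined bit pattern is compared and that the last-word comparator is delayed so that the next line is prefetched before suspension.

```python
CONTINUE_PATTERN = 0xC0DE0000  # hypothetical bit pattern marking a continue
PATTERN_MASK = 0xFFFF0000      # hypothetical: compare the assumed command half


def comparator(word):
    """One comparator circuit: does the word carry the continue pattern?"""
    return (word & PATTERN_MASK) == CONTINUE_PATTERN


class ContinueDetectUnit:
    """Hypothetical model: parallel comparators feeding an OR gate.

    The comparator on the last word of a line is delayed by one line,
    so the line holding the continue address is prefetched before the
    detect signal suspends prefetching.
    """

    def __init__(self):
        self._delayed_last_word_hit = False

    def examine_line(self, line):
        hits = [comparator(word) for word in line]
        # OR gate: undelayed comparators plus the delayed last-word hit
        # carried over from the previous line.
        detect = any(hits[:-1]) or self._delayed_last_word_hit
        self._delayed_last_word_hit = hits[-1]
        return detect


unit = ContinueDetectUnit()
mid_line = [0, 0, CONTINUE_PATTERN, 0x8000, 0, 0, 0, 0]
last_word = [0, 0, 0, 0, 0, 0, 0, CONTINUE_PATTERN]
next_line = [0x9000, 0, 0, 0, 0, 0, 0, 0]
signals = [unit.examine_line(l) for l in (mid_line, last_word, next_line)]
```

Because the comparison looks only at a bit pattern, a parameter word that happens to match would also raise the signal, which is why the detection is described as probabilistic.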
In another embodiment, once the continue detect unit detects the predetermined bit pattern, the continue detect unit transmits the “continue” address to the prefetch control unit so that data at the new “continue” address may be prefetched. Thus, the prefetching operation switches from one set of contiguously addressed data words to another set of contiguously addressed data words without waiting for data words with the potential “continue” command and address to be decoded by the decode unit. Consequently, the prefetch buffer is efficiently utilized, thereby avoiding starvation of the pipeline as well as avoiding wasting the bandwidth of the global bus.
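A minimal sketch of this redirecting variant follows. It assumes, based on FIG. 2C, that the “continue” address is carried in the word immediately after the word containing the command; the bit pattern, mask, class name, and method name are hypothetical.

```python
CONTINUE_PATTERN = 0xC0DE0000  # hypothetical bit pattern marking a continue
PATTERN_MASK = 0xFFFF0000      # hypothetical: compare the assumed command half


class RedirectingDetectUnit:
    """Hypothetical model of a detect unit that forwards the continue address.

    examine_line returns the address to prefetch from next, or None when
    prefetching should simply proceed with the next contiguous line.
    """

    def __init__(self):
        self._address_in_next_line = False

    def examine_line(self, line):
        if self._address_in_next_line:
            # The previous line ended with a continue word, so the first
            # word of this line is taken to be the continue address.
            self._address_in_next_line = False
            return line[0]
        for index, word in enumerate(line):
            if (word & PATTERN_MASK) == CONTINUE_PATTERN:
                if index + 1 < len(line):
                    return line[index + 1]
                self._address_in_next_line = True
                return None
        return None


unit = RedirectingDetectUnit()
same_line = unit.examine_line([0, 0, 0, CONTINUE_PATTERN, 0x8000, 0, 0, 0])
split_first = unit.examine_line([0, 0, 0, 0, 0, 0, 0, CONTINUE_PATTERN])
split_second = unit.examine_line([0x9000, 0, 0, 0, 0, 0, 0, 0])
```

In this sketch the prefetch control unit would redirect to `0x8000` as soon as the first line is written, and to `0x9000` once the line following a last-word “continue” arrives, without waiting for the decode unit.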