1. Field
The present disclosure generally relates to computer systems, and more particularly to superscalar computer processors in which "load" and "store" operations may be executed out of order. The present disclosure is directed to a method and device for providing address aliasing of memory locations that simplifies program operation by reducing the complexity of address comparisons that are required to check for load/store collisions.
2. Description of the Prior Art
A conventional computer has several different pieces of interconnected hardware, which typically include at least (a) input/output devices such as keyboards and displays that provide a user interface, (b) a permanent memory (storage) device such as a magnetic or optical disk having program and data files, (c) a temporary (volatile) memory device such as random access memory (RAM) for holding portions of the program and data during program execution, and (d) the central processing unit (CPU), or processor, which accesses the storage device and RAM in carrying out the actual program instructions. When a computer program is written, it must be converted from human-readable source code to machine-readable object code that will run on the particular processor being used. When a program is so converted, or compiled, the resulting object code includes a multitude of "addresses" for various program instructions and data embedded in the code. These relative addresses are used by the processor to locate the instructions and data after the program has been loaded into RAM, i.e., to correlate the relative addresses to actual physical addresses in RAM.
Memory addressing can take on many forms. The simplest is absolute addressing (or direct addressing), where the address value corresponds exactly to the physical address. Another common technique is indexed addressing, which requires adding an index, or offset, value to the relative address. This technique allows a large number of program addresses to be mapped to a fixed number of physical memory locations, by dividing up the program into blocks or pages. In some memory addressing schemes, there is only one mapping function from relative addresses to physical memory, while other schemes use multiple address mappings.
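The difference between absolute and indexed addressing can be illustrated with a minimal sketch. The function names, the page size, and the sample addresses below are assumptions chosen for illustration only; they do not come from the disclosure.

```python
# Illustrative sketch only: names and values are hypothetical.

PAGE_SIZE = 0x1000  # assumed 4 KiB block/page size, for illustration


def absolute_address(address: int) -> int:
    """Absolute (direct) addressing: the value is the physical address."""
    return address


def indexed_address(relative: int, index: int) -> int:
    """Indexed addressing: add an index (offset) to the relative address."""
    return relative + index


# The same relative address resolves to different physical locations
# depending on the index value selected for the current block or page.
relative = 0x0040
first = indexed_address(relative, 0 * PAGE_SIZE)
second = indexed_address(relative, 3 * PAGE_SIZE)
```

Here the relative address 0x0040 lands in the first block when the index is zero and in the fourth block when the index is three times the assumed page size.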
Processors use various hardware registers to carry out an instruction set associated with a particular procedure, and to manipulate data. A register is "loaded" with a program instruction or data value by referring to the desired memory address of the instruction or value. Values in the registers can be "stored" at various memory locations either for temporary use during program execution or for ultimate copying to the permanent storage device. An actual memory address is binary, i.e., a series of 1's and 0's (bits). The number of bits in a memory address (its word size) depends upon the number of bits in the processor's registers, typically 8 bits, 16 bits or 32 bits.
The earliest computers allowed only sequential execution of load and store operations. In other words, a particular procedure would be executed one step at a time in accordance with the program flow set forth in the source code. Such processing ensures that there are no problems with memory dependencies. A memory dependency exists if a register is to be loaded from a particular memory location whose value (whether a program instruction or data) was set by a previous store operation.
Later computers (superscalar) were designed to optimize program performance by allowing load operations to occur out of order. Memory dependencies are handled by superscalar machines on the assumption that data to be loaded is often independent of store operations. These processors maintain an address comparison buffer to determine if there is any potential memory dependency problem, or "collision." All of the store operation physical addresses are saved in this "store" buffer, and load operations are allowed to occur out of order. At completion time, the address for each load operation is checked against the contents of the store buffer for any older store operations with the same address (collisions). If there are no collisions, the instructions (both loads and stores) are allowed to complete. If there is a collision, the load instructions have received stale data and, hence, have to be redone. Since the corrupted load data may have been used by a dependent instruction, all instructions subsequent to the load instruction must be restarted, with a resulting degradation in performance.
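The completion-time check described above can be modeled in software as a rough sketch. This is not a description of any particular processor: the `MemOp` and `StoreBuffer` types, the `seq` program-order field, and the `complete_load` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MemOp:
    seq: int      # assumed program-order number (smaller means older)
    address: int  # physical address of the load or store


@dataclass
class StoreBuffer:
    stores: List[MemOp] = field(default_factory=list)

    def record_store(self, store: MemOp) -> None:
        """Save the physical address of a store operation."""
        self.stores.append(store)

    def collides(self, load: MemOp) -> bool:
        """True if an older store wrote the address this load read."""
        return any(
            s.seq < load.seq and s.address == load.address
            for s in self.stores
        )


def complete_load(buffer: StoreBuffer, load: MemOp) -> str:
    """Decide at completion time whether the load may complete."""
    return "redo" if buffer.collides(load) else "complete"


buffer = StoreBuffer()
buffer.record_store(MemOp(seq=1, address=0x1000))
```

In this model, a load with `seq=2` from address 0x1000 must be redone because an older store targeted the same address, while a load from a different address, or a load that is older than the store, is allowed to complete.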
Memory dependencies can be true or false if the mapping scheme creates ambiguities. A memory dependency is false if the memory location evaluated for a load operation appears to be the same as that of a prior store operation but in actuality is not, because the aliases point to different physical memory locations. Many processors, for example, ignore the upper 20 bits in a 32-bit address when checking for collisions, and sometimes further ignore the lowest one or more bits, in order to speed up processing. Since true memory dependencies are less frequent than true register dependencies, superscalar processors generally perform well, but they nevertheless suffer additional deterioration in performance due to false byte collisions. It would, therefore, be desirable to devise a method for simplifying the process of checking for collisions so as to enhance processor performance, and it would be further advantageous if such a method could also completely avoid false collisions.
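A short sketch shows how a partial address comparison of the kind described above can report a collision that does not actually exist. The helper names and the sample addresses are assumptions for illustration; only the choice to ignore the upper 20 bits of a 32-bit address (and optionally some low bits) is taken from the text.

```python
# Illustrative sketch only: function names and addresses are hypothetical.

def partial_key(address: int, low_bits_ignored: int = 0) -> int:
    """Drop the upper 20 bits of a 32-bit address, then the lowest bits."""
    low12 = address & 0xFFF  # keep only the lower 12 bits
    return low12 >> low_bits_ignored


def apparent_collision(load_addr: int, store_addr: int,
                       low_bits_ignored: int = 0) -> bool:
    """Collision as seen by the fast, partial comparison."""
    return (partial_key(load_addr, low_bits_ignored)
            == partial_key(store_addr, low_bits_ignored))


def true_collision(load_addr: int, store_addr: int) -> bool:
    """Collision as determined by comparing the full addresses."""
    return load_addr == store_addr


store_addr = 0x00012340
load_addr = 0x000AB340  # differs only in the ignored upper bits
```

With these sample values the partial comparison flags a collision because both addresses share the low 12 bits 0x340, although the full addresses differ, so the load would be redone unnecessarily.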