1. Field of the Invention
This invention relates in general to the field of data processing in computers, and more particularly to an apparatus and method for performing double push/pop stack accesses with a single micro instruction.
2. Description of the Related Art
Software programs that execute on a microprocessor consist of macro instructions, which together direct the microprocessor to perform a function. Each instruction directs the microprocessor to perform a specific operation, which is part of the function, such as loading data from memory, storing data in a register, or adding the contents of two registers.
In a desktop computer system, a software program is typically stored on a mass storage device such as a hard disk drive. When the software program is executed, its constituent instructions are copied into a portion of random access memory (RAM). Present day memories in computer systems consist primarily of devices utilizing dynamic RAM (DRAM) technologies.
Early microprocessors fetched instructions and accessed associated data directly from DRAM because the speed of these microprocessors was roughly equivalent to the DRAM speed. In more recent years, however, improvements in microprocessor speed have far outpaced improvements in DRAM speed. Consequently, today's typical processing system contains an additional memory structure known as a cache. The cache is used to temporarily store a subset of the instructions or data that are in DRAM. The cache is much faster than the DRAM memory, but it is also much smaller in size. Access to a memory location whose data is also present in the cache is achieved much faster than having to access the memory location in DRAM memory.
Cache memory is typically located between main memory (i.e., DRAM memory) and the microprocessor. In addition, some microprocessors incorporate cache memory on-chip. Wherever the cache resides, its role is to store a subset of the instructions/data that are to be processed by the microprocessor.
When a processing unit in a microprocessor requests data from memory, the cache unit determines if the requested data is present and valid within the cache. If so, then the cache unit provides the data directly to the processing unit. This is known as a cache hit. If the requested data is not present and valid within the cache, then the requested data must be fetched from main memory (i.e., DRAM) and provided to the processing unit. This is known as a cache miss.
Structurally, a cache consists of a number of cache lines, a typical cache line being 32 bytes in length. Each cache line is associated with, or mapped to, a particular region in main memory. Thus, when a cache miss happens, the entire cache line is filled, that is, multiple locations in memory are transferred to the cache to completely fill the cache line. This is because large blocks of memory can be accessed much faster in a single access operation than sequentially accessing smaller blocks.
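The hit, miss, and line-fill behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `ToyCache`, the method `read_byte`, and the use of a Python `bytes` object to stand in for DRAM are hypothetical and do not describe any particular cache design. The 32-byte line size is the typical figure given in the text.

```python
LINE_SIZE = 32  # bytes per cache line (typical figure from the text)


class ToyCache:
    """Hypothetical toy cache that fills a whole line on every miss."""

    def __init__(self, main_memory: bytes):
        self.main_memory = main_memory  # stands in for DRAM
        self.lines = {}                 # line base address -> bytes of that line
        self.hits = 0
        self.misses = 0

    def read_byte(self, address: int) -> int:
        # Every address maps to the line that starts at a multiple of LINE_SIZE.
        base = address - (address % LINE_SIZE)
        if base in self.lines:
            self.hits += 1
        else:
            # Cache miss: transfer the entire line from main memory.
            self.misses += 1
            self.lines[base] = self.main_memory[base:base + LINE_SIZE]
        return self.lines[base][address - base]


memory = bytes(range(64))      # 64 bytes of sample data, two cache lines
cache = ToyCache(memory)
first = cache.read_byte(5)     # miss, fills the line covering addresses 0..31
second = cache.read_byte(6)    # hit, same line is already present
third = cache.read_byte(40)    # miss, fills the line covering addresses 32..63
```

In this sketch the second read is served from the line that the first read brought in, which is the benefit of filling a whole line at once.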
In addition, typical microprocessor operands range in size from one byte to eight bytes. Providing a cache with the capability to selectively address and transfer individual bytes of data, however, would require the addition of complex and costly hardware to a microprocessor design. To simplify cache designs, a present day processing unit within a microprocessor accesses cache lines in subdivisions called cache sub-lines. Thus, when a processing unit accesses an operand at a given memory address, the entire cache sub-line to which the operand is mapped is accessed by the processing unit; data logic in the processing unit places the operand at its specified location within the cache sub-line. Typical cache sub-lines are eight bytes in length. For instance, an instruction directing access to a 4-byte operand would result in access to the operand's associated 8-byte cache sub-line. Hence, by fixing the size of a cache line access to be equal to the size of a cache sub-line, microprocessor designers are able to produce designs which are less complex and costly.
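As a minimal sketch of the sub-line mapping described above, the following computes which 8-byte cache sub-line an operand maps to and where the operand sits inside it. The function name `locate_operand` and its return shape are assumptions made for illustration; only the 8-byte sub-line size comes from the text.

```python
SUB_LINE_SIZE = 8  # bytes per cache sub-line (typical figure from the text)


def locate_operand(address: int, size: int):
    """Return (sub-line base address, offset in sub-line, fits in one sub-line).

    Hypothetical helper: the text does not name such a function.
    """
    base = address - (address % SUB_LINE_SIZE)
    offset = address - base
    # The operand fits when its last byte is still inside the same sub-line.
    fits = offset + size <= SUB_LINE_SIZE
    return base, offset, fits


# A 4-byte operand at 0x1004 occupies the upper half of the sub-line at 0x1000.
upper_half = locate_operand(0x1004, 4)
# A 4-byte operand at 0x1006 would straddle two sub-lines.
straddling = locate_operand(0x1006, 4)
```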
In general, accessing a larger amount of data than what is really required in a single access does not impose a burden on microprocessor performance. Typically, only one cycle of a microprocessor clock is required to access an entire cache sub-line and locate an operand within it. Yet, microprocessor designers continue to be challenged to produce microprocessor designs having increased execution speed without increased power consumption, complexity, or cost.
One approach to improving the overall execution speed of a design is to improve the execution efficiency of particular frequently used instructions. Frequently used instructions are those instructions which are found in significant quantities within a meaningful number of application programs. By bettering performance of frequently used instructions, the overall performance of a microprocessor is notably improved.
Most present day microprocessors provide instructions to store/retrieve data to/from a common data structure known as a stack. These instructions are called stack access instructions. A stack is a data structure occupying a designated location in memory. The stack is used to pass operands between application programs, or between subroutines within a given application program. A stack structure is unique in the sense that locations within the stack are prescribed with reference to a top of stack location, which is maintained by register logic within a microprocessor. Hence, to place operands on the stack or to retrieve operands from the stack, an application need only reference the current top of stack location. When a stack access instruction is executed, a top of stack pointer register in the microprocessor is automatically incremented or decremented to adjust the top of stack accordingly.
Two frequently used stack access instructions are PUSH and POP. A PUSH instruction directs a microprocessor to store an operand at the current top of stack. A POP instruction directs the microprocessor to retrieve an operand from the current top of stack. The PUSH and POP instructions are the instructions most commonly employed to pass operands between application programs as described above. Moreover, in desktop applications, it is highly likely to find sections of a given application program wherein several sequential PUSH or POP instructions are executed. This is because, rather than passing a single operand, desktop applications normally pass several operands on the stack. Furthermore, the architecture for addressing operands within a stack, i.e., with reference to the top of stack, dictates that successive stack accesses will indeed access adjacently located operands in the stack. Consequently, it is not uncommon to observe instances in a given application program where the execution of sequential stack access instructions results in repeated access to a particular cache sub-line. Although two operands may reside within the particular cache sub-line, the stack access instructions themselves only prescribe a single access to a single operand.
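The repeated access to one cache sub-line can be illustrated with a short sketch. It assumes a stack that grows toward lower addresses and a PUSH that decrements the stack pointer before storing, as on x86-style processors; the text itself does not fix a growth direction. The 4-byte operand size, the starting address, and the helper names are likewise hypothetical.

```python
SUB_LINE_SIZE = 8  # bytes per cache sub-line (typical figure from the text)


def push_addresses(stack_pointer: int, operand_size: int, count: int):
    """Addresses written by `count` successive pushes.

    Assumption: the stack grows downward and each push decrements first.
    """
    addresses = []
    for _ in range(count):
        stack_pointer -= operand_size
        addresses.append(stack_pointer)
    return addresses


def sub_line_base(address: int) -> int:
    return address - (address % SUB_LINE_SIZE)


addresses = push_addresses(0x1000, 4, 4)          # four 4-byte pushes
bases = [sub_line_base(a) for a in addresses]     # sub-line touched by each push
accesses_needed = len(bases)                      # one access per push today
distinct_sub_lines = len(set(bases))              # sub-lines actually involved
```

Under these assumptions four pushes perform four sub-line accesses but touch only two distinct sub-lines, so each sub-line is accessed twice.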
Repeatedly accessing the same cache sub-line to store/retrieve data prescribed by successive stack access instructions truly is an inefficient use of microprocessor resources which manifests itself in unnecessary program delays. In particular, the execution of repeated PUSH/POP instructions in a present day microprocessor wastes a great deal of valuable execution time because it is highly probable that at least two successive PUSH/POP instructions will result in access to the same cache sub-line. One skilled in the art will appreciate that performance of a microprocessor can be significantly improved by combining successive accesses to the same cache sub-line into a single access.
Therefore, what is needed is a microprocessor capable of combining the two access operations prescribed by two successive stack access instructions into a single access to a cache sub-line that accesses both operands.
In addition, what is needed is an apparatus in a microprocessor to combine two sequential push/pop instructions into a double push/pop micro instruction which executes during a single instruction cycle.
Moreover, what is needed is a method for combining multiple sequential stack accesses in a microprocessor into a single operation, where the single operation is performed in one instruction cycle.
To address the above-detailed deficiencies, it is an object of the present invention to provide a microprocessor for combining successive stack access operations into a single access to a cache sub-line to access two operands prescribed by two successive stack access instructions.
Accordingly, in the attainment of the aforementioned object, it is a feature of the present invention to provide a microprocessor for accessing a stack. The microprocessor has a translator and access alignment logic. The translator receives a first stack access instruction and a second stack access instruction, and decodes the first and second stack access instructions into a double access micro instruction. The double access micro instruction directs the microprocessor to accomplish both accesses prescribed by the first and second stack access instructions during a combined access. The access alignment logic is coupled to the translator and indicates alignment of two data entities specified by the first and second stack access instructions. The combined access is precluded when the access alignment logic indicates that the data entities are misaligned.
An advantage of the present invention is that two successive stack access instructions execute in half the time required by present day microprocessors.
Another object of the present invention is to provide an apparatus in a microprocessor for combining two sequential push/pop instructions into a double push/pop micro instruction which executes during a single instruction cycle.
In another aspect, it is a feature of the present invention to provide an apparatus in a microprocessor for accessing a stack. The apparatus has translation logic, access alignment logic, and data access logic. The translation logic receives a first stack access instruction and a second stack access instruction from an instruction queue, and decodes the first and second stack access instructions into a double access micro instruction. The double access micro instruction directs the microprocessor to accomplish both accesses prescribed by the first and second stack access instructions during a combined access. The access alignment logic is coupled to the translation logic and indicates alignment of two data entities within a cache for the combined access. The data access logic is coupled to the translation logic. The data access logic receives the double access micro instruction, and accomplishes the combined access within a single instruction cycle. The combined access is precluded when the access alignment logic indicates that the data entities are misaligned within the cache and the combined access is allowed when the access alignment logic indicates that the data entities are aligned within the cache.
In yet another aspect, it is a feature of the present invention to provide an apparatus in a microprocessor for accessing two data entities in a stack during a single instruction cycle. The apparatus includes a shadow stack pointer, a performance predictor, and an instruction decoder. The shadow stack pointer monitors a stack pointer in the microprocessor, and indicates a top of stack associated with a macro instruction prior to execution of the macro instruction. The performance predictor is coupled to the shadow stack pointer. The performance predictor indicates whether performing a double access to access the two data entities would result in an aligned access. The instruction decoder is coupled to the performance predictor. The instruction decoder translates two stack access macro instructions into a double access micro instruction when the performance predictor indicates that the double access would result in an aligned access. The double access micro instruction directs the microprocessor to access the two data entities as a combined access within a single instruction cycle.
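The interplay of the shadow stack pointer and the performance predictor might be modeled as follows. This is a software sketch of hardware behavior, not the claimed circuit: the class `ShadowStackPointer`, the method `note_decoded`, and the function `predicts_aligned` are hypothetical names, and the sketch assumes a downward-growing stack, 8-byte cache sub-lines, and that "aligned" means both data entities fall within one sub-line.

```python
SUB_LINE_SIZE = 8  # bytes per cache sub-line (assumed, per the background)


class ShadowStackPointer:
    """Hypothetical model: tracks the top of stack as instructions are decoded,
    before they execute."""

    def __init__(self, stack_pointer: int):
        self.value = stack_pointer

    def note_decoded(self, op: str, operand_size: int) -> None:
        # Assumption: PUSH moves the top of stack down, POP moves it up.
        self.value += -operand_size if op == "PUSH" else operand_size


def predicts_aligned(top_of_stack: int, op: str, operand_size: int) -> bool:
    """True when both data entities of a double access share one sub-line."""
    if op == "PUSH":
        low = top_of_stack - 2 * operand_size   # lowest byte written
    else:
        low = top_of_stack                      # lowest byte read by a POP
    high = low + 2 * operand_size - 1           # highest byte touched
    return low // SUB_LINE_SIZE == high // SUB_LINE_SIZE


shadow = ShadowStackPointer(0x1000)
before = predicts_aligned(shadow.value, "PUSH", 4)  # pair covers 0xFF8..0xFFF
shadow.note_decoded("PUSH", 4)                      # shadow pointer is now 0xFFC
after = predicts_aligned(shadow.value, "PUSH", 4)   # pair covers 0xFF4..0xFFB
```

In this model a single intervening PUSH shifts the top of stack by half a sub-line, which flips the prediction from aligned to misaligned.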
Another advantage of the present invention is that application programs having successive push/pop instructions execute faster than has heretofore been provided.
Yet another object of the present invention is to provide a method for combining multiple sequential stack accesses in a microprocessor into a single operation, where the single operation is performed in one instruction cycle.
In a further aspect, it is a feature of the present invention to provide a method in a microprocessor for combining two stack accesses into a double access. The method includes predicting whether the double access would result in an aligned or misaligned cache access; if an aligned access is predicted, translating two stack access macro instructions into a double access micro instruction, the double access micro instruction directing the microprocessor to perform the double access in a single instruction cycle; and if a misaligned access is predicted, translating each of the two stack access macro instructions into an associated stack access micro instruction which directs the microprocessor to perform a single access to the stack in a single instruction cycle.
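The steps of the method above can be sketched in software as a predict-then-translate decision. The micro instruction names `DOUBLE_PUSH` and `DOUBLE_POP`, the function `translate_pair`, the 4-byte default operand size, the 8-byte sub-line, and the downward-growing stack are all assumptions made for illustration and are not taken from the text.

```python
SUB_LINE_SIZE = 8  # bytes per cache sub-line (assumed)


def translate_pair(op: str, top_of_stack: int, operand_size: int = 4):
    """Translate two identical stack access macro instructions.

    Hypothetical sketch: returns one double access micro instruction when the
    combined access is predicted to be aligned, otherwise two single ones.
    """
    # Step 1: predict whether the double access stays inside one sub-line.
    if op == "PUSH":
        low = top_of_stack - 2 * operand_size
    else:
        low = top_of_stack
    high = low + 2 * operand_size - 1
    aligned = low // SUB_LINE_SIZE == high // SUB_LINE_SIZE

    # Step 2: aligned prediction yields one double access micro instruction.
    if aligned:
        return ["DOUBLE_" + op]
    # Step 3: misaligned prediction yields one micro instruction per access.
    return [op, op]


aligned_pushes = translate_pair("PUSH", 0x1000)     # pair covers 0xFF8..0xFFF
misaligned_pushes = translate_pair("PUSH", 0xFFC)   # pair covers 0xFF4..0xFFB
aligned_pops = translate_pair("POP", 0xFF8)         # pair covers 0xFF8..0xFFF
```

The misaligned case falls back to the ordinary one-access-per-instruction translation, so the sketch never issues a combined access that would span two sub-lines.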
Yet another advantage of the present invention is that a cache sub-line does not have to be accessed twice in order to execute two sequential stack access instructions prescribing access to two operands within the cache sub-line.