This invention is in the field of integrated circuits, and is more specifically directed to programmable integrated logic circuits for executing data processing operations.
As is well known in the art, many advances have been made in recent years in increasing the performance of programmable logic integrated circuits, the prime example of which is the microprocessor. The architecture of modern general purpose microprocessors, such as those having the functionality and performance on a par with PENTIUM microprocessors available from Intel Corporation, generally includes one or more relatively long "pipelines", in which multiple instructions are in various stages of execution in any given machine cycle. For example, a six-stage pipeline may have six instructions in process in a given cycle, with different instructions in the prefetch, fetch, decode, schedule, execute, and writeback stages within a single cycle. Indeed, many microprocessors now are of the so-called "superscalar" type, in which multiple pipelines are provided. The pipeline technique is of particular benefit in microprocessors of the so-called complex instruction set computer (CISC) type, where most of the instructions in the available instruction set require multiple cycles to execute; through the use of pipelining, one instruction may be retired in each cycle, giving an apparent performance of one cycle per instruction.
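By way of illustration only, the six-stage example above may be sketched as follows. The sketch assumes an idealized pipeline with no stalls, in which instruction i enters the prefetch stage in cycle i; the names `STAGES` and `pipeline_occupancy` are hypothetical and are not drawn from any particular microprocessor.

```python
# Stage names follow the six-stage example in the text.
STAGES = ["prefetch", "fetch", "decode", "schedule", "execute", "writeback"]


def pipeline_occupancy(cycle, num_instructions):
    """Map each occupied stage to the index of the instruction it holds.

    Assumption: an ideal pipeline with no stalls or flushes, where
    instruction i enters the prefetch stage in cycle i (both 0-based).
    """
    occupancy = {}
    for stage_index, stage in enumerate(STAGES):
        instr = cycle - stage_index  # instruction that reached this stage
        if 0 <= instr < num_instructions:
            occupancy[stage] = instr
    return occupancy


if __name__ == "__main__":
    # Once the pipeline is full, six instructions are in process at once.
    for stage, instr in pipeline_occupancy(5, 10).items():
        print(f"{stage:>9}: instruction {instr}")
```

In this idealized model, one instruction leaves the writeback stage in every cycle once the pipeline has filled, which corresponds to the apparent performance of one cycle per instruction noted above.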
A difficulty with pipelined architectures arises in the case of conditional branching instructions. As is fundamental in the art, conditional branch instructions change the program flow in response to various conditions, including the relationship of variables to one another or to a constant value, and including the state of various flag or status bits. In a pipelined microprocessor, however, the result of the condition will not be known until after the next several instructions have proceeded along the pipeline to some extent. If the condition, upon execution, transfers control to instructions other than those which have already partially progressed along the pipeline, the pipeline must be flushed and execution restarted from the prefetch stage for the instruction corresponding to the correct target of the conditional branch. This flushing of the pipeline, of course, results in a significant performance penalty. Accordingly, significant circuit overhead is now spent in modern microprocessors to implement branch prediction techniques, as the overall performance of the microprocessor depends in large part upon the accuracy with which conditional branches are predicted, and thus the extent to which pipeline flushes resulting from mispredicted branches are avoided.
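The dependence of performance upon prediction accuracy may be approximated, as a rough sketch only, by a simple first-order model. The model below is an assumption introduced for illustration: it supposes a base rate of one cycle per instruction and a fixed flush penalty, in cycles, for each mispredicted branch. The function name `effective_cpi` and its parameters are hypothetical.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty, base_cpi=1.0):
    """Estimate average cycles per instruction under a first-order model.

    Assumptions (not taken from the text): every mispredicted conditional
    branch costs a fixed `flush_penalty` cycles, and all other instructions
    retire at `base_cpi` cycles each.
    """
    for name, value in (("branch_fraction", branch_fraction),
                        ("mispredict_rate", mispredict_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


if __name__ == "__main__":
    # Hypothetical figures: 20% branches, 10% mispredicted, 5-cycle flush.
    print(round(effective_cpi(0.20, 0.10, 5), 3))
```

Under this model, the penalty term grows in proportion to both the fraction of instructions that are conditional branches and the rate at which they are mispredicted, so that code with frequent branches is affected most.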
By way of further background, microprocessors of the reduced instruction set computer (RISC) type are known in the art; examples of such RISC devices are the 88k line of microprocessors available from Motorola, and the i860 line of microprocessors available from Intel Corporation. While the reduced instruction set nature of RISC processors tends to reduce the frequency with which multiple cycle instructions are encountered, conventional RISC processors are also pipelined, and thus incorporate the use of branch prediction techniques to avoid pipeline flushes.
Another difficulty encountered by modern microprocessors, of both the CISC and RISC type, arises from operations upon multi-field data structures, in which the operands are of varying bit width (e.g., eight, sixteen, and thirty-two bit fields). Such multi-field data structures are often encountered in applications and microprocessors in which much of the data storage is off-chip, but where on-chip memory (although limited in size) provides important performance benefits; in such cases, multiple smaller operands may be stored within a single register or addressable memory location, while larger operands may occupy the entire register or memory location. Conventional microprocessors require multiple machine cycles to operate upon multi-field data structures, because of the need to fetch the operand, mask off the un-associated portions of the register or memory location, shift the desired operand to the proper bit position for execution of the instruction, and shift the result to the desired bit position for a masked write into the register or on-chip memory location. While pipelined microprocessors are able to efficiently handle such multi-field data operations when overall performance is measured (approaching one instruction retired per machine cycle), these microprocessors are subject to a performance penalty for mispredicted branches and thus are likely to include significant circuit and performance overhead necessary to reasonably predict branch behavior.
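The sequence of fetch, mask, shift, execute, shift, and masked write described above may be sketched in software as follows. This is a minimal illustration under stated assumptions: a thirty-two bit word, fields identified by a bit offset and a width, and an addition as the example operation. The names `extract_field`, `insert_field`, and `add_to_field` are hypothetical and do not correspond to any particular instruction set.

```python
WORD_BITS = 32                      # assumed register / memory word width
WORD_MASK = (1 << WORD_BITS) - 1


def extract_field(word, offset, width):
    """Shift the field down to bit 0 and mask off the unrelated bits."""
    return (word >> offset) & ((1 << width) - 1)


def insert_field(word, offset, width, value):
    """Masked write: replace only the field's bits, leaving neighbors intact."""
    field_mask = ((1 << width) - 1) << offset
    shifted = (value << offset) & field_mask       # shift result into position
    return ((word & ~field_mask) | shifted) & WORD_MASK


def add_to_field(word, offset, width, increment):
    """Add to one packed field; each step stands in for a separate operation."""
    operand = extract_field(word, offset, width)             # fetch, shift, mask
    result = (operand + increment) & ((1 << width) - 1)      # execute, wrap to width
    return insert_field(word, offset, width, result)         # shift, masked write


if __name__ == "__main__":
    packed = 0x12345678             # one 16-bit field above two 8-bit fields
    print(hex(add_to_field(packed, 8, 8, 1)))
```

Each helper here corresponds to one or more of the separate masking and shifting steps that, in a conventional microprocessor, would each occupy a machine cycle.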
Certain system applications of logic circuitry are sufficiently cost-sensitive as to prohibit the use of a general-purpose microprocessor, particularly one in which the maximum performance architectural features of superscalar pipelined operation, with complex branch prediction, are utilized. As such, a need exists in the art for programmable logic circuitry which may be implemented in a low-cost manner, relative both to the cost of the processing logic and to the cost of associated memory.
However, performance is still of concern in these system applications, especially in the case where the logic circuitry is being required to operate on so-called real-time data. An example of real-time processing is the processing of message packet cells in telecommunications, such as according to the Asynchronous Transfer Mode (ATM) protocol. Especially when video signals are being transmitted in combination with voice signals, real-time processing of the messages presents significant performance demands on the processing logic circuitry. As such, low-cost logic circuitry used in telecommunications processing also must provide a high degree of performance.
It is therefore desirable in many systems, such as those processing ATM communications, to utilize programmable logic circuitry which, for reasons of performance, relies upon on-chip memory for storage of operands and, for reasons of cost, is implemented with a minimum chip area. As a result, packed data structures are attractive in these types of systems, as the packing of data of various field widths into on-chip memory provides maximum utilization of on-chip memory, thus obtaining performance at minimum cost. However, as noted above, multi-field data structures typically involve, in conventional logic circuitry, multiple cycles to perform the shifting, masking, and other operations necessary for handling these data structures. These additional cycles result either in lower performance for the processing circuitry, or in implementation of pipelines and branch prediction techniques.
In addition to the presence of multi-field data structures, however, certain of these applications necessitate a high frequency of conditional branch operations, especially in performing real-time telecommunications processing. As such, the use of pipeline architectures and branch prediction techniques, in addition to increasing implementation cost, also degrades performance due to mispredicted branches, given the large number of branch instructions in such code.