This invention relates to the field of electronic data processing, and more specifically to the electronic processing of data having a long bit length, i.e., data chains.
1. Background: Microprocessor Operation
Microprocessors are often required to manipulate binary data having wide ranges of bit lengths, for example data that ranges from a single logic bit to high precision arithmetic operations involving data that may be more than 128 bits in length.
Hardware arithmetic and logic units (ALUs) within microprocessors are generally constructed and arranged to handle fixed bit or word lengths. As a result, high precision arithmetic operations require multiple program steps and multiple microprocessor cycles. These data processing conditions lead to programs that are inefficient in terms of execution time, and lead to programming code inefficiency because the microprocessor hardware and the supporting program instruction set are not optimized for operating on data having a wide range of bit lengths, for example data that is represented by sequential chains of 16-bit data words.
For instructions such as addition and subtraction, this inefficiency results from memory thrashing involving memory stores and memory loads, as well as from program or software loop control overhead. For more complex instructions or operations such as multiplication and operations involving extended precision algorithms the result is even more inefficient. In addition, the negative, the positive, or the zero status of a resulting calculation value must be handled separately for multi-word calculations, thus requiring even more time and more program code.
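The overhead described above can be illustrated with a minimal sketch of a multi-word addition done in software on a fixed 16-bit word width. The function name `add_chains`, the least-significant-word-first ordering, and the returned status tuple are assumptions made for illustration only; they are not taken from any particular processor. Each pass through the loop stands for one load, one add with carry, and one store, and the zero and negative status must be assembled across all of the words rather than read from a single ALU result.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def add_chains(a, b):
    """Add two equal-length chains of 16-bit words (least significant word first).

    Returns (result_words, carry_out, is_zero, is_negative).
    All names here are hypothetical and chosen for illustration.
    """
    if len(a) != len(b):
        raise ValueError("chains must have the same number of words")
    result = []
    carry = 0
    is_zero = True
    for wa, wb in zip(a, b):              # software loop: one pass per word
        total = wa + wb + carry           # add with carry, one word at a time
        word = total & WORD_MASK
        carry = total >> WORD_BITS
        is_zero = is_zero and word == 0   # zero status must span every word
        result.append(word)               # store the partial result
    # The sign is taken only from the top bit of the most significant word.
    is_negative = bool(result) and bool(result[-1] >> (WORD_BITS - 1))
    return result, carry, is_zero, is_negative
```

For example, `add_chains([0xFFFF, 0x0000], [0x0001, 0x0000])` propagates the carry out of the low word into the high word, which is the bookkeeping that a single-word ALU leaves to the program.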
2. Background: Accumulator Architecture
The accumulator is a central register for all processors, and is used to temporarily store the results of the ALU in a digital processor. In single data bus, single data memory designs, as in most general purpose microprocessors, the accumulator represents a bottleneck to high speed processing. The accumulator is used both as an input operand source and to latch the output operand of the ALU. Therefore, as different data values are manipulated, the prior results must be stored to some other register or memory before processing the next data. Frequently, data in the accumulator is temporary but must be saved to make room for an intermediate calculation or operation, and additional cycles must be used to retrieve the saved value before proceeding. Several cycles of execution time are frequently wasted to satisfy these requirements. These wasted cycles are critical to execution time sensitive routines such as signal processing; saving two cycles out of every eight is equivalent to a 33% increase in processor speed.
Another problem that frequently occurs in common processors is that, with a single accumulator register, the accumulator must be used both as the source for one input operand and as the destination for the output operand of the ALU. An example is addition, where one word in memory is added to one word in the accumulator, and the result is written over the input operand originally stored in the accumulator. Two-accumulator designs offer some capability for non-destructive operations, but give an advantage only to single word width operations.
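The 33% figure can be checked with a short calculation, assuming that "processor speed" here means the rate at which a fixed amount of work completes. The helper name `speedup_percent` is hypothetical.

```python
def speedup_percent(cycles_before, cycles_saved):
    """Percentage gain in throughput when the same work takes fewer cycles."""
    cycles_after = cycles_before - cycles_saved
    return (cycles_before / cycles_after - 1.0) * 100.0


# Work that took 8 cycles now takes 6, so throughput rises by 8/6 - 1, about 33%.
gain = speedup_percent(8, 2)
```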
3. Background: Chain Processing
It is known that prior microprocessors have included the capability of operating on chains, for example by repeating a given instruction a prescribed number of times. It is also known that a repeat add with carry with data memory address post modification will effectively execute a chain operation. It is also known that others have used fixed hardware multipliers to do extended precision multiplies by using a multiply-by-parts algorithm, this being both complex and inefficient.
Microprocessors typically must manipulate data that has a wide range of bit lengths. This varies from a single logic bit to high precision arithmetic requiring more than 128 bits. Arithmetic logic units (ALUs) are of fixed word width and must carry out high precision operations with multiple program steps. The programs become inefficient in terms of execution time and programming code efficiency because the basic hardware and supporting instruction set are not optimized for operating on wide data represented by a sequential chain of data words. For simple instructions like addition, subtraction, etc., this inefficiency results from memory thrashing (stores to and loads from memory) and software loop control overhead. For more complex operations like multiplication, the algorithms for extended precision are much more involved and inefficient. In addition, the status of the resulting value (negative, positive, equal to zero) must be handled separately for multi-word calculations, requiring even more time and program code.
It is known that an accumulator performs the function of a central register in a microprocessor, and that the accumulator is used to store the results of an ALU operation. In microprocessors having a single data bus and a single data memory, the accumulator represents a bottleneck to high speed processing.
More generally, a processor's accumulator is used as an input-operand source, and it is also used to latch the output operand of the ALU. Therefore, as different data values are manipulated, the prior results must be stored to another register, or to memory, before processing of the next data can occur. Frequently, the data that is in the accumulator is temporary data that must be saved in order to make room for an intermediate calculation or operation, and additional cycles must be used to retrieve the saved temporary data before processing can continue. As a result, several cycles of execution time must be wasted. These wasted cycles are critical to time sensitive signal processing routines.
In addition, when a single accumulator is present, the accumulator must be used both as the source for one input operand of the ALU and the destination for the output operand of the ALU. Addition is an example. In an addition situation one 16-bit word that is in memory is added to one 16-bit word that is in the accumulator, and the result is then written over the input operand that was originally stored in the accumulator. The use of two accumulators offers a degree of non-destructive operation, but provides this function only in single word length operations.
The need remains in the art for an enhanced microprocessor whose specialized hardware and instruction set address the problem of operating on long, multiple word length data in an efficient, consistent and unified manner.
This application discloses specialized microprocessor hardware and a specialized instruction set that provide efficient data processing operations on long word length or bit length data. In accordance with the preferred embodiment, instructions that manipulate data include a reserved bit-switch (in the form of a 2-bit field) whose status (A0) causes the instruction to be executed once to operate on a single word of data, or whose status (A0S) causes the instruction to be repeatedly executed as the instruction operates on a chain or list of sequential data, for example a data chain that comprises the number N of 16-bit words of data wherein N is an integer.
The preferred embodiment involves both specialized hardware structures and specific instruction set enhancements that address the problem of operating on long multiple word width data in an efficient, consistent, and unified way. Every instruction word that manipulates data has a reserved bit switch that will cause the instruction to be executed either once, operating on single word data, or as a repeated execution of the same instruction operating on a chain or list of sequential data (n words).
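The effect of the bit switch can be modeled in a few lines of software. This is only an illustrative sketch under stated assumptions: the function `execute`, its parameters, and the use of a Boolean `chain_mode` in place of the 2-bit field are all hypothetical, and a carry-free logical operation is used so that the example stays independent of carry handling.

```python
import operator

WORD_MASK = 0xFFFF


def execute(op, dst, src, start, chain_mode, chain_length):
    """Apply a two-operand word operation once, or once per word of a chain.

    chain_mode=False models single-word execution; chain_mode=True models
    repeated execution over chain_length sequential words.
    Returns the number of times the operation was executed.
    """
    count = chain_length if chain_mode else 1
    for i in range(count):
        # Same operation each time; only the word address advances.
        dst[start + i] = op(dst[start + i], src[start + i]) & WORD_MASK
    return count


single = [0xFF00, 0x0FF0, 0xFFFF]
chained = [0xFF00, 0x0FF0, 0xFFFF]
mask = [0x0F0F, 0x00FF, 0x1234]
execute(operator.and_, single, mask, 0, chain_mode=False, chain_length=3)
execute(operator.and_, chained, mask, 0, chain_mode=True, chain_length=3)
```

The calling code is identical in both cases apart from the switch, which is the property the text attributes to the instruction set.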
According to the preferred embodiment, several hardware structures are necessary to support this extended precision instruction set definition. First, a hardware chain register with counter was included to control the repeat count and number of words in the chain.
Second, a register file was implemented to provide the accumulation function for the ALU. Third, specialized address control was necessary to control the sequential acquisition of up to two input operand chains and one output operand chain. Fourth, extra ALU status logic was included to handle arithmetic and logical status in a unified way independent of data width. Fifth, the product high register was routed to the partial sum input of the hardware multiplier to enable a consistent chain multiply function.
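As a rough software analogy for the fifth structure, the sketch below multiplies a chain of 16-bit words by a single 16-bit word, feeding the high half of each product into the next multiplication as a partial sum. It assumes unsigned operands and least-significant-word-first ordering, and the name `multiply_chain_by_word` is hypothetical; the source does not specify the datapath at this level of detail.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1


def multiply_chain_by_word(chain, multiplier):
    """Multiply an unsigned chain of 16-bit words by one unsigned 16-bit word.

    Returns a chain one word longer than the input (least significant first).
    """
    product_high = 0                      # models the product high register
    out = []
    for word in chain:
        # word * multiplier + product_high never exceeds 32 bits:
        # (2**16 - 1)**2 + (2**16 - 1) < 2**32.
        product = word * multiplier + product_high
        out.append(product & WORD_MASK)   # low half becomes the output word
        product_high = product >> WORD_BITS   # high half feeds the next step
    out.append(product_high)              # final high half tops the result
    return out
```

Because the high half is folded in at each step, the loop body is the same for every word, so the cost grows with the number of words rather than requiring a separate multiply-by-parts routine.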
The preferred embodiment provides a one-word length ALU (31 of FIG. 1) and the concept of word chains, and in addition provides the ability to specify four different storage areas that each store a pair of chain values, thus overcoming disadvantages of prior microprocessors.
The disclosed embodiments look at data in a consistent way and have very few limitations relative to the number of bits used to represent data. Code is essentially identical for operating on single 16-bit words and operating on up to 32 16-bit words (up to 512 bits total). Execution time is extended linearly with the number of words in the operands with very little or no software overhead. The code is compact, logical and easy to understand.
These and other features and advantages of the innovative processor hardware will be apparent to those of skill in this art upon reference to the following detailed description of preferred embodiments of the invention, which description makes reference to the drawing.