The present invention generally relates to a data processing system and, more particularly, to a pipelined data processing apparatus and method for performing different word-length arithmetic operations.
Recent communications and multimedia products, such as portable phones, personal digital assistants (PDAs), ATM switches, game machines, digital audio players, digital video players, and personal computers, require a large amount of data throughput. Many such products are equipped with some form of digital signal processing capability. In digital signal processing involving filtering, mixing, and comparison of signals, for example, Fourier transforms, finite impulse response filtering, and infinite impulse response filtering, it may be more advantageous to perform such processing operations in hardware.
Many electronic products with digital signal processing capability have a host processor chip, such as a microprocessor or microcontroller, for overall control and I/O operations, and a separate digital signal processor (DSP) chip that serves as a coprocessor of the host processor for particular digital signal processing functions.
A basic operation of DSPs is a multiply and accumulate (MAC) operation. A MAC unit multiplies two binary numbers and accumulates (i.e., adds) the result of the multiplication with a third binary number.
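By way of illustration only, the MAC operation described above can be sketched in software as follows. The function names mac and dot_product are hypothetical and chosen for this sketch; an actual MAC unit performs the same arithmetic in hardware on fixed-width binary operands.

```python
def mac(acc, a, b):
    """Multiply a by b and accumulate the product into acc."""
    return acc + a * b


def dot_product(xs, ys):
    """Chain MAC steps over two sequences, as in one output of an FIR filter."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

For example, mac(10, 3, 4) multiplies 3 by 4 and adds the product to 10, and dot_product repeats that step once per pair of inputs.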
DSPs may be designed for use in various application-specific systems, for example, video signal processing for image, speech signal processing for voice, audio signal processing for sound, and the like.
Video signals are normally manipulated with 8 bits. High speed DSPs are necessary for processing video signals because a relatively high capability of mathematical computations is required for the video signal processing. Thus, presently, there is a trend toward vector processors in which several operations are performed simultaneously, rather than scalar processors in which only one operation is executed at a time.
Speech signal processing, which handles human voice signals, involves more arithmetic operations than video signal processing and therefore needs greater accuracy and precision. Most speech signals are processed with 16 bits. For example, the GSM cellular communication standards in Europe require exactly 16 bits of precision, so that 16-bit DSPs may be used in the GSM cellular communication environment.
Recent multimedia applications for high fidelity sound may require more bits to provide a higher degree of accuracy than that required for voice. In audio signal processing, floating-point DSPs are preferred to fixed-point processors, but where fixed-point DSPs are used, 24-bit processors are the most common. An example of a 24-bit fixed-point DSP is the Motorola DSP56300 family.
In the event that a 16-bit voice signal is processed in a 24-bit fixed-point DSP dedicated to processing sound signals, it is necessary to expand the 16-bit voice signal to a 24-bit format. This requires a considerable amount of additional memory capacity, thereby increasing cost. In addition, even though only a very small portion of an entire signal processing algorithm processes the sound signal, all the data handled by the algorithm must be converted into the 24-bit format. Furthermore, existing firmware for processing the voice signal needs to be converted into the 24-bit format. An approach to overcoming the above problems is found in U.S. Pat. No. 5,598,362, issued to Adelman et al., on Jan. 28, 1997. According to Adelman et al., a DSP performs both 24-bit arithmetic and 16-bit arithmetic using the same hardware. For a MAC operation in 16-bit mode, shifting operations are used to align operands.
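As a rough illustration of the storage overhead of expanding 16-bit data to a 24-bit format, the following sketch expands 16-bit samples into 24-bit words and compares the memory needed. Sign extension is assumed here as the expansion rule, since the text does not state one, and the names expand_16_to_24 and storage_bits are hypothetical.

```python
def expand_16_to_24(sample):
    """Sign-extend a 16-bit two's-complement pattern to 24 bits (assumed rule)."""
    sample &= 0xFFFF
    if sample & 0x8000:            # negative value: fill the upper 8 bits with ones
        return sample | 0xFF0000
    return sample                  # non-negative value: upper 8 bits stay zero


def storage_bits(num_samples, word_bits):
    """Total number of bits needed to hold num_samples words of word_bits each."""
    return num_samples * word_bits
```

Under these assumptions, 1000 samples occupy 16000 bits in the 16-bit format and 24000 bits in the 24-bit format, that is, one and a half times the storage.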
However, in a 16-bit MAC operation of a conventional DSP, additional data alignments are performed four times (once for memory-read and multiply, three times for accumulate and memory-write) or more because a conventional DSP requires a data alignment for every register access operation, such as accumulation, barrel shifting, and bit-field operation, and every memory access operation. This causes problems such as a decrease in computation speed and an increase in power consumption.
Therefore, a need exists for a data processing system for performing arithmetic operations in different word lengths using the same hardware.
It is an object of the present invention to provide a low-power, low-cost, high-performance data processing system suitable for multimedia applications.
It is another object of the present invention to provide an improved host processor-coprocessor system-on-a-chip (SOC) having a pipelined mode of operation.
It is yet another object of the present invention to provide an apparatus for performing different word-length arithmetic operations in a data processing system, using the same hardware.
It is yet another object of the present invention to provide a method for performing the data alignment required when performing different word-length arithmetic operations in a data processing system using the same hardware.
These and other objects, features and advantages of the present invention are provided by a digital signal processor (DSP) which includes a first M-bit register, a second M-bit register, a first shifter, a second shifter, a 2M-bit register, a first execution unit, an extension unit, and a second execution unit, where M is an integer.
According to a preferred aspect of the invention, the first M-bit register stores a first N-bit operand, where N is an integer less than M. The second M-bit register stores a second N-bit operand. The first shifter shifts the first N-bit operand to the left. The second shifter shifts the second N-bit operand to the left. The first execution unit performs a first arithmetic operation on the first and second N-bit operands to obtain a 2N-bit result. The 2M-bit register stores the 2N-bit result. The N high-order bits of the 2N-bit result are stored in the N least significant bits of the M high-order bits of the 2M-bit register, and the N low-order bits of the 2N-bit result are stored in the N most significant bits of the M low-order bits of the 2M-bit register. The extension unit extends the content of the 2M-bit register to P bits, where P is an integer larger than 2N. The second execution unit performs a second arithmetic operation on the extended P bits and a third operand to obtain a P-bit result.
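By way of illustration only, the placement of the 2N-bit result in the 2M-bit register, the extension to P bits, and the subsequent accumulation can be sketched as follows for M = 24 and N = 16. The value P = 52, sign extension as the extension rule, and all function names are assumptions made for this sketch. The sketch starts from the N-bit operands and does not model the first and second shifters.

```python
M, N, P = 24, 16, 52  # P = 52 is an assumed width; M and N follow the 24/16-bit example


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply_n_bit(a, b):
    """Signed N-bit by N-bit multiply, returned as a 2N-bit pattern."""
    return (to_signed(a, N) * to_signed(b, N)) & ((1 << (2 * N)) - 1)


def store_in_2m_register(product):
    """Place a 2N-bit result in a 2M-bit register using the layout described above."""
    mask_n = (1 << N) - 1
    high = (product >> N) & mask_n   # N high-order bits of the result
    low = product & mask_n           # N low-order bits of the result
    high_half = high                 # N least significant bits of the upper M bits
    low_half = low << (M - N)        # N most significant bits of the lower M bits
    return (high_half << M) | low_half


def extend_to_p(register):
    """Extend the register content to P bits (sign extension is an assumption)."""
    return to_signed(register, M + N) & ((1 << P) - 1)


def accumulate(extended, third_operand):
    """Add the extended P bits and a third operand, keeping a P-bit result."""
    return (extended + third_operand) & ((1 << P) - 1)
```

With this layout, the two N-bit halves land in adjacent bit positions, so the stored result is the 2N-bit product shifted left by M - N bits. For example, 0x1234 times 0x0100 gives the 2N-bit result 0x00123400, which is stored as 0x12340000.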
The DSP has an M-bit (e.g., 24-bit) operating mode and an N-bit (e.g., 16-bit) operating mode. A conversion between the M-bit operating mode and the N-bit operating mode is performed in response to one of a mode conversion command and a change of status data in a status register.
The DSP further includes a P-bit register that stores the P-bit result. The P-bit register includes (P-2M) guard bits for preventing overflow and underflow in the 24-bit operating mode. The DSP further includes a status register for storing status data, such as sign, carry, overflow, underflow, zero and other flags, and a second P-bit register for storing the P-bit result. The status register includes (P-2M) guard bits for preventing overflow and underflow in the 24-bit operating mode. The second P-bit register includes (P-2M) guard bits for preventing overflow and underflow in the 16-bit operating mode.
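As a minimal sketch of why guard bits help, the following example accumulates worst-case products of two M-bit operands and checks whether the running sum still fits in 2M bits or in P bits. The value P = 52 (giving four guard bits for M = 24) is an assumption made for this sketch, as are the function names.

```python
M, P = 24, 52  # P = 52 is an assumed accumulator width, i.e. P - 2M = 4 guard bits


def fits_signed(value, bits):
    """Return True if value is representable as a signed two's-complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def accumulate_worst_case(count):
    """Sum `count` products of the most negative M-bit value with itself."""
    worst_product = (-(1 << (M - 1))) ** 2   # equals 2**(2*M - 2)
    return count * worst_product
```

Under these assumptions, a single worst-case product fits in 2M bits, the sum of two such products no longer fits in 2M bits but still fits in P bits, and the P-bit accumulator overflows only once 32 such products have been summed.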
According to another preferred aspect of the invention, a data processing system comprises internal buses, a host processor, a DSP coprocessor, and first and second data memories. The DSP coprocessor comprises a RAM pointer for generating addresses to access the first and second data memories, a MAC unit for performing a multiply and accumulate operation, an arithmetic unit for performing addition, subtraction, comparison, increment, decrement, and bit-field manipulation operations, a shift and exponent unit for shifting operands and for evaluating exponents, and a local decoder for decoding DSP commands from the host processor and controlling the RAM pointer, the arithmetic unit, and the shift and exponent unit. The host processor is a K-bit processor and the DSP coprocessor is a 3K-bit processor, where K is an integer. The first data memory is divided into first and second memory blocks which are identical to each other in memory capacity and store 2K-bit wide data. The second data memory is divided into third and fourth memory blocks which are identical to each other in memory capacity and store 2K-bit wide data. The first and third memory blocks have odd addresses and the second and fourth memory blocks have even addresses. The first and second data memories include first and second shadow memory blocks which are used to store 3K-bit wide data and have even addresses.
According to another preferred aspect of the invention, a method for performing a multiply and accumulate operation in an M-bit data processing system is provided. The method includes allowing the M-bit data processing system to enter one of an M-bit operating mode and an N-bit operating mode in response to a mode conversion command, where M and N are integers, and M is larger than N; providing a first N-bit operand to a first M-bit register in the N-bit operating mode; providing a second N-bit operand to a second M-bit register in the N-bit operating mode; shifting the first N-bit operand; shifting the second N-bit operand; performing a first arithmetic operation on the first and second N-bit operands to obtain a 2N-bit result; and storing the 2N-bit result in a 2M-bit register. The N high-order bits of the 2N-bit result are aligned in the N least significant bits of the M high-order bits of the 2M-bit register, and the N low-order bits of the 2N-bit result are aligned in the N most significant bits of the M low-order bits of the 2M-bit register. The method further includes, after storing the 2N-bit result, extending the content of the 2M-bit register to P bits, where P is an integer larger than 2N; and performing a second arithmetic operation on the extended P bits and a third operand to obtain a P-bit result. Performing the first arithmetic operation includes multiplying the first and second N-bit operands, and performing the second arithmetic operation includes adding the extended P bits and the third operand to obtain the P-bit result. M may be equal to 24 and N may be equal to 16. The method further includes performing mode conversions between the M-bit operating mode and the N-bit operating mode in response to one of a mode conversion command and a change of status data.