Many different implementations of a digital signal processor (DSP) are well known in the art. A conventional DSP typically includes at least one multiply and accumulate (MAC) unit since, for many signal processing applications, the operations of multiplication and addition (accumulation) are frequently used, and an appropriately designed MAC unit (implemented as hardwired circuitry) can perform such operations efficiently.
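As an illustration of why multiplication and accumulation dominate signal processing workloads, the following minimal sketch models a single MAC step and uses it in a dot product (the inner loop of a typical filter). The function names `mac_step` and `dot_product`, the 16-bit operand width, and the 64-bit accumulator are assumptions chosen for illustration; they are not taken from any particular MAC unit described here.

```c
#include <stddef.h>
#include <stdint.h>

/* One multiply-and-accumulate step: returns acc + a * b.
 * Operand and accumulator widths are illustrative assumptions. */
static inline int64_t mac_step(int64_t acc, int16_t a, int16_t b)
{
    /* The product of two 16-bit values always fits in 32 bits. */
    return acc + (int32_t)a * (int32_t)b;
}

/* Dot product of two n-element vectors, built from repeated MAC steps. */
int64_t dot_product(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc = mac_step(acc, x[i], h[i]);
    return acc;
}
```

A hardwired MAC unit would perform each `mac_step` in a single operation rather than as a separate multiply followed by an add, which is where the efficiency noted above comes from.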
A conventional DSP typically also employs two physically separate memory units: a program memory for storing instructions to be executed by the DSP; and a data memory for storing data to be processed (and optionally also data that has been processed) by the DSP as a result of executing the instructions. The program memory can be a read-only memory (ROM) or a random access memory (RAM) to which data can be written and from which data can be read. The data memory is typically a RAM to which data can be written and from which data can be read.
FIG. 1 is a block level diagram of a digital signal processor (DSP) having a program memory, a data memory physically separate from the program memory, and an arithmetic computational unit (ACU) 10 of the type which can be designed to implement the present invention. The DSP of FIG. 1 includes data memory 6 (connected to address buses AB0 and AB1 and to data buses DB0 and DB1), program memory 4, program control unit (PCU) 2, memory management unit (MMU) 3, arithmetic computational unit (ACU) 10, and input/output unit (IOU) 12.
In implementations preferred for some applications (such as that to be described with reference to FIG. 2), program memory 4 is a single port, read-only memory (ROM) with an array of storage locations 32 bits wide and 64K words deep, and data memory 6 is a dual port, random-access memory (RAM) with an array of storage locations 16 bits wide and 64K words deep. In such implementations, one port of dual port memory 6 can receive a 16-bit address (from 16-bit address bus AB0) and at the same time, the other port of memory 6 can receive another 16-bit address (from 16-bit address bus AB1). A control means is provided so that two simultaneous reads from memory 6, a simultaneous read from and write to memory 6, or a single read from (or write to) memory 6 can be performed.
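The permitted access combinations for the dual port data memory can be sketched as a simple software model. This is only a behavioral illustration under stated assumptions: the type and function names (`port_request`, `data_memory`, `memory_cycle`) are hypothetical, rejecting two simultaneous writes reflects only that the text does not list that combination, and the ordering of a read and a write to the same address in one cycle is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_WORDS 65536u  /* 64K words, each 16 bits wide */

typedef enum { PORT_IDLE, PORT_READ, PORT_WRITE } port_op;

typedef struct {
    port_op  op;
    uint16_t addr;  /* 16-bit address, as carried on AB0 or AB1 */
    uint16_t data;  /* write data in, or read data out */
} port_request;

typedef struct {
    uint16_t cell[DATA_WORDS];
} data_memory;

/* Model one memory cycle with one request per port.
 * Returns false for two simultaneous writes, which the text does not
 * list as a supported combination (assumption). */
bool memory_cycle(data_memory *m, port_request *p0, port_request *p1)
{
    if (p0->op == PORT_WRITE && p1->op == PORT_WRITE)
        return false;

    /* Reads are serviced first, so a read sees the value stored before
     * this cycle's write (an assumed ordering, not stated in the text). */
    if (p0->op == PORT_READ) p0->data = m->cell[p0->addr];
    if (p1->op == PORT_READ) p1->data = m->cell[p1->addr];

    if (p0->op == PORT_WRITE) m->cell[p0->addr] = p0->data;
    if (p1->op == PORT_WRITE) m->cell[p1->addr] = p1->data;
    return true;
}
```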
MMU 3 preferably includes two independent address generation units for generating two address signals (each identifying a memory location in memory 6 for writing data to or reading data from) and asserting such address signals to address buses AB0 and AB1. More specifically, in response to control bits from PCU 2 (which have been generated in PCU 2 by decoding instructions from program memory 4), MMU 3 asserts address signals on address bus AB0 and/or address bus AB1. Data is read from the memory location (in memory 6) identified by each address into pipeline register M0 or pipeline register M1 (or data is written from data bus RB0 and/or data bus RB1 into the memory location identified by each address).
Preferably, MMU 3 includes a set of eight address pointer registers (each for storing a 16-bit address which can be asserted to bus AB0 or AB1), an 8-bit pointer modifier register for each address pointer register, and a 16-bit adder for adding the contents of any selected address pointer register to the contents of the corresponding pointer modifier register and writing the result of this addition back into the address pointer register (in response to control bits from PCU 2). Preferably, MMU 3 also includes other registers for use in modifying the contents of selected ones of the address pointer registers and pointer modifier registers in response to control bits from PCU 2.
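The pointer update performed by the 16-bit adder can be sketched as follows. This is a minimal model under assumptions: the names `mmu_regs` and `mmu_post_modify` are hypothetical, the 8-bit modifier is treated as a signed value (the text does not say whether it is signed), and the address is taken before the update (post-modification), which is likewise an assumption.

```c
#include <stdint.h>

#define NUM_POINTERS 8

typedef struct {
    uint16_t pointer[NUM_POINTERS];   /* 16-bit address pointer registers */
    int8_t   modifier[NUM_POINTERS];  /* 8-bit modifiers; signedness assumed */
} mmu_regs;

/* Return the address currently held in pointer register i, then add the
 * corresponding modifier and write the sum back (wrapping at 16 bits). */
uint16_t mmu_post_modify(mmu_regs *r, unsigned i)
{
    uint16_t addr = r->pointer[i];
    r->pointer[i] = (uint16_t)(addr + (int16_t)r->modifier[i]);
    return addr;
}
```

With a modifier of 2, successive calls would step through every second word of a buffer; with a modifier of -1 (under the signed assumption), the pointer would walk backward.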
In the DSP of FIG. 1, each of first data bus DB0 and second data bus DB1 is preferably 16 bits wide. In variations on the FIG. 1 system, a DSP programmed to implement the invention can have a single port data memory (rather than a dual port data memory), and a single address bus and a single data bus (rather than dual address buses and dual data buses).
PCU 2 (a preferred implementation of which will be described below with reference to FIG. 3) includes instruction fetch means (for fetching instructions from program memory 4), an instruction decode unit, and registers for storing control bits generated in the decode unit (for assertion to MMU 3, data bus DB0, or the instruction fetch means).
Arithmetic computational unit (ACU) 10 preferably includes two Multiply and Accumulate (MAC) units which operate in parallel (in response to control bits from PCU 2), and an arithmetic manipulation unit (AMU) which operates in parallel with the MAC units (in response to control bits from PCU 2), as shown in FIG. 5 to be discussed below. The inventive bit rotation and shift circuit (a preferred embodiment of which is shown in FIGS. 6 and 7 to be discussed below) is preferably included within the AMU.
IOU 12 includes means for monitoring the addresses on address buses AB0 and AB1 to determine the type of memory access being implemented. IOU 12 sets a flag to PCU 2 if the addresses are outside a predetermined address range (e.g., addresses for an external memory, other than memory 6, accessible through a port connected along bus AB0 and/or AB1). PCU 2 can assert wait states for slower memory accesses in response to such flags.
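The address monitoring described above can be sketched as a range check. The names `addr_range`, `iou_external_flag`, and `pcu_wait_states`, as well as the idea of a fixed wait-state count per external access, are hypothetical; the text specifies only that a flag is set for addresses outside a predetermined range and that PCU 2 can assert wait states in response.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive bounds of the predetermined address range. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} addr_range;

/* Flag is set if either bus address falls outside the range. */
bool iou_external_flag(addr_range range, uint16_t ab0, uint16_t ab1)
{
    bool ab0_outside = ab0 < range.lo || ab0 > range.hi;
    bool ab1_outside = ab1 < range.lo || ab1 > range.hi;
    return ab0_outside || ab1_outside;
}

/* Number of wait states to insert: none unless the flag is set.
 * The fixed count external_wait is an illustrative assumption. */
unsigned pcu_wait_states(bool flag, unsigned external_wait)
{
    return flag ? external_wait : 0u;
}
```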
The present invention is desirably implemented in a DSP for use in communications operations. For example, it is contemplated that in a preferred embodiment, the DSP of FIG. 1 is programmed to implement the invention, and this programmed processor (identified as processor 100 in FIG. 2) is then included in a mobile digital telephone system of the type shown in FIG. 2. In the FIG. 2 system, serial port SIO of DSP 100 receives digitized speech from audio codec unit 106, and DSP 100 sends digital audio data (via port SIO) to codec unit 106 for conversion to analog form and then transmission to a loudspeaker. DSP 100 is also connected through analog front end circuit 104 to an RF transceiver 108. Circuit 104 includes means for digitizing a received signal from transceiver 108 (for baseband processing by means within DSP 100), and for converting digital data from DSP 100 into a continuous analog signal for transmission by transceiver 108. In typical implementations, circuit 104 would interrupt DSP 100 to indicate a request for or a presence of data (and circuit 104 is mapped into a memory address of DSP 100 so that circuit 104 can efficiently communicate over one of the data buses within DSP 100). Microcontroller 102 supplies control signals to all other elements of the FIG. 2 system and controls the communication protocol between the FIG. 2 system (which is typically a mobile station) and a remote base station. Typically, microcontroller 102 would be connected to a parallel port (PIO) of DSP 100.
Conventional circuits for performing left and right shifting of bits (the bits comprising a data word) have been employed in arithmetic computational units of DSPs. However, such shifting circuits have included two separate circuit branches: one for shifting the bits to the left; the other for shifting the bits to the right. Furthermore, such conventional circuits have not been capable of performing more than a single-bit rotation in one cycle.
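The cost of the conventional approach can be illustrated with a minimal software model in which a left branch and a right branch each rotate by one bit position per cycle, so that a rotation by n positions takes n cycles. The 16-bit word width (chosen to match the data buses of FIG. 1), the function names, and the cycle counter are assumptions for illustration only and do not describe any specific prior circuit.

```c
#include <stdint.h>

/* Left branch: rotate a 16-bit word left by one bit position. */
uint16_t rotl1(uint16_t w)
{
    return (uint16_t)(((uint32_t)w << 1) | (w >> 15));
}

/* Right branch: rotate a 16-bit word right by one bit position. */
uint16_t rotr1(uint16_t w)
{
    return (uint16_t)((w >> 1) | ((uint32_t)w << 15));
}

/* Rotate by n positions using one single-bit step per cycle.
 * The number of cycles consumed is reported through *cycles. */
uint16_t rotate_by_steps(uint16_t w, unsigned n, int left, unsigned *cycles)
{
    *cycles = 0;
    for (unsigned i = 0; i < n; ++i) {
        w = left ? rotl1(w) : rotr1(w);
        ++*cycles;
    }
    return w;
}
```

In this model a 4-bit rotation consumes four cycles, and the two directions require two distinct functions, mirroring the two separate circuit branches.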