The present invention relates to methods and apparatus for performing digital signal processing operations and, more specifically, to methods and apparatus for enhancing digital signal processors.
As technology for digital electronics has advanced, digital signal processing using digital computers and/or customized digital signal processing circuits has become ever more important. Applications for digital signal processing include audio, video, speech processing, communications, system control, and many others. One particularly interesting application for digital signal processing is the communication of audio signals over the Internet.
The transmission of audio signals over the Internet offers the opportunity to communicate voice signals, in digital form, anywhere in the world at relatively little cost. As a result, there has been an ever growing interest in voice transmission over the Internet. In fact, Internet telephony is a fast growing business area due to its promise of reducing and/or eliminating much of the cost associated with telephone calls. In order to support Internet telephony and/or other applications which may be required to process digital audio and/or video signals, digital signal processors (DSPs) are frequently used.
Thus, DSPs used to process audio signals are found in digital telephones, audio add-in cards for personal computers, and in a wide variety of other devices. In addition to processing of audio signals, a single DSP may be called upon to process a wide range of digital data including video data and numeric data.
Digital audio and/or video files or data streams representing sampled audio and/or video images can be rather large. In the interests of reducing the amount of memory required to store such files and/or the amount of bandwidth required to transmit such files, data compression is frequently used. In order to determine if a specific set of data, e.g., a subset of the data being subject to compression, will benefit from compression processing, a correlation operation is often performed. Data compression is then performed on subsets of the data being processed as a function of the output of the correlation operation. Accordingly, correlation operations are frequently performed when processing audio data, video data and other types of data.
As will be discussed in detail below, cross correlation generally involves processing two sequences of numbers, each sequence including, e.g., N elements, to produce an output sequence which also has N elements, where N may be any positive integer. Each element of the input and output sequences is normally a number represented by one or more bits. Cross correlation processing generally requires N multiplications and N−1 additions to produce each of the N output elements. Thus, a total of N² multiplications and (N²−N) additions must normally be performed to produce an N element cross correlation output.
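The operation counts stated above can be checked with a short sketch. The Python below assumes a circular form of cross correlation, y[k] = sum over n of a[n]*b[(n+k) mod N], which is one common definition that produces N outputs from two N element inputs. The text at this point does not fix the exact definition, so the formula, the function name and the counters are illustrative assumptions only.

```python
def circular_cross_correlation(a, b):
    """Return (y, multiplies, additions) for two equal-length sequences.

    ASSUMED definition (not taken from the source text):
        y[k] = sum_{n=0}^{N-1} a[n] * b[(n + k) % N]
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n_elems = len(a)
    multiplies = 0
    additions = 0
    y = []
    for k in range(n_elems):
        # The first product initializes the sum, so it costs no addition.
        acc = a[0] * b[k % n_elems]
        multiplies += 1
        for n in range(1, n_elems):
            acc += a[n] * b[(n + k) % n_elems]
            multiplies += 1
            additions += 1
        y.append(acc)
    return y, multiplies, additions
```

For N element inputs the counters come out to N*N multiplications and N*N - N additions, matching the totals given above.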
From a cost standpoint, it is desirable to avoid building into a DSP a large amount of customized circuitry which is likely to be used only infrequently or is likely to go unused altogether. In typical DSP applications, software is normally used to configure adders, subtractors, multipliers and registers to perform various functions. In some cases, additional specialized circuitry may be included in the DSP. For example, some DSPs include a relatively small number, e.g., two, of Multiply-and-Accumulate (MAC) processing units. The MAC processing units can be used to multiply two numbers and add the result into a storage register sometimes called an accumulator. MAC units may be reused under software control.
Since the number of MAC units in typical DSPs is relatively limited, computationally intensive calculations such as, e.g., cross-correlation, normally have to rely on software loops and/or multiple processing iterations to be completed.
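The reliance on software loops can be illustrated with a minimal sketch. The Python below models a single MAC unit reused under software control to form a sum of products, so that a length N calculation takes N sequential MAC steps. The function names and the iteration counter are hypothetical and are not taken from any particular DSP.

```python
def mac(accumulator, x, y):
    """One multiply-and-accumulate step: returns accumulator + x * y."""
    return accumulator + x * y


def sum_of_products_single_mac(xs, ys):
    """Compute sum(xs[i] * ys[i]) using one MAC step per loop iteration.

    Returns (result, iterations); 'iterations' counts how many times the
    single (hypothetical) MAC unit had to be reused.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    acc = 0          # models the accumulator register
    iterations = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
        iterations += 1
    return acc, iterations
```

Producing all N outputs of a cross-correlation this way would repeat the loop N times, which is the iterative cost discussed above.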
In addition to cross-correlation, other frequently used DSP functions include sorting, finite impulse response filtering, convolution, vector sum, vector product, and min/max selection. In many applications, such functions generally involve arithmetic calculations applied to long sequences of numbers representing discrete signals.
In many applications, the amount of time available to process a set of data is limited by real world constraints, such as the rate at which digital data representing an audio signal will be used to generate audio signals that are presented to a listener. Real time processing is often used to refer to processing that needs to be performed at or near the rate at which data is generated or used in real world applications. In the case of audio communications systems, such as telephones, failure to process audio in or near real time can result in noticeable delays, noise, and/or signal loss.
While the use of iterative loops to perform signal processing operations serves to limit the need for specialized circuitry in a DSP, it also means that DSPs often need to support clock speeds which are much higher than would be required if more computationally complex operations could be performed without the need for iterative processing operations or with fewer iterative processing operations.
In view of the above discussion, it is apparent that there is a need for methods and apparatus which can be used to reduce the need for iterative processing operations in DSPs. It is desirable, from an implementation standpoint, that any new circuitry be modular in design. It is also desirable that circuitry used to implement at least some new methods and apparatus be capable of being used to support one or more common DSP processing operations. In addition, from a hardware efficiency standpoint, it would be beneficial if at least some circuits were easily configurable so that they could be used to support multiple DSP processing operations.
The present invention is directed to methods and apparatus for improving the way in which digital signal processors perform a wide variety of common operations including cross-correlation, sorting, and finite impulse response filtering, in addition to other operations which use multiply, add, subtract, compare and/or store functionality.
In accordance with various embodiments of the present invention, digital signal processors and/or other programmable circuits are enhanced through the addition of one or more computation engines. The computation engines of the present invention are of a modular design with each computation engine being constructed from a plurality of computation cells each of which may be of the same design. The computation cells are connected to form a sequence of cells capable of performing processing operations in parallel.
In embodiments where the computation results are read out of the last computation cell in a sequence of computation cells, the values resulting from the processing of each computation cell can be shifted out of the computation engine with the results being passed from computation cell to computation cell so that the results of multiple cells can be read.
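The arrangement of cells in a sequence, with results shifted out through the last cell, can be modeled in software as follows. This is a minimal sketch under stated assumptions: the cell internals (a single accumulator fed by a multiply-add), the class names and the method names are hypothetical, and the parallel operation of the hardware cells is only imitated by a loop.

```python
class ComputationCell:
    """Hypothetical cell: one accumulator register fed by a multiply-add."""

    def __init__(self):
        self.accumulator = 0

    def multiply_accumulate(self, x, y):
        self.accumulator += x * y


class ComputationEngine:
    """Hypothetical engine: a sequence of identical cells."""

    def __init__(self, num_cells):
        self.cells = [ComputationCell() for _ in range(num_cells)]

    def step(self, operand_pairs):
        """Model one clock step in which every cell performs one MAC.

        In hardware the cells would operate in parallel; this loop only
        imitates that behavior.
        """
        if len(operand_pairs) != len(self.cells):
            raise ValueError("need exactly one operand pair per cell")
        for cell, (x, y) in zip(self.cells, operand_pairs):
            cell.multiply_accumulate(x, y)

    def shift_out(self):
        """Read the last cell, then move each value one cell toward it."""
        result = self.cells[-1].accumulator
        for i in range(len(self.cells) - 1, 0, -1):
            self.cells[i].accumulator = self.cells[i - 1].accumulator
        self.cells[0].accumulator = 0
        return result

    def read_all(self):
        """Shift out every result; the last cell's value is read first."""
        return [self.shift_out() for _ in self.cells]
```

In this model an engine with N cells needs N shift operations to read all N results, and the values emerge in reverse cell order.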
The computation cells of the present invention may be implemented to perform a specific function such as cross-correlation, sorting or filtering. Thus, a computation engine may be dedicated to supporting a particular function such as cross-correlation.
However, in other embodiments, the computation cells are designed to be configurable allowing a computation engine to support a wide range of applications.
One or more multiplexers may be included in each computation cell to allow re-configuring of the computation cell and thus how signals are routed between the computation cell components and which computation cell components are used at any given time.
By reconfiguring the way in which the signals are supplied to the internal components of the computation cells and the way in which signals are passed between computation cell components, multiple signal processing operations can be performed using the same computation cell hardware.
A control value supplied to each computation cell in a computation engine can be used to control the components of the computation cells and how each of the computation cells is configured. In some embodiments, e.g., embodiments which support sorting, the configuration of a computation cell is also controlled, in part, by a cascade control signal generated by a preceding computation cell in the sequence of computation cells.
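One way a cascade control signal could be used for sorting is sketched below. This is an illustrative assumption and not a description of the circuit of the present invention: each cell stores one value, every new input is presented to all cells, and a cell that receives an asserted cascade signal from the preceding cell takes that cell's previous value instead of comparing for itself. The function names and the use of None for an empty cell are hypothetical.

```python
def insert_value(cells, x):
    """Insert x into a chain of cells kept in ascending order.

    'cells' is a list of stored values, with None marking an empty cell.
    The list is modified in place and returned. If the chain is full,
    the largest value falls off the end (an assumption of this sketch).
    """
    cascade = False   # cascade control signal seen by the current cell
    prev_old = None   # value the preceding cell held before this step
    for i in range(len(cells)):
        old = cells[i]
        if cascade:
            # Preceding cell changed: take over the value it used to hold.
            cells[i] = prev_old
        elif old is None or x < old:
            # Local comparison decides that the new value belongs here.
            cells[i] = x
            cascade = True   # tell all following cells to shift
        prev_old = old
    return cells


def sort_with_cells(values, num_cells):
    """Feed values one at a time into an initially empty chain of cells."""
    cells = [None] * num_cells
    for v in values:
        insert_value(cells, v)
    return cells
```

In this sketch each input is handled in a single pass over the chain, which corresponds to all cells acting on the same input at once in hardware.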
A control register may be included in the computation engine for storing the control value used to control the configuration of the individual computation cells included in the computation engine. The output of the control register is supplied to a control value input of each of a computation engine's computation cells. Thus, the configuration of the computation engine's computation cells can be modified by simply writing a new control value into the control register.
A control value may be several bits, e.g., 12 bits, in length. In one embodiment, different fields of the 12-bit control value are dedicated to controlling different elements of the computation cells. For example, different bits may be dedicated to controlling different multiplexers, while another set of bits is dedicated to controlling the resetting of values stored in computation cell storage devices, while yet another bit is set to control whether an adder/subtractor performs addition or subtraction.
In accordance with the present invention, a software controllable portion of a digital signal processor can be used to control the configuration of a computation engine of the present invention by periodically storing an updated control value in the computation engine's control register. In addition, the software controllable portion of the digital signal processor can supply data to be processed to one or more data inputs included in the computation engine and receive, e.g., read out, the results of a processing operation performed by the computation engine of the present invention.
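The division of a control value into fields can be illustrated with a short sketch. The field layout below (8 bits of multiplexer selects, 3 reset bits and 1 add/subtract bit) is entirely hypothetical and was chosen only so that the widths total 12 bits; the actual bit assignments are not given at this point in the text.

```python
CONTROL_WIDTH = 12

# HYPOTHETICAL layout: name -> (bit offset, width in bits).
FIELDS = {
    "mux_select": (0, 8),   # select bits for the cell multiplexers
    "reset": (8, 3),        # reset bits for cell storage devices
    "subtract": (11, 1),    # 0 = adder/subtractor adds, 1 = subtracts
}


def pack_control(**values):
    """Build a 12-bit control value from named field values."""
    word = 0
    for name, (offset, width) in FIELDS.items():
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << offset
    return word


def unpack_control(word):
    """Split a 12-bit control value into its named fields."""
    if not 0 <= word < (1 << CONTROL_WIDTH):
        raise ValueError("control value does not fit in 12 bits")
    return {
        name: (word >> offset) & ((1 << width) - 1)
        for name, (offset, width) in FIELDS.items()
    }
```

Under this assumed layout, software would reconfigure the cells by packing the desired field values into one word and writing that word to the control register.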
Both the software controllable digital signal processing circuitry and the computation engine of the present invention are, in various embodiments, implemented on the same semiconductor chip.
Because the present invention allows all or portions of many processing operations to be performed in parallel through the use of multiple computation circuits, processing efficiencies can be achieved as compared to embodiments where software loops are used in place of the parallel hardware circuits of the present invention.
Additional features, embodiments and benefits of the methods and apparatus of the present invention will be discussed below in the detailed description which follows.