Code Division Multiple Access (CDMA) is a rapidly expanding data transmission technique and lies at the heart of the Universal Mobile Telecommunications System (UMTS), which is presently in development in many countries. CDMA transmits data over a wide bandwidth and separates the users of that bandwidth by coding each signal with a unique code sequence. Thus, one of the basic functions required by CDMA is code matching, which is commonly implemented using a "matched filter" arrangement.
The matched filter is tuned to detect a given code sequence in a stream of input data. The output of the matched filter is a score that indicates a level of correlation between the input data and the code sequence. A better score indicates a higher correlation.
In a strictly digital environment, the transmitted data can be viewed as a sequence of ±1 values. Although any pattern of values may occur, all transitions occur at regular intervals known as the "chip rate." Thus, one "chip" is the period of time that is spent at a given value. For example, the UMTS chip rate is presently 3.84 MHz with faster chip rates proposed.
One of the more challenging design issues with respect to UMTS is to locate the initial synchronization code sequence, which has a length of 256 chip periods. The challenge lies in the required computation rate. It will be appreciated that with the arrival of each new sample, a comparison against all 256 bits of the code sequence is required. With no over-sampling and a chip rate of 3.84 MHz, this would require a minimum of (256*3.84 MHz) operations per second (almost 1 giga-operation per second). If the input signal is over-sampled, the number of operations increases by a factor of the over-sample rate. For brevity, the following examples involve 16-bit code sequences.
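The computation rate quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only; the function name and its parameters are hypothetical, and it assumes one operation per code bit for every arriving sample, as the paragraph describes.

```python
def correlation_ops_per_second(code_length, chip_rate_hz, oversample=1):
    """Operations per second for a full correlation on every new sample.

    Assumes one operation per code bit per arriving sample.
    """
    samples_per_second = chip_rate_hz * oversample
    return code_length * samples_per_second


UMTS_CHIP_RATE_HZ = 3_840_000  # 3.84 MHz, as stated in the text

# No over-sampling: 256 * 3.84e6 = 983,040,000 (almost 1 giga-operation/s).
print(correlation_ops_per_second(256, UMTS_CHIP_RATE_HZ))

# A 4x over-sample rate multiplies the requirement by 4.
print(correlation_ops_per_second(256, UMTS_CHIP_RATE_HZ, oversample=4))
```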
FIG. 1 shows a 16-bit code sequence being matched to an input stream of data. From left to right, the code includes the sequence of bits: 1100010110101011. The stream of input data is shown above the code, with the first transmitted bit of the stream being at the left and the last transmitted bit of the stream being at the right.
The code can be matched to the input stream by sliding the code along the input stream in one-bit increments and detecting when there is a perfect match between the code bits and the bits of the input samples under consideration. With a code sequence of 16 bits, the ideal match is where the bits of the code are equal to the bits in a portion of the input stream. The match-score at each position of the code relative to the input stream can be computed by counting the number of bits that match. Thus, a perfect match will have a score of 16.
FIG. 1 illustrates a position in the input stream where there is an incomplete match and a position where there is a perfect match. Note that the incomplete match has a score of 8, while the perfect match has a score of 16.
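The sliding comparison described above can be sketched in Python as follows. The input stream used here is a made-up example and is not the stream shown in FIG. 1; the function name `match_scores` is likewise hypothetical.

```python
CODE = "1100010110101011"  # the 16-bit code sequence of FIG. 1


def match_scores(stream, code=CODE):
    """Slide the code along the stream in one-bit increments.

    Returns, for each position, the number of code bits that equal
    the corresponding stream bits (16 indicates a perfect match).
    """
    n = len(code)
    return [
        sum(s == c for s, c in zip(stream[i:i + n], code))
        for i in range(len(stream) - n + 1)
    ]


# Hypothetical stream: four arbitrary bits, the code, four arbitrary bits.
stream = "0110" + CODE + "1001"
scores = match_scores(stream)
best = scores.index(max(scores))
print(best, scores[best])  # position and score of the best match
```

In this example the only position with a score of 16 is the offset at which the code was embedded; every other position yields a lower score.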
A matched filter can be implemented using the same structure as a Finite Impulse Response (FIR) filter, where the code is stored in multiplier elements of the filter.
FIG. 2 is a functional block diagram illustrating the FIR implementation of a matched filter for the code sequence of FIG. 1. Matched filter 20 includes sample registers 22a-22p for storing the input stream of bits, wherein the input stream is shifted left to right. Filter 20 also includes multiplier elements 24a-24p and summing element 26. Each of multiplier elements 24a-24p multiplies the bit from a corresponding one of registers 22a-22p by a predetermined code bit of ±1, and summing element 26 totals the outputs from multiplier elements 24a-24p and provides the output "score," which is also referred to as the correlation result. The bitstream is shifted left to right, and a new correlation result is output with each new input bit of the input stream.
Relative to the example of FIG. 1, two distinctions are noted for the implementation of FIG. 2. First, instead of using 0 and 1 bits to represent bits of the input stream and define the code sequence, -1 and +1 are used. This technique is used to enhance the output score because if a 0 code bit is used, the multiplier output would always be 0 and would not contribute to the score, regardless of whether the input bit matched the code bit. By contrast, when a code bit of -1 is multiplied by a non-matching input bit of +1, the result is -1, which detracts from the output score. The second distinction is that the code sequence of FIG. 1 (1100010110101011) has been reversed in multiplier elements 24a-24p (+1 +1 -1 +1 -1 +1 -1 +1 +1 -1 +1 -1 -1 -1 +1 +1). The reversal is because the input stream is shifted left to right in FIG. 2 as compared to the depiction of the input stream in FIG. 1 where the first bit transmitted is at the left.
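A behavioral model of the FIR arrangement may help make the two distinctions concrete. The Python sketch below is not the circuit of FIG. 2; it is a software approximation in which a list stands in for registers 22a-22p, new samples enter at the left, and the coefficient list is the code in reversed order. All function names are hypothetical.

```python
CODE_BITS = "1100010110101011"  # code sequence of FIG. 1


def to_pm1(bits):
    """Map the bits 1 and 0 to the values +1 and -1."""
    return [1 if b == "1" else -1 for b in bits]


def fir_matched_filter(samples, code):
    """Return one correlation result per input sample.

    registers[0] holds the newest sample and registers[-1] the oldest,
    so the coefficients are the code in reversed order.
    """
    coeffs = code[::-1]
    registers = [0] * len(code)  # assumed to start cleared
    scores = []
    for x in samples:
        registers = [x] + registers[:-1]  # shift left to right
        scores.append(sum(r * k for r, k in zip(registers, coeffs)))
    return scores


code = to_pm1(CODE_BITS)
print(fir_matched_filter(code, code)[-1])                # 16: every tap adds +1
print(fir_matched_filter([-x for x in code], code)[-1])  # -16: every tap adds -1
```

With ±1 values, each matching tap contributes +1 and each non-matching tap contributes -1, so a position with m matching bits out of 16 produces a score of 2m - 16. A position with 8 matching bits therefore produces a score of 0 in this representation.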
In the previous examples, in order to illustrate the basic operation of a matched filter, it has been assumed that the input is a stream of 1's and 0's. In the following paragraphs the matched filter is modified to deal with a digital representation of the analog transmission signal. The first difference from FIG. 2 is that input samples are words of data, where the value of each word represents a sample taken of an input signal (a point on a waveform oscillating between +1 and -1, for example). Thus, the registers, multiplier elements, and summing element are sized to accommodate input words of data. The second difference from FIG. 2 is that the input signal is over-sampled in order to more accurately determine when the input signal is a +1 or a -1. Thus, each tap of the filter has a number of registers sufficient for the over-sample rate. For example, a 4× over-sample rate requires that each tap have 4 registers for storage of 4 sample values.
FIG. 3 illustrates matched filter 40 including n taps. The taps are indicated with dashed blocks and include respective multiplier elements labeled *k_0 through *k_(n-1). Each of the input registers stores an input word of data of a selected width.
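The over-sampled, word-wide filter can be modeled in the same behavioral style. The Python sketch below rests on assumptions that the text does not state: the delay line holds n taps times the over-sample rate in registers, and each multiplier reads the first register of its tap. The function name, the 4-chip code, and the sample amplitude of 100 are hypothetical.

```python
def oversampled_matched_filter(samples, code, oversample):
    """Matched filter over word-valued samples with over-sampling.

    Assumption: each tap owns `oversample` consecutive registers and
    its multiplier reads the first register of the tap.
    """
    coeffs = code[::-1]  # reversed code, as in the one-bit case
    delay = [0] * (len(code) * oversample)
    scores = []
    for x in samples:
        delay = [x] + delay[:-1]  # newest sample enters at the left
        scores.append(
            sum(k * delay[i * oversample] for i, k in enumerate(coeffs))
        )
    return scores


# Hypothetical 4-chip code, 4x over-sampling, each chip held at +/-100.
code = [1, -1, 1, 1]
samples = [100 * chip for chip in code for _ in range(4)]
scores = oversampled_matched_filter(samples, code, oversample=4)
print(max(scores))  # 400: four taps, each contributing 100
```

Because each chip spans four samples in this example, the peak score persists for four consecutive outputs rather than one.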
While matched filter 40 appears relatively straightforward to implement in an FPGA, a conventional implementation uses a large portion of an FPGA's programmable resources. For example, matched filters for code sequences having a length of 256 are not uncommon. If, in addition, an 8-bit sample input is assumed with a 4× over-sample rate, 4096 slices of a Virtex™ FPGA (available from Xilinx, Inc. and described at pages 3-1 through 3-22 of "The Programmable Logic Data Book," published in 1999 by Xilinx, Inc. and incorporated herein by reference) are required for the registers (256 taps * 4 registers/tap * 8 bit delays = 8192 flip-flops = 4096 slices). Note that 1 slice of a Virtex FPGA includes two 4-input function generators, 2 flip-flops, and dedicated multiplexer and arithmetic features. In addition to storage for the input samples, 256 multiplier elements are required, including allowing for a 9-bit result, storage for the coefficient, and the multiplication function. Thus, each multiplier element may use 5 slices for a total of 1280 slices for the multiplier elements (256 multiplier elements * 5 slices/multiplier element). A summing element having 256 inputs can be implemented with a very large adder tree, with each level in the tree allowing for additional bits from possible larger values. Thus a total of 255 adders of various sizes are required. For example, in a Virtex FPGA, one 16-bit adder can be implemented in 8 slices (generally, 1 slice/2-bit adder). In an ideal situation, a minimum of 2797 bits of addition are required, thereby occupying a minimum of 1400 slices. Thus, the matched filter would occupy 6776 slices (4096+1280+1400).
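The slice estimate above can be reproduced with the short Python sketch below. The function names are hypothetical. The adder-tree count assumes a binary tree of 9-bit inputs in which each adder is counted at its output width (one bit wider per level); that convention is an assumption, although it does yield the 2797 bits quoted in the text. The 1400-slice adder figure is taken directly from the text, which rounds up from 2797/2.

```python
def adder_tree_bits(inputs=256, input_bits=9):
    """Total adder bits in a binary adder tree.

    Assumption: each adder is counted at its output width, which
    grows by one bit at every level of the tree.
    """
    total, width, count = 0, input_bits, inputs
    while count > 1:
        count //= 2   # number of adders at this level
        width += 1    # output is one bit wider than its inputs
        total += count * width
    return total


def conventional_slice_estimate(taps=256, oversample=4, sample_bits=8,
                                slices_per_multiplier=5, adder_slices=1400):
    """Slice budget using the per-element figures given in the text."""
    flip_flops = taps * oversample * sample_bits
    register_slices = flip_flops // 2  # 2 flip-flops per Virtex slice
    multiplier_slices = taps * slices_per_multiplier
    return {
        "flip_flops": flip_flops,
        "register_slices": register_slices,
        "multiplier_slices": multiplier_slices,
        "adder_slices": adder_slices,
        "total_slices": register_slices + multiplier_slices + adder_slices,
    }


print(adder_tree_bits())               # 2797
print(conventional_slice_estimate())   # total_slices: 6776
```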
Such a conventional matched filter solution for an FPGA uses a large portion of the programmable resources available on the FPGA, thereby making FPGA solutions relatively expensive. An apparatus and method that makes efficient use of FPGA resources and that is fast enough to support oversampling is therefore desirable.