Reference is made to the following patent applications which are filed on even date herewith and which are incorporated herein by reference in their entirety:
(1) U.S. application Ser. No. 09/259,031, filed Feb. 16, 1999, entitled “Digital Channelizer Having Efficient Architecture For Window Presum Operation and Method of Operation Thereof;”
(2) U.S. application Ser. No. 09/259,623, filed Feb. 26, 1999, entitled “Digital Channelizer Having Efficient Architecture For Discrete Fourier Transformation and Operation Thereof;”
(3) U.S. application Ser. No. 09/258,847, filed Feb. 26, 1999, entitled “Digital Channelizer Having Efficient Architecture For Cyclic Shifting and Method of Operation Thereof;”
(4) U.S. application Ser. No. 09/259,080, filed Feb. 26, 1999, entitled “Digital Channelizer Having Efficient Architecture For Window Presum Using Distributed Arithmetic for Providing Window Presum Calculations in One Clock Cycle;” and
(5) U.S. application Ser. No. 09/259,127, filed Feb. 26, 1999, entitled “Digital Channelizer Having Efficient Architecture For Presum Discrete Fourier Transformation Selectively of Real or Complex Data and Method of Operation Thereof.”
1. Field of the Invention
The present invention relates to filters for dividing an input bandwidth into a plurality of channels and more particularly, to a digital channelizer for satellite communication applications using discrete Fourier transformation (DFT) to divide the input bandwidth into channels.
2. Description of the Prior Art
Digital channelizers in satellite communication systems have several design constraints. The required computational complexity is high, which demands highly complex integrated circuit logic functions and interconnections. High power consumption by integrated circuits can lead to high operating temperatures which could contribute to channelizer malfunction or failure. The system clock rate is required to be sufficiently high to support a high data throughput but should be as low as possible to lessen power consumption which contributes to the aforementioned possible high operating temperatures. The power consumption of a digital channelizer is proportional to the clock rate and depends on the type of integrated circuits which implement the required high computational complexity. Furthermore, excess hardware can interfere with processing efficiency and be a source of potential malfunction.
FIG. 1 illustrates a block diagram of a prior art digital channelizer 10 which functions as a down converter and filter which divides a wideband input bandwidth into a plurality of equally spaced channels. The channelizer 10 is representative of channelizers using DFT which have been described in the literature. See Multirate Digital Signal Processing, published in 1983 by Prentice Hall, Englewood Cliffs, N.J., written by Crochiere and Rabiner, which publication is incorporated herein by reference in its entirety. Such systems have applications in wideband satellite communication systems.
The INPUT signal is applied to a bandpass filter 12 which passes a selected wide bandwidth for division into N equally spaced channels each of a narrower bandwidth. For example, a wideband signal of 320 MHz may be passed by the bandpass filter 12 for division into sixteen 20 MHz wide channels. The bandpass filtered signal is applied to analog to digital converter 14 which samples the bandpass filtered signal. A representative frequency spectrum resultant from sampling is described further below in conjunction with FIGS. 3A and 3B. Each sample is comprised of a multiple bit word. A serial stream of multiple bit words is outputted by the analog to digital converter 14 as an input to demultiplexer 16 which produces D outputs 18. The variable D may equal the variable M, known as the decimation rate. The demultiplexer 16 functions as a multiple tapped delay line with each parallel output being outputted from a different tap of the delay line. The D outputs 18 of the demultiplexer are applied to window presum computer 20. The window presum computer 20 functions in a well-known manner to process the sequence of words within a window of L words. The window is divided into R equal subparts each containing N words, where R=L/N. Each corresponding word in the R subparts is multiplied by a window presum function coefficient and the resultant multiplication products are summed to produce the sum of the multiplication products. The individual summed multiplication products, after further processing including DFT as described below, are outputted as individual ones of the N channels. The number of outputted channels may be selected to be less than N.
For example, a sequence of ninety-six words, outputted by the demultiplexer 16, is broken up into four subparts each containing twenty-four words. Each of R corresponding words, e.g. words 0, 24, 48 and 72, from a different subpart, are multiplied by their preassigned window presum function coefficient and summed to produce an output summation which is subsequently processed into one of the N output channels. The window presum computer 20 has D inputs and N outputs. The relationship between M, D and N, which is the DFT size and the number of possible channels, affects the architecture of the window presum computer 20. Words stored in a number of registers (not illustrated), e.g. words 0, 24, 48 and 72, equal to the R subparts in the window processed by the window presum computer 20, are summed after multiplication by their preassigned window presum function coefficient to produce the output summation.
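By way of illustration only, the window presum of the above example may be sketched in software as follows. The function name window_presum, the unit coefficients and the use of the word index as the word value are assumptions made for the sketch and do not appear in the figures; an actual window presum function would supply filter coefficients chosen for the desired channel response.

```python
def window_presum(words, coeffs, n_channels):
    """Return N sums; sum n combines words n, n+N, n+2N, ... each times its coefficient."""
    if len(words) != len(coeffs) or len(words) % n_channels != 0:
        raise ValueError("window length L must equal len(coeffs) and be a multiple of N")
    subparts = len(words) // n_channels  # R = L / N
    return [
        sum(words[n + r * n_channels] * coeffs[n + r * n_channels]
            for r in range(subparts))
        for n in range(n_channels)
    ]


L, N = 96, 24                # window of ninety-six words, twenty-four words per subpart
words = list(range(L))       # hypothetical data: each word value equals its index
coeffs = [1] * L             # hypothetical coefficients (all ones) for easy checking
sums = window_presum(words, coeffs, N)
print(sums[0])               # words 0, 24, 48 and 72 summed
```

With unit coefficients the first output is simply the sum of words 0, 24, 48 and 72, which makes the grouping of corresponding words across the R subparts easy to verify by hand.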
The window presum computer 20 has been implemented by the Assignee with parallel data processing paths using integrated circuits with M not being equal to N. The number of processing paths I used by the Assignee to perform parallel data processing satisfies the relationship I equals the greatest common divisor of N and M which is expressed hereafter as GCD(N,M).
The N outputs from the window presum computer 20 are applied to a cyclic shift 24, which provides phase adjustment, for processing into each channel by DFT. The phase produced by the cyclic shift 24 is applied to the resultant N word outputs from the window presum computer by a calculated number of shifts. The operation of cyclic shifting is well known and is, for example, described in the aforementioned publication on pp. 320-323. The number of shifts of the output words of the window presum computer 20 by the cyclic shift 24 is determined by computing the value of the relationship mM*modulo N or −mM*modulo N. The variable m is an output index variable which ranges from zero upward to positive integers. The output from the cyclic shift 24, which has N channels, is applied to a discrete Fourier transform apparatus 26 having N inputs which transforms the output from the cyclic shift 24 into the N output channels.
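By way of illustration only, the computation of the number of shifts and its application to a block of N words may be sketched as follows. The function names and the direction of rotation (toward higher indices) are assumptions of the sketch; the text above specifies only the shift value mM*modulo N or −mM*modulo N.

```python
def shift_amount(m, decimation, n_channels, negative=False):
    """Number of shifts for output index m: mM modulo N, or -mM modulo N."""
    product = -m * decimation if negative else m * decimation
    return product % n_channels  # Python's % always returns a value in 0..N-1


def cyclic_shift(words, shift):
    """Rotate a block of N words by `shift` positions (direction is an assumption)."""
    n = len(words)
    return [words[(i - shift) % n] for i in range(n)]


N, M = 12, 8                                   # hypothetical DFT size and decimation rate
positive = [shift_amount(m, M, N) for m in range(4)]
negative = [shift_amount(m, M, N, negative=True) for m in range(4)]
block = list(range(N))                         # hypothetical window presum outputs
shifted = cyclic_shift(block, shift_amount(1, M, N))
```

For these values the positive shift sequence is 0, 8, 4, 0 and the negative sequence is 0, 4, 8, 0, so the pattern of shifts repeats after three output blocks.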
FIG. 2 illustrates a conceptual block diagram of the window presum algorithm which represents the window presum processing performed by the system of FIG. 1. The sampled output of L individual words is shifted into a shift register which stores the sequential words outputted by the analog to digital converter 14. The input data are shifted into the shift register, which has a number of subparts R, e.g. 4 in the above example. The number of words per subpart (the DFT size) is equal to the number N of output channels. The shift register has an analysis window L words long which is R times the size N of the discrete Fourier transform. The sum of the individual R subparts contains the words which are further processed to individual channels by DFT. The data in the shift register are weighted with a time reverse window according to equation 7.70 on page 317 of the aforementioned publication to produce a windowed sequence as illustrated. The sequence is processed as blocks of samples starting at r=0 which are time aliased. The resultant summation is processed by a cyclic shift 24 through a number of shifts equal to mM*modulo N or −mM*modulo N and is applied to discrete Fourier transform 26.
The analog to digital converter 14 of FIG. 1, in accordance with digital sampling theory, produces a spectrum of frequency domain signals centered about zero frequency as illustrated in FIGS. 3A and 3B which respectively illustrate groups of twelve and twenty four frequency domain signals. The sampling frequency of fs for real signals produces counterpart frequency domain signals centered about the zero frequency extending to fs/2 in both the positive (real) and negative (conjugate) frequencies. The positive frequencies may be expressed mathematically as a=x+iy and the negative frequencies may be expressed as a=x−iy with corresponding positive and negative frequencies being conjugates of each other. Also, in accordance with digital sampling theory, the frequency domain signals of FIGS. 3A and 3B repeat periodically with a period fs for successively higher positive frequencies and successively lower negative frequencies. These upper repeating frequency domain signals have been omitted from the illustration. The counterpart frequency domain signals of FIG. 3A are 1 and 11, 2 and 10, 3 and 9, 4 and 8, and 5 and 7, and the counterpart channels of FIG. 3B are 1 and 23, 2 and 22, 3 and 21, 4 and 20, 5 and 19, 6 and 18, 7 and 17, 8 and 16, 9 and 15, 10 and 14 and 11 and 13. Frequency domain signals 0 and 6 in FIG. 3A and 0 and 12 in FIG. 3B do not have counterparts. The information of each frequency domain signal is transformed to its counterpart conjugate by a sign reversal of the imaginary term iy.
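By way of illustration only, the counterpart relationship for the twelve frequency domain signals of FIG. 3A may be checked numerically with a direct discrete Fourier transform of real data. The sample values and the sign convention of the transform exponent are assumptions of the sketch.

```python
import cmath


def dft(x):
    """Direct N point DFT (forward transform with a negative exponent, by assumption)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


# Hypothetical real input words, N = 12
x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5, 0.25, -0.75, 2.5, 1.0]
X = dft(x)

# Counterpart pairs k and N-k: (1, 11), (2, 10), (3, 9), (4, 8), (5, 7)
pairs = [(k, len(x) - k) for k in range(1, len(x) // 2)]
conjugate_ok = all(abs(X[k] - X[c].conjugate()) < 1e-9 for k, c in pairs)

# Signals 0 and N/2 have no counterpart; for real input their imaginary term is zero
self_paired_ok = abs(X[0].imag) < 1e-9 and abs(X[6].imag) < 1e-9
```

Each pair differs only in the sign of the imaginary term, which is why one member of a pair carries the information of the other.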
FIG. 4A illustrates a diagram of the window presum function of a window containing forty eight (L) real words having four (R) subparts each containing twelve (N) real words and FIG. 4B illustrates a window presum function of a window containing twenty four complex words containing an imaginary part identified by the letter “i” following a number and a real part identified by the letter “r” following a number. The window presums are identical except that the window presum function of FIG. 4B has half as many words in view of each word having a real and an imaginary part. Complex data in FIG. 4B is, for example, obtained when a spread spectrum transmission is down converted in the tuner of the receiver. The individual words of FIGS. 4A and 4B are multiplied by their preassigned window presum function coefficients and subsequently summed with other products of corresponding words from other subparts to produce the output of the window presum computation which is subsequently processed into N channels by DFT.
Corresponding words in each subpart R are summed to produce a number of sums equal to the number of words per subpart, e.g. P0-P11 or P0r-P5i. The summations P0-P11 and P0r-P5i are processed with the window multiplication process by the use of stored coefficients to compute a value of y for each of the N channels which represents the summation of products which is applied to the cyclic shift 24.
The summation process, when the rate of decimation M is equal to the number of channels N, may be implemented efficiently with an array of registers storing the individual words identified in the vertical columns of FIG. 4A to produce the outputs P0-P11.
The summation of the products of corresponding individual words (e.g. 0, 12, 24 and 36 in FIG. 4A or word parts 0r, 6r, 12r and 18r in FIG. 4B) times their preassigned window presum function coefficients may be implemented in a number of ways. One method is illustrated in FIG. 5 which has the disadvantage of using substantial hardware requiring multipliers 40 and summation calculation 42. The number of multipliers 40 is equal to R and the number of adders in summation calculation 42 is equal to (R−1) in the worst case. This method computes the output summation y in one clock cycle (pipelined). The word values x0, x1, x2 and x3 represent corresponding word values from each of the R subparts of the window which is L words in length, e.g. words 0, 12, 24 and 36 in FIG. 4A or real word parts 0r, 6r, 12r and 18r in FIG. 4B. While this implementation for computing the summation y is computationally fast, it has the disadvantage of requiring a substantial number of gates, other hardware and interconnections which have the disadvantages described above especially in an environment involving satellites.
FIG. 6 illustrates a block diagram of a finite impulse response filter proposed in “Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review,” by Stanley A. White, in IEEE ASSP Magazine, July 1989, pp. 1-19. The illustrated filter computes the summation y with an input of four eight bit words x0, x1, x2 and x3 in a serial fashion requiring eight clock cycles to process eight bit words. The overall operation is to compute individual products of input words x0, x1, x2 and x3 and their multiplying window coefficients W0, W1, W2, and W3 to generate a sum Σ_i x_i W_i. The individual products are not computed in isolation and then added. Each bit of every word determines whether to add or subtract a multiple of its respective window coefficient, and all the bits at the same position within the words are processed at the same time. The result is that the overall sum of products is generated not by simply summing up products, but by summing multiples of different sum combinations of window coefficients and their negations. The bit select 50 selects bit slices from the eight bit words, e.g. the bits from the least significant bit LSB in ascending order to the most significant bit MSB. The logic circuit 52 exploits symmetry in the DA ROM 54 to eliminate half of the values that need to be stored for distributed arithmetic to operate correctly as a process. The process of reducing the number of bits is described on pages 5 et seq. in the aforementioned paper. The DA (distributed arithmetic) ROM 54 stores all the possible sum combinations of window coefficients and their negations. The bit slice from the input words acts as the address into the DA ROM 54 to choose the proper sum of coefficients. The shifter 56 outputs the proper multiple by a power of two of the DA ROM output to the adder 58.
The output of shifter 56 is applied to a summation calculation 58 which sums multiples of different sum combinations of window coefficients and their negations. Feedback from register 60 provides the current sum which is summed with the new sum by summation calculation 58 for each successive bit slice.
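By way of illustration only, bit serial distributed arithmetic of the general kind described above may be sketched as follows. The sketch is a simplified variant and not the circuit of FIG. 6: it assumes two's complement words, stores the full table of coefficient sums rather than the halved table obtained with logic circuit 52, and handles the sign by subtracting the most significant bit slice. The function names, word values and coefficients are hypothetical.

```python
def build_da_rom(coeffs):
    """Table entry at address a holds the sum of the coefficients whose address bit is 1."""
    return [sum(c for i, c in enumerate(coeffs) if (address >> i) & 1)
            for address in range(1 << len(coeffs))]


def da_bit_serial(words, coeffs, bits=8):
    """Sum of words[i] * coeffs[i], consuming one bit slice per modeled clock cycle."""
    rom = build_da_rom(coeffs)
    mask = (1 << bits) - 1           # view each signed word as a two's complement pattern
    register = 0                     # models the feedback register holding the running sum
    for b in range(bits):            # least significant bit slice first
        address = 0
        for i, w in enumerate(words):
            address |= (((w & mask) >> b) & 1) << i   # bit b of every word forms the address
        term = rom[address] * (1 << b)                # shifter: multiply by a power of two
        # The sign bit slice carries weight -2^(bits-1) in two's complement
        register += -term if b == bits - 1 else term
    return register


words = [5, -3, 100, -128]           # hypothetical eight bit words x0..x3
coeffs = [3, -7, 2, 5]               # hypothetical window coefficients W0..W3
y = da_bit_serial(words, coeffs)
direct = sum(w * c for w, c in zip(words, coeffs))    # multiply and add, for comparison
```

The loop runs eight times for eight bit words, which corresponds to the eight clock cycles noted above; no multiplier is used, only table lookups, shifts and additions.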
The serial implementation of FIG. 6 suffers from the disadvantage of requiring a high clock rate to compute the summation y for large data words. A high clock rate in satellite applications requires high energy consumption which can cause heating in integrated circuits and, for systems requiring high data rates, represents a potential processing speed barrier. Processing one bit at a time has undesirable latency.
FIG. 7 illustrates an implementation of distributed arithmetic used by the Assignee in the window presum computer 20 to sum the products of the corresponding words and their preassigned window presum function coefficient. This system performs processing similar to FIG. 6, except that three clock cycles are used to respectively process four bit nibbles, which are inputted from twelve bit words x0, x1, x2 and x3. The processing of the four bit nibbles is in parallel but otherwise is analogous to FIG. 5. The summation calculation 62 sums for each of the three clock cycles the outputs from the shifters 56. Register 64 stores the resultant summation outputted by summation calculation 62 and feeds the summation back to the summation calculation to sum the current summation with the summation of the next clock cycle processing.
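By way of illustration only, processing of twelve bit words as four bit nibbles over three clock cycles may be sketched as follows. The sketch assumes two's complement words and a full table of coefficient sums, and the function names and data values are hypothetical; it is intended only to show how the cycle count follows from the word width and the number of bits processed per cycle.

```python
def build_da_rom(coeffs):
    """Table entry at address a holds the sum of the coefficients whose address bit is 1."""
    return [sum(c for i, c in enumerate(coeffs) if (address >> i) & 1)
            for address in range(1 << len(coeffs))]


def da_nibble_parallel(words, coeffs, bits=12, bits_per_cycle=4):
    """Return (sum of products, cycles used), taking bits_per_cycle bit slices per cycle."""
    if bits % bits_per_cycle != 0:
        raise ValueError("word width must be a multiple of the bits processed per cycle")
    rom = build_da_rom(coeffs)
    mask = (1 << bits) - 1
    register = 0                     # models the register fed back to the summation
    cycles = 0
    for base in range(0, bits, bits_per_cycle):       # one pass models one clock cycle
        cycle_sum = 0
        for b in range(base, base + bits_per_cycle):  # these slices run in parallel in hardware
            address = 0
            for i, w in enumerate(words):
                address |= (((w & mask) >> b) & 1) << i
            weight = -(1 << b) if b == bits - 1 else (1 << b)  # sign slice is negative
            cycle_sum += rom[address] * weight
        register += cycle_sum
        cycles += 1
    return register, cycles


words = [1000, -2048, 2047, -1]      # hypothetical twelve bit words x0..x3
coeffs = [3, -7, 2, 5]               # hypothetical window coefficients
total, cycles = da_nibble_parallel(words, coeffs)
direct = sum(w * c for w, c in zip(words, coeffs))
```

Setting bits_per_cycle equal to the word width in the sketch reduces the cycle count to one, at the cost of one table lookup and shifter per bit position.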
This implementation of distributed arithmetic has disadvantages for high word processing throughputs. It requires a higher clock rate to process the four bit nibbles in three cycles for each word which increases energy consumption when compared to processing all the bits of the word in one cycle. The clock rate required to perform three processing cycles per word could, for certain satellite processing applications, limit the word processing throughput below that which is required for a desired system performance.
Window presum computers 20 are well known which utilize an array of registers to store words x0, x1, x2 and x3 processed in accordance with the aforementioned processes for computing the summation y. Some applications have their decimation rate M equal to the DFT size N and the number of channels which are outputted. However, the Assignee has implemented a window presum calculator 20 having the decimation rate M not equal to the DFT size N. These systems provide the corresponding input words from each of the aforementioned subparts of the window, e.g. words 0, 12, 24 and 36 from FIG. 4A or corresponding parts of words from FIG. 4B from storage in registers for multiplication and summation to produce the output y of the window presum computer for each of the N channels.
Discrete Fourier transforms are well known. A stand alone discrete Fourier transform apparatus of N inputs provides a frequency response at N outputs at specific equidistant frequencies. The N inputs are time domain signals and the N outputs are frequency domain signals determined at singular frequencies. The discrete Fourier transform apparatus itself can extract channel information, although it samples at only a singular frequency that represents the channel information.
In a channelizer, an output does provide a single extracted channel. The N outputs correspond to N time domain signals that each contain information from one of N frequency bands (equal in bandwidth) that divide the input signal frequency spectrum. This does not exclude the input signal from having more or fewer channels than N. The spectrum is simply divided into N frequency bands. All N inputs are required in the computation for every one of the N outputs and each of the N outputs represents the time-domain signal of one of the frequency bands dividing the frequency spectrum.
FIG. 8 is a diagram representing a prior art discrete Fourier transform device 118 which converts the output of twelve cyclically shifted summations y, produced by the cyclic shift 24, into twelve frequency domain outputs. Various algorithms are known for computing a DFT. The Winograd algorithm used for non-power of two discrete Fourier transforms is used in FIG. 8. The DFT apparatus 118 has twelve time domain inputs xe2x80x9cin 0-in 11xe2x80x9d each representing multiple bit words and twelve frequency domain outputs xe2x80x9cout 0-out 11xe2x80x9d each representing multiple bit channel outputs 0-11. Some of the outputs, which are represented in FIG. 8 as real numbers, in fact are complex numbers containing a real term and an imaginary term. The illustration of complex input words has been omitted in order to simplify illustration.
The discrete Fourier transform apparatus 118 includes an input discrete Fourier transform computation stage 120 comprised of six two point DFTs 122 of known construction each having a pair of time domain inputs and a pair of frequency domain outputs, an intermediate discrete Fourier transform computation stage 124 comprised of four three point DFTs 126 of known construction each having three inputs and three outputs and an output discrete Fourier transform computation stage 128 comprised of six two point DFTs 130 of known construction each having two inputs and producing two frequency domain outputs. The outputs of the two point DFTs 122 are inputs to the individual DFTs 126 of the intermediate discrete Fourier transform computation stage 124 and the outputs of the three point DFTs 126 of the intermediate discrete Fourier transform computation stage are inputs to the individual discrete Fourier transforms 130 of the output discrete Fourier transform computation stage 128.
FIG. 9 illustrates a prior art pruned discrete Fourier transform apparatus 140 which is representative of modifications performed by the Assignee to eliminate unnecessary DFTs 130 in the output discrete Fourier transform computation stage 128 when all of the N possible frequency domain outputs, equal to the number of time domain inputs, are not needed for further processing. The discrete Fourier transform 118 of FIG. 8 has been modified in FIG. 9 to eliminate two output stages 130 in view of only frequency domain signals 2, 3, 4, 5 and 8, 9, 10 and 11 representative of the frequency domain signals of FIG. 3A being required for further processing. The two point discrete Fourier transforms 130 which produce frequency domain signals 0 and 6 and 1 and 7 have been eliminated. The pairs of frequency domain inputs 0 and 6 and 1 and 7 are not represented. Output 5 is a counterpart and conjugate of output 7. However, output 5 in FIG. 9 is not further processed downstream in place of output 7.
FIG. 10 illustrates another form of prior art discrete Fourier apparatus 300 having all possible frequency domain signals as outputs. The frequency domain input discrete Fourier computation stage 302 has three four point discrete Fourier transforms 304 of known construction which each have four time domain inputs and four outputs which respectively are coupled to an output discrete Fourier transform computation stage 304 having four three point DFTs 306. This structure does not have an intermediate discrete Fourier computation stage like FIGS. 8 and 9.
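By way of illustration only, a twelve point transform built from three four point transforms followed by four three point transforms may be sketched as follows. The sketch uses the prime factor (Good-Thomas) index mapping, which is an assumption made here because it needs no multiplications between the two stages; the wiring of FIG. 10 is not reproduced and may differ. The function names and sample values are hypothetical.

```python
import cmath


def dft(x):
    """Direct N point DFT, used for the small transforms and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft12_two_stage(x):
    """Twelve point DFT as three 4 point DFTs followed by four 3 point DFTs."""
    if len(x) != 12:
        raise ValueError("this sketch handles exactly twelve inputs")
    # Input stage: 4 point DFT number n1 takes inputs at index (4*n1 + 3*n2) mod 12
    stage1 = [dft([x[(4 * n1 + 3 * n2) % 12] for n2 in range(4)]) for n1 in range(3)]
    out = [0j] * 12
    # Output stage: 3 point DFT number k2 takes output k2 of every 4 point DFT
    for k2 in range(4):
        column = dft([stage1[n1][k2] for n1 in range(3)])
        for k1 in range(3):
            # Output index k satisfies k mod 3 == k1 and k mod 4 == k2
            out[(4 * k1 + 9 * k2) % 12] = column[k1]
    return out


samples = [float((7 * n) % 5) for n in range(12)]   # hypothetical time domain inputs
fast = dft12_two_stage(samples)
reference = dft(samples)
max_error = max(abs(a - b) for a, b in zip(fast, reference))
```

Because three and four share no common factor, the index mapping lets the two stages be connected directly without an intermediate stage, consistent with the two stage structure described above.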
FIG. 11 is a diagram representing a prior art discrete Fourier transform apparatus 400 which has twenty four time domain inputs and all of the possible twenty four frequency domain signals as outputs. The discrete Fourier transform has an input discrete Fourier computation stage 402 comprised of three eight point preweaves 404, three intermediate discrete Fourier computation stages, 410, 412 and 414, respectively comprised of eight three point preweaves 416, a multiply stage and eight three point postweaves 418 and an output discrete Fourier computation stage 419 comprised of three eight point postweaves 420.
The present invention is a digital channelizer and a method which divides an input bandwidth, such as the wideband signal which is received by a satellite, into at least some of N possible channels.
A digital channelizer in accordance with the invention has an efficient architecture, organization and movement of data from the window presum through the discrete Fourier transform device. The decimation rate M is not limited to being equal to N. The use of distributed arithmetic reduces hardware required for the window presum operation compared to the prior art of FIG. 5. An efficient layout of window presum computations permits efficient cyclic shifts, which map directly into the discrete Fourier transforms. The output discrete Fourier computation stage may be simplified when not all channels are required as outputs.
The digital channelizer includes a window presum computer having a modular processing architecture which transfers words stored within memory elements, which in the preferred form of the invention is a connected array of registers, in a systematic and periodic pattern to complete window presum computations during a single clock cycle. Minimizing the number of operations which must be performed to complete the window presum operation reduces power consumption and permits the system to operate at higher throughputs. Furthermore, a modular implementation of a window presum computer as parallel window presum circuits simplifies the memory structure of the registers in the window presum by permitting the same register array within an integrated circuit to be used for each of the parallel processing paths.
The window presum operation is performed in parallel in modular window presum circuits which efficiently map into the cyclic shift and discrete Fourier transform device which are also implemented as a modular architecture. The number of parallel paths, which is equal to the number of window presum circuits, may be determined by the value of the GCD(N,M). When the number of window presum circuits is determined by the value of GCD(N,M), hardware use is reduced in view of the processings from the window presum computations, cyclic shifting and DFT being efficiently mapped into a minimum amount of hardware. The processing speed of each of the integrated circuits within the parallel processing paths is at a lower clock rate. Parallel processing within the window presum computer, cyclic shift and discrete Fourier transform apparatus permits slower, but more power efficient integrated circuit technologies, such as CMOS, to be used to perform the required operations. Slower parallel operations lessen the generation of heat caused by high clock rates.
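By way of illustration only, the relationship between GCD(N,M), the number of parallel paths and the cyclic shift values may be sketched as follows. The assignment of word n to path n modulo I is an assumption of the sketch, as are the function name and the example values of N and M; the sketch shows only that every shift value mM modulo N is a multiple of GCD(N,M), so that under this assignment a shifted word remains in its own path.

```python
from math import gcd


def parallel_plan(n_channels, decimation):
    """Return (number of paths I, length of the shift cycle, shift values over one cycle)."""
    paths = gcd(n_channels, decimation)          # I = GCD(N, M)
    cycles = n_channels // paths                 # shift values repeat after N / GCD(N, M) blocks
    shifts = [(m * decimation) % n_channels for m in range(cycles)]
    return paths, cycles, shifts


N, M = 12, 8                                     # hypothetical DFT size and decimation rate
paths, cycles, shifts = parallel_plan(N, M)

# Assumed path assignment: word index i belongs to path i mod I.
# A shift by a multiple of I never moves a word to a different path.
stays_in_path = all(((i + s) % N) % paths == i % paths
                    for s in shifts for i in range(N))
```

For N equal to twelve and M equal to eight the sketch gives four paths and a cycle of three shift values, 0, 8 and 4.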
The window presum computer processes a block of words produced by analog to digital conversion which have a data length L. The data sequence is windowed by multiplying each word by the preassigned window presum function coefficient which is chosen to provide the filter requirement as, for example, illustrated in FIGS. 4A and 4B. Each of R individual subparts of the window, which are N words long, are processed word by word to provide products of the word value times the preassigned window presum function coefficient. While the invention is not limited thereto, the preferred form of summing the individual products of the words times the window presum function coefficients is with distributed arithmetic which calculates the summation of the products of words and their preassigned window presum coefficients within a single clock cycle.
A window presum in accordance with the invention is responsive to parallel data streams of words which are used to produce N outputs for every M input words, which are subsequently processed into N channels such that each channel data rate has been decimated by a factor of M from the original. The N outputs each are a function of a window function and a function of a plurality of inputs to the window presum.
A digital channelizer which divides an input bandwidth into at least some of N channels in accordance with the invention includes an analog to digital converter which encodes the input bandwidth into a serial digital data stream of data words; a demultiplexer, coupled to the analog to digital converter, which divides the serial digital data stream into parallel data streams of data words; a window presum having N outputs, coupled to the parallel data streams, each output being a function of a window presum function and data words from a plurality of the parallel data streams; a cyclic shift, coupled to I output groups of data words, having I cyclic shift paths, each cyclic shift path being responsive to a different output group of data words to produce I output groups of data words which are shifted within the cyclic shift, each cyclic shift path comprising word shifting elements responsive to a group of data words with I equalling GCD(N,M) and each output group of cyclically shifted data words is repeatedly shifted through a number of cycles equal to N/GCD(N,M) with each cycle having a shift value defined by mM*modulo N, or −mM*modulo N wherein individual channels of the at least some of the N channels are decimated by a decimation factor of M and m is an index variable ranging from zero upward to positive integers; and a discrete Fourier transform apparatus having N inputs, coupled to the N outputs of the window presum, which performs a discrete Fourier transform on the N inputs to produce an output of at least some of the N channels.
A process of dividing an input bandwidth into at least some of N channels in accordance with the invention includes providing I input groups of data words to a window presum having I word processing paths with each group having a plurality of data words; processing each input group of data words in one of the I word processing paths within the window presum to produce a window presum having N outputs with each output being a function of a window presum function and a plurality of data words from one input group of data words and providing I output groups of data words; inputting the I output groups of data words to a cyclic shift having I cyclic shift paths; cyclic shifting each inputted group of I output groups of data words in a different cyclic shift path within the cyclic shift to produce I output groups of shifted data words, each cyclic shift path comprising a plurality of word shifting elements, each of the word shifting elements in a cyclic shift path being responsive to a group of data words and outputting a data word which is one output data word of a group of data words outputted by the cyclic shift path containing the word shifting element; and inputting the I output groups of shifted data words into a discrete Fourier transform and transforming the inputted I groups of shifted data words to produce at least some of the N channels; and wherein I equals GCD(N,M) and each output group of cyclically shifted data words is repeatedly shifted through a number of cycles equal to N/GCD(N,M) with each cycle having a shift value defined by mM*modulo N or −mM*modulo N and wherein individual channels of the at least some of the N channels are decimated by a decimation factor of M and m is an index variable ranging from zero upward to positive integers.
A digital channelizer which divides an input bandwidth into at least some of N channels in accordance with the invention includes an analog to digital converter which encodes the input bandwidth into a serial digital data stream of data words; a demultiplexer, coupled to the analog to digital converter, which divides the serial digital data stream into M parallel data streams of data words; a window presum having N outputs, coupled to the parallel data streams, each output being a function of a window presum function and data words from a plurality of the parallel data streams and being real or complex data; a cyclic shift, coupled to the real or complex data outputted from the window presum, which outputs real or complex data which is phase shifted relative to data outputted from the window presum; a discrete Fourier transform apparatus, coupled to the outputs of the window presum, which performs a discrete Fourier transform on the inputs to produce an output of at least some of the N channels; and wherein the discrete Fourier transform apparatus is coupled to the cyclic shifted real or complex data, and in response to a command performs a discrete Fourier transform on inputted cyclic shifted real or inputted cyclic shifted complex data to produce the channels, the discrete Fourier transform apparatus performing a transformation of the inputted cyclic shifted real data when the command specifies processing of the cyclic shifted real data and performing a transformation of the inputted cyclic shifted complex data when the command specifies processing of the inputted cyclic shifted complex data; and wherein the discrete Fourier transform apparatus comprises an N point discrete Fourier transform including an input discrete Fourier computation stage having two N/2 point discrete Fourier transforms having inputs coupled to outputs of the cyclic shift and a plurality of outputs and an output discrete Fourier computation stage having N/2 two point discrete Fourier transforms having a plurality of inputs coupled to outputs of different ones of the two N/2 point discrete Fourier transforms of the input discrete Fourier computation stage and a plurality of outputs which are different ones of the channels.
It should be understood that the invention is not limited to its elements as summarized above.