The present invention relates to a system and a method for increasing throughput rate of a signal processor. In particular, it provides a system and method for parallel processing of a digital feedback signal using a high-speed gradient circuit such as a timing or gain gradient circuit, or both.
A "digital signal" is a signal that conveys a discrete number of values at discrete times. Contrast the "analog signal," i.e., a signal that conveys an infinite number of values on a time continuum. A signal having a digital form may be generated from an analog signal through sampling and quantizing the analog signal. Sampling an analog signal refers to "chopping" the signal into discrete time periods and capturing an amplitude value from the signal in selected ones of those periods. The captured value becomes the value of the digital signal during that sample period. Such a captured value is referred to as a sample.
Quantizing refers to approximating a sample with a value that can be represented in the digital signal. For example, a sample may lie between two values that are representable in the digital signal. The value nearest (in absolute value) to the sample may be used to represent the sample. Alternatively, the sample may be represented by the lower of the two values between which the sample lies. After quantization, a sample from an analog signal may be conveyed as a digital signal. This is the resultant signal upon which the digital circuit may operate.
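The sampling and quantizing steps described above can be illustrated with a minimal sketch. The function names, the uniform quantization step, and the example signal are assumptions made for illustration only and do not appear in the original text.

```python
import math

def sample(signal, sample_period, num_samples):
    """Capture the amplitude of a continuous-time signal at discrete instants.

    `signal` is any callable of time; one value is captured per sample period.
    """
    return [signal(k * sample_period) for k in range(num_samples)]

def quantize_nearest(value, step):
    """Represent a sample by the nearest representable level (multiples of `step`)."""
    return step * math.floor(value / step + 0.5)

def quantize_lower(value, step):
    """Represent a sample by the lower of the two levels between which it lies."""
    return step * math.floor(value / step)
```

For instance, with an assumed step of 0.25, a sample of 0.40 lies between the levels 0.25 and 0.50; `quantize_nearest` selects 0.50 while `quantize_lower` selects 0.25.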
A digital signal processor (DSP) transforms an input digital signal to an output digital signal. For a digital filter, the transformation involves filtering out undesired portions of the received digital signal. An original analog signal may be represented as a sum of a plurality of sinusoids. Each sinusoid oscillates at a particular and unique frequency. Filtering is used to remove certain frequencies from an input signal while leaving other frequencies intact.
Programs executing on digital circuits often do so in "real-time." Real-time programs are programs that must execute within a certain time interval. For an ordinary program, the result of executing the program is the same regardless of whether it executes in a long period of time or a short period of time. However, if a real-time program takes longer to execute than the required time interval, it no longer computes the desired result.
Programs executing on a digital circuit are real-time programs in that the instructions are manipulating a sample of a digital signal during the interval preceding the receipt of the next sample. If the program cannot complete manipulating a sample before the next sample is provided, then the program will eventually begin to "lose" samples. A lost sample does not get processed, and therefore the output signal of the digital circuit no longer contains all of the information from the input signal provided to the digital circuit. This potential for losing samples is reduced by a preferred embodiment of the present invention, while maintaining a required throughput rate.
A digital circuit may be programmed to modify signals. The number of instructions required to do this is relatively fixed. A digital circuit must be capable of executing this relatively fixed number of instructions on any given sample before the next sample of the series is provided.
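The real-time constraint described above amounts to a simple time budget check, sketched below. The function names and the notion of a single fixed instruction rate are assumptions made for illustration; the original text does not specify how instruction time is measured.

```python
def time_budget_per_sample(sample_rate_hz):
    """Time available to process one sample before the next one arrives."""
    return 1.0 / sample_rate_hz

def meets_real_time(instructions_per_sample, instruction_rate_hz, sample_rate_hz):
    """Return True if the fixed instruction count fits within one sample interval.

    Assumes every instruction takes 1 / instruction_rate_hz seconds.
    """
    processing_time = instructions_per_sample / instruction_rate_hz
    return processing_time <= time_budget_per_sample(sample_rate_hz)
```

Under these assumptions, a circuit that fails the check would eventually begin to lose samples, since each sample takes longer to process than the interval in which it arrives.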
Besides throughput, every design parameter of a digital circuit has an associated cost. One important cost factor is the silicon area needed to "house" the digital circuit. Circuits that are manufactured on a relatively small silicon chip are less expensive than those requiring a large chip. Therefore, an easily manufacturable, small (low cost) digital circuit is desirable.
Some features of digital circuits that are important to the design engineer include phase characteristics, stability, and coefficient quantization effects. The designer must also address concerns dealing with finite word length and circuit performance. order than a generic Nyquist filter to implement the required shape factor. Digital FIR filters are subject to non-negligible inter-symbol interference (ISI), however.
Coefficient quantization error occurs as a result of the need to approximate the ideal coefficient for the "finite precision" processors used in real systems. Quantization error sources due to finite word length include:
a) input/output (I/O) quantization,
b) filter coefficient quantization,
c) uncorrelated roundoff (truncation) noise,
d) correlated roundoff (truncation) noise, and
e) dynamic range constraints.
Input noise associated with the analog-to-digital (A/D) conversion of continuous time input signals to discrete digital form and output noise associated with digital-to-analog conversion are inevitable in digital filters. Uncontrolled propagation of this noise is not inevitable, however.
Uncorrelated roundoff errors most often occur as a result of multiplication errors. For example, in attempting to maintain accuracy for signals that are multiplied, only a finite length can be stored and the remainder is truncated, resulting in "multiplication" noise being propagated. Obviously, any method that minimizes the number of multiplication steps will also reduce noise and increase inherent accuracy.
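The truncation that produces this "multiplication" noise can be illustrated with a minimal fixed-point sketch. The function names, the two's-complement style representation, and the word length used in the example are assumptions made for illustration only.

```python
def to_fixed(value, frac_bits):
    """Convert a real value to a fixed-point integer with `frac_bits` fractional bits."""
    return int(round(value * (1 << frac_bits)))

def multiply_truncated(a_fixed, b_fixed, frac_bits):
    """Multiply two fixed-point values and store only `frac_bits` fractional bits.

    The full product carries 2 * frac_bits fractional bits; the arithmetic
    right shift discards the low-order bits (truncation toward minus infinity).
    """
    return (a_fixed * b_fixed) >> frac_bits

def truncation_error(a, b, frac_bits):
    """Difference between the full-precision product and the stored product."""
    scale = 1 << frac_bits
    a_fixed = to_fixed(a, frac_bits)
    b_fixed = to_fixed(b, frac_bits)
    exact = (a_fixed * b_fixed) / (scale * scale)
    stored = multiply_truncated(a_fixed, b_fixed, frac_bits) / scale
    return exact - stored
```

With an assumed 8 fractional bits, each truncated product is in error by less than one least significant bit (1/256), and every additional multiplication contributes another such error term.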
Correlated roundoff noise occurs when the products formed within a digital filter are truncated. These include the class of "overflow oscillations." Overflows are caused by additions resulting in large amplitude oscillations. Correlated roundoff also causes "limit-cycle effect" or small-amplitude oscillations. For systems with adequate coefficient word length and dynamic range, this latter problem is negligible. However, both overflow and limit-cycle effects force the digital filter into non-linear operation. Both of these latter constraints are addressed by a preferred embodiment of the present invention.
Constraints to dynamic range, such as scaling parameters, are used to prevent overflows and underflows of finite word length registers. For a digital circuit, an overflow of the output produces an error. If the input has a maximum amplitude of unity, then the worst-case output is:
$$y(n) = \sum_{n=0}^{N-1} x(n) = s \qquad (1)$$
where:
s=scaling factor
x(n)=input sample value at n
y(n)=output sample value at n
Guaranteeing that y(n) is a fraction means that either the circuit's gain or the input has to be scaled down by "s." Reducing gain implies scaling the digital filter's coefficients to the point where, for example, a 16-bit coefficient would no longer be used efficiently. Another result of this scaling is to degrade frequency response due to high quantization errors. A better alternative is to scale the input signal. Although this results in a reduction in signal-to-noise ratio (SNR), the scaling factor used is normally less than 2, not altering the SNR drastically. Systems employing circuits requiring use of reduced bandwidth are less susceptible to degradation of the SNR. This is also addressed by a preferred embodiment of the present invention.
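The effect of scaling the input can be sketched as follows. This sketch assumes the conventional worst-case bound for an FIR filter, in which the scaling factor is the sum of the coefficient magnitudes, and it assumes a fixed quantization noise floor so that scaling the input by s lowers the SNR by 20 log10(s) dB. The coefficient values and function names are hypothetical and do not appear in the original text.

```python
import math

def worst_case_gain(coefficients):
    """Largest possible output magnitude for a unit-amplitude input (assumed bound)."""
    return sum(abs(h) for h in coefficients)

def worst_case_output(coefficients, input_peak):
    """Worst-case output magnitude for an input bounded by `input_peak`."""
    return worst_case_gain(coefficients) * input_peak

def snr_loss_db(s):
    """SNR reduction from scaling the input down by s, assuming a fixed noise floor."""
    return 20.0 * math.log10(s)
```

For hypothetical coefficients [0.25, 0.75, 0.5] the scaling factor is 1.5; scaling the input peak to 1/1.5 keeps the worst-case output at unity. Under the stated assumption, a scaling factor of 2 costs roughly 6 dB of SNR, so factors below 2 cost less than that.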
A typical example of a high-speed digital circuit is a digital FIR filter with five or more coefficients known as a Type II FIR. A Type II FIR filter is based on an array of costly Multiply and Add (MAC) accumulation stages. A conventional system using MAC is constrained to a minimum number of gates to achieve a given partial product accuracy. Digital implementation of an FIR filter is also limited by the maximum number of logic gates that can be inserted between reclocking stages established by the filter's clock cycle. Thus, for a given digital process, a minimum time to process is established by the propagation time through the critical path. To achieve very high speeds of processing, the critical path is broken into a number of shorter paths that can be addressed at higher clock speeds, i.e., processed within a short clock cycle. A preferred embodiment of the present invention implements an alternative using parallel processing including parallel processing of a de-interleaved signal in a feedback control circuit.
In magneto-resistive (MR) heads, with their inherent response nonlinearities, this throughput constraint is becoming even more unacceptable. There are more modern methods that achieve a fully digital solution, but these are extremely complex while covering a disproportionately large area on a silicon chip.
For those data streams that have a high dynamic range, a method involving splitting the sampled input signal into two portions and addressing each separately in separate filters has been proposed. Of course, this doubles the number of operations and the hardware required.
To reduce hardware complexity and computational intensity for relatively low-speed applications, such as modems, cascaded arrangements of data registers receive digitally encoded data and sequentially clock the samples. Each data register has a data capacity greater than twice the code width of a digitized sample, permitting each channel to store both I and Q data. Because the data capacity need be greater than twice the input, the data rate of devices with which this can be used is relatively low.
Some of the above introduce additional complexity not required in the preferred embodiments of the present invention while others may not be suitable for high-speed applications.
In a magnetic disk data storage system, for example, information is recorded by inducing a pattern of magnetic variations on the disk, thus encoding the information. The magnetic variations are recorded along concentric circular tracks on the disk. The linear density with which the magnetic flux changes may be recorded along a track, as well as the radial density of tracks on the disk, are ever increasing.
As the recording density is increased, however, the magnetic readback signal from the disk becomes more and more difficult to read and interpret due in part to inter-symbol interference (ISI). ISI results from process-time overlaps and the reduced spacing between neighboring magnetic flux patterns along an individual track as well as between those on adjacent tracks. For drives with interchangeable disks, in particular, each disk may introduce its own irregularities into the readback signal due to naturally occurring variations within manufacturing tolerances. Moreover, the irregularities are not uniform even over an individual disk, but depend to some degree on radial position.
Increased data density has prompted the use of digital signal processing techniques to extract data from noisy, distorted or otherwise irregular readback signals. In one commonly used technique, a sequence of consecutive raw data samples read from the disk is passed through a filter that continuously monitors the expected error in the signal and corrects data accordingly. A popular class of filters for this purpose comprises the adaptive FIR filters.
These filters provide time-varying signal processing that adapts signal characteristics, in real time, to a sensed error measure. The characteristics are defined by time-varying coefficients, the values of which are adjusted at regular intervals, again in real time, in order to minimize cumulative error.
An adaptive FIR filter may be thought of as having two parts: a filter structure that uses coefficients to modify data, and an adaptation circuit that updates the values of the coefficients. Existing implementations of filter structures and adaptation circuits are subject to design compromises.
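The two-part view of an adaptive FIR filter can be illustrated with a minimal sketch. The least-mean-squares (LMS) update used here is a commonly known adaptation rule chosen only as a stand-in; the original text does not state which adaptation algorithm is employed. The class name, method names, and step size are hypothetical.

```python
class AdaptiveFir:
    """Filter structure plus adaptation step, using an assumed LMS update."""

    def __init__(self, num_taps, step_size):
        self.coefficients = [0.0] * num_taps  # time-varying coefficients
        self.delay_line = [0.0] * num_taps    # most recent sample first
        self.step_size = step_size

    def filter(self, sample):
        """Filter structure: shift in one sample and form the weighted sum."""
        self.delay_line = [sample] + self.delay_line[:-1]
        return sum(c * x for c, x in zip(self.coefficients, self.delay_line))

    def adapt(self, error):
        """Adaptation step: move each coefficient along the error gradient (LMS)."""
        self.coefficients = [
            c + self.step_size * error * x
            for c, x in zip(self.coefficients, self.delay_line)
        ]
```

In use, `filter` is called once per incoming sample and `adapt` is called with the sensed error (desired value minus filter output) so that the coefficients are adjusted at regular intervals to reduce the cumulative error.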
The dynamic power dissipated in conventional filter circuit implementations (assuming the use of CMOS ICs) is given by the relationship:
$$P \propto C \times V^{2} \times f \times N_{Gate} \qquad (2)$$
where:
C=the average loading capacitance of a gate in the IC chip,
V=the power supply voltage level,
f=the operating frequency, and
N_Gate=the number of gates that are switching at frequency f.
Improved performance is generally realized with a higher operating frequency, f, but comes at the expense of higher power dissipation levels.
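Because Eqn. (2) is a proportionality, only ratios between designs are meaningful. The following minimal sketch compares two designs on that basis; the function names and any numeric values supplied to them are hypothetical.

```python
def relative_dynamic_power(capacitance, voltage, frequency, num_gates):
    """Product C * V^2 * f * N_Gate from Eqn. (2).

    The constant of proportionality is omitted, so the result is only
    meaningful when compared against another design.
    """
    return capacitance * voltage ** 2 * frequency * num_gates

def power_ratio(design_a, design_b):
    """Ratio of dynamic power of design_a to design_b (dicts of the four parameters)."""
    return relative_dynamic_power(**design_a) / relative_dynamic_power(**design_b)
```

For example, doubling only the operating frequency yields a ratio of 2, while halving the frequency and doubling the gate count leaves the product unchanged.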
From Eqn. (2), power consumption also increases in proportion to the number of gates. A common IC embodiment of FIR filters is a tapped delay line, in which each of the coefficients characterizing the filter corresponds to a separate "tap" along a delay line. The number of gates goes up in proportion to the number of taps. The number of taps dictates the overall time delay for data to pass through the filter and thus limits the operating frequency (data rate). To compensate for this delay, data pipelining is introduced to increase the FIR filter's operating frequency and the effective system throughput. However, pipelining calls for more gates, resulting in even greater power consumption. This constraint is also addressed by a preferred embodiment of the present invention wherein taps are shared in parallel paths. This parallelism is not only evident in the FIR filter, for example, but also in synchronization circuits associated with the system, such as timing recovery circuits and AGC circuits.
In addition to the power demand, conventional FIR filter coefficient adaptation circuits, for example, can introduce a bottleneck. To provide updated filter coefficients in successive clock cycles as new data are clocked through, conventional adaptation circuits require computations to be performed within a single bit clock cycle of the input signal. This makes it difficult to increase the overall speed of the data detection system as a whole and limits the circuitry and algorithms that may be employed for updates. A preferred embodiment of the present invention addresses this "single bit clock cycle of the input signal" constraint in all parts of the circuit, including feedback control.
Existing filter adaptation circuits also experience updated coefficients that wander from optimal when the coefficient adaptation process is operated simultaneously with a "decision-directed" timing recovery loop. This prevents consistent convergence to optimal values and impedes the performance. A preferred embodiment of the present invention also addresses this constraint.
A "pipelining" method is normally used to achieve better filter performance at high input data rates. The cost of using this method is increased latency, however. At very high speeds, such as are being seen with newer systems, conventional pipelining falls subject to the law of diminishing returns. The pipelining "overhead" now consumes a larger percentage of the benefits gained from higher clock speeds. The overhead consists of a required latching or reclocking stage for every pipelining command. Generally, the performance improvement for one level of pipelining is less than a factor of two while the "on-chip" cost increase is greater than a factor of two. All the while this is occurring at the very high clock rate of the input data. A preferred embodiment of the present invention addresses the clock rate limitation imposed by a high data rate input signal, in particular during feedback control operations.
A preferred embodiment of the present invention provides a system and method for increasing the speed of operation of a digital circuit using a high-speed gradient circuit, such as a timing or gain gradient circuit, by providing parallel paths for operation without appreciably increasing "on-chip" real estate.
This allows the remaining portions of a functional circuit, such as a read channel circuit of a mass data storage device, to be upgraded since those "primary" portions no longer depend on "slow" timing recovery or AGC, for example.
Processing feedback data in parallel paths enables cutting clock speed in half, providing twice as much time for processing each bit in the timing recovery loop or AGC. Further, it enables a timing or gain gradient calculator, also processing in parallel paths, as described in U.S. patent application Ser. No. 09/256,568, Attorney's Docket No. TI-28614 and incorporated herein by reference, to control the timing recovery circuit. By having each path of the parallel circuits operate at half the input data rate and providing for certain operations to be made common to each path, as described supra, required on-chip area is also reduced compared to conventional timing recovery circuits of comparable performance.
A preferred embodiment of the present invention is implemented for use by a timing recovery circuit by de-interleaving the digital output signal from a digital circuit into two separate bit streams, one containing the EVEN bits and the other containing ODD bits. (The terms ODD and EVEN are used to connote alternate bits and have no relation, except accidental, to either the position of the bits in a sequence or to any numeric value that may be assigned to the bits.) In a preferred embodiment of the present invention, the signal that is being processed within a timing recovery loop or an AGC circuit has been previously encoded in a partial response (PR) architecture for further processing in a maximum likelihood (ML) detector, such as a Viterbi Detector.
Referring to FIG. 2, a clock signal (not shown in FIG. 2) is provided from a timing recovery loop (not shown) to ensure that the "samples" are being taken at the appropriate instant for the chosen encoding format. A processing period of 2T, where T is the clock period of the input data signal (not shown), is made available by processing odd bits on the "rising edge" of the clock signal along paths 101 and 101a of FIG. 2. Conversely, the even bits are processed on the "falling edge" of the clock signal along paths 102 and 102a of FIG. 2. The taps 103 and 104 of FIG. 2 can be configured using simple latches (not shown) and incorporate a multiply and accumulate (MAC) function for each tap. This alternating processing of even and odd bits on two different paths and at opposite edges of a clock signal provides the 2T processing period that differentiates a preferred embodiment of the present invention from existing designs.
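The de-interleaving into two alternating streams, and the resulting 2T processing period, can be illustrated in software with a minimal sketch. This is not a model of the circuit of FIG. 2; the function names are hypothetical, and the choice of which stream receives the first sample is arbitrary, consistent with the remark above that the labels ODD and EVEN merely connote alternate bits.

```python
def deinterleave(samples):
    """Split one stream into two streams of alternate samples."""
    first_path = samples[0::2]   # samples at positions 0, 2, 4, ...
    second_path = samples[1::2]  # samples at positions 1, 3, 5, ...
    return first_path, second_path

def interleave(first_path, second_path):
    """Recombine the two alternating streams into the original order."""
    merged = []
    for i in range(len(first_path)):
        merged.append(first_path[i])
        if i < len(second_path):
            merged.append(second_path[i])
    return merged

def processing_period(bit_period, num_paths=2):
    """Time available per sample on each path when samples alternate between paths."""
    return num_paths * bit_period
```

Each path receives a new sample only once every two bit periods, so with a bit period of T the time available to process each sample on a path is 2T.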
Some of the salient advantages of the present invention are that it:
significantly increases throughput.
reduces required silicon area on the chip, considering the performance improvement.
reduces overhead.
reduces latency.
reduces fabrication cost.
uses a clock speed that is half the input data rate.
cross-references operations for each path.