Most signals of practical interest, such as speech, radar, sonar, communication, audio, and video, are analog. To process analog signals by digital means, it is necessary to convert them into digital form, i.e., into a sequence of numbers having finite precision. A fixed-point number representation expresses a number with a fixed number of digits before and after the radix point. The term radix point generalizes the decimal point to any numbering base, such as binary, octal, or hexadecimal. Fixed-point number representation is simpler and less computationally intensive than floating-point number representation.
Fixed-point numbers are useful for representing integer as well as fractional values, usually in base-2 (binary) for digital implementation. The maximum value of a fixed-point number is simply the largest value that can be represented by the number of bits used to express the fixed-point number. If a 2's complement signed representation is used, the largest positive value that can be represented by an N-bit number is $2^{N-1}-1$ and the largest negative value that can be represented by an N-bit number is $-2^{N-1}$. Note that the above explanation considered the N-bit number to be an integer. If some of the bits are used for representing a fractional part of the value, then the largest value that can be represented will be smaller in magnitude but it will have additional precision corresponding to the fractional bits. Specifically, let $N = N_I + N_F$, where $N_I$ represents the number of bits used for the integer part (including the sign bit) and $N_F$ represents the number of bits used for the fractional part. With this representation, the largest positive number that can be expressed is $2^{N_I-1}-2^{-N_F}$ and the largest negative number that can be expressed is $-2^{N_I-1}$.
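The ranges stated above can be checked numerically. The following is a minimal sketch in Python, assuming a signed 2's complement format in which the integer bit count includes the sign bit. The function name `fixed_point_range` and its parameters `ni` and `nf` are hypothetical and are introduced here only for illustration.

```python
from fractions import Fraction


def fixed_point_range(ni: int, nf: int) -> tuple[Fraction, Fraction, Fraction]:
    """Return (min, max, step) of a signed 2's complement fixed-point format.

    Assumption: ni integer bits (sign bit included) and nf fractional bits,
    so the total bit-width is N = ni + nf.
    """
    if ni < 1 or nf < 0:
        raise ValueError("need ni >= 1 and nf >= 0")
    step = Fraction(1, 2 ** nf)                 # weight of the LSB: 2^-NF
    max_val = Fraction(2 ** (ni - 1)) - step    # 2^(NI-1) - 2^-NF
    min_val = -Fraction(2 ** (ni - 1))          # -2^(NI-1)
    return min_val, max_val, step


# Pure integer case: N = 8 with no fractional bits.
print(fixed_point_range(8, 0))
# Same total width, split as 4 integer bits and 4 fractional bits.
print(fixed_point_range(4, 4))
```

With the same total bit-width of 8, moving four bits to the fractional part reduces the largest positive value from 127 to 7.9375 while refining the step size from 1 to 0.0625.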
In fixed-point arithmetic, when numbers are added, subtracted, multiplied, divided, or otherwise manipulated as part of the processing, the result may have a larger magnitude than the largest values of the input operands. This increased value may require additional bits for representation and storage. In some cases, it may be acceptable to increase the number of bits used to represent the output of a processing stage. In other cases, it may be desirable to limit the largest positive or negative value that the output of a processing stage may take. The process of limiting the value of a signal to a certain number of bits is referred to herein as saturation. If a signal is saturated and therefore not allowed to take the full value it would have taken without the limiting, the signal may be distorted. In some applications, it may be desirable to accept some amount of saturation and its concomitant distortion in order to limit the bit-width and to reduce the complexity of the processing. Saturation is different from other commonly used numerical approximations, such as rounding and truncation, in which the additional precision of a signal is discarded. The rounding and truncation operations are applied to every value of a signal, whereas saturation takes effect only when the value of a signal exceeds the maximum or falls below the minimum value that can be represented within a chosen numerical precision.
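The distinction between truncation and saturation can be illustrated with a short sketch. The code below is an illustration only: the helper names `truncate_lsbs` and `clamp_to_bits` and the sample values are assumptions and do not come from the original text. Truncation is modeled as discarding least significant bits, and saturation as clamping to the range of a 5-bit 2's complement number.

```python
def truncate_lsbs(x: int, drop: int) -> int:
    """Discard the `drop` least significant bits (arithmetic right shift)."""
    return x >> drop


def clamp_to_bits(x: int, bits: int) -> int:
    """Saturate x to the range of a `bits`-wide 2's complement number."""
    max_val = (1 << (bits - 1)) - 1
    min_val = -(1 << (bits - 1))
    return max(min_val, min(max_val, x))


samples = [3, -7, 100, -90, 12]

# Truncation operates on every sample, lowering the resolution of all of them.
truncated = [truncate_lsbs(x, 2) for x in samples]
print(truncated)   # [0, -2, 25, -23, 3]

# Saturation leaves in-range samples untouched and alters only 100 and -90.
saturated = [clamp_to_bits(x, 5) for x in samples]
print(saturated)   # [3, -7, 15, -16, 12]
```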
Consider a signal s, which is represented with a bit-width of N bits. Furthermore, let the signal s be represented in 2's complement format, where the Most Significant Bit (MSB) is a sign bit. Let the value of the signal s be expressed by a number which is represented by a sequence of bits $b_{N-1}, b_{N-2}, b_{N-3}, \ldots, b_2, b_1, b_0$. Let M denote the bit-width required after saturating the signal s.
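For reference, the value encoded by such a bit sequence follows the standard 2's complement rule: the sign bit carries a weight of −2^(N−1) and every other bit b_i carries a weight of +2^i. The sketch below is an illustration under that rule. The function name `twos_complement_value` and the MSB-first list layout are assumptions made for this example.

```python
def twos_complement_value(bits: list[int]) -> int:
    """Interpret [b_{N-1}, ..., b_1, b_0] (MSB first) as a 2's complement integer."""
    n = len(bits)
    if n == 0 or any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be a non-empty sequence of 0s and 1s")
    # The sign bit b_{N-1} contributes -2^(N-1).
    value = -bits[0] * (1 << (n - 1))
    # The remaining bits b_{N-2} ... b_0 contribute +2^i each.
    for i, b in enumerate(reversed(bits[1:])):
        value += b * (1 << i)
    return value


print(twos_complement_value([0, 1, 1, 1, 1, 0, 1, 1]))  # 123
print(twos_complement_value([1, 0, 0, 0, 1, 1, 0, 1]))  # -115
```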
Normally a comparator may be required to implement the limiting function. A comparator may compare the largest value that can be represented by an M-bit number with the N-bit input signal. If the value of the N-bit input signal s is larger than the largest positive number that can be expressed by an M-bit representation, then the largest positive M-bit value is output. Similarly, if the value of the N-bit input signal s is smaller than the largest negative number that can be expressed by an M-bit representation, then the largest negative M-bit value is output. Otherwise, the original value of the N-bit input is output, but in M-bit representation. For example, consider the case where a signal represented with 8 bits is to be saturated to 5 bits, both in 2's complement format. The maximum positive and maximum negative values for a 5-bit 2's complement number are +15 and −16 respectively. If the 8-bit input signal value is, for example, 123, it will be saturated to +15. If the 8-bit input signal value is, for example, −115, it will be saturated to −16. If the 8-bit input signal value is, for example, 13, it will be left as is, i.e., the output of the saturation logic will be 13. If the 8-bit input signal value is, for example, −14, it will be left as is, i.e., the output of the saturation logic will be −14.
Let MAX_VAL denote the largest positive value that can be represented by an M-bit binary number in 2's complement format. Let MIN_VAL denote the largest negative value that can be represented by an M-bit binary number in 2's complement format. The limiting operation is illustrated in pseudo code below for getting a saturated output signal r from an input signal s.
If (s > 0) // positive number?
{
    If (s > MAX_VAL)
        r = MAX_VAL
    Else
        r = s
}
Else // negative number
{
    If (s < MIN_VAL)
        r = MIN_VAL
    Else
        r = s
}
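The pseudo code above can be rendered as a runnable sketch. The Python version below mirrors the same branch structure and derives MAX_VAL and MIN_VAL from the target bit-width M. The function name `saturate_to_m_bits` is hypothetical, and the values exercised are those of the 8-bit to 5-bit example given earlier.

```python
def saturate_to_m_bits(s: int, m: int) -> int:
    """Saturate the integer s to an m-bit 2's complement range."""
    if m < 1:
        raise ValueError("m must be at least 1")
    max_val = (1 << (m - 1)) - 1     # largest positive m-bit value
    min_val = -(1 << (m - 1))        # largest negative m-bit value
    if s > 0:                        # positive number?
        r = max_val if s > max_val else s
    else:                            # zero or negative number
        r = min_val if s < min_val else s
    return r


# 8-bit inputs saturated to 5 bits (range -16 .. +15).
for value in (123, -115, 13, -14):
    print(value, "->", saturate_to_m_bits(value, 5))
```

A value of zero takes the second branch and passes through unchanged, so the result is the same as a plain clamp to the range MIN_VAL to MAX_VAL.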
In hardware, the logic illustrated in the pseudo code may be implemented as shown in FIG. 1. The N-bit input signal s is compared with MAX_VAL and MIN_VAL in parallel using the N-bit comparators C1 and C2 respectively. The results of the comparisons are used to select between the N-bit input signal s and MAX_VAL in the N-bit multiplexer M1, and between the input signal s and MIN_VAL in the N-bit multiplexer M2. Finally, the selection between the outputs of multiplexers M1 and M2 is done in the M-bit multiplexer M3 using the sign bit of the input signal as the select line. In this example implementation, there are two N-bit comparators, two N-bit multiplexers, and one M-bit multiplexer. The logic depth is three stages, as the final output appears only after the input passes through three stages of logic units.
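The three-stage datapath described above can be modeled behaviorally. The sketch below is based only on the textual description, since FIG. 1 itself is not reproduced here. The function name `saturate_datapath_model` and the intermediate signal names are assumptions chosen to match the labels C1, C2, M1, M2, and M3.

```python
def saturate_datapath_model(s: int, n: int, m: int) -> int:
    """Behavioral model of the comparator/multiplexer saturation datapath."""
    if not -(1 << (n - 1)) <= s <= (1 << (n - 1)) - 1:
        raise ValueError("s does not fit in n bits")
    max_val = (1 << (m - 1)) - 1
    min_val = -(1 << (m - 1))

    # Stage 1: comparators C1 and C2 evaluate in parallel.
    c1 = s > max_val
    c2 = s < min_val

    # Stage 2: M1 picks MAX_VAL or s, M2 picks MIN_VAL or s.
    m1_out = max_val if c1 else s
    m2_out = min_val if c2 else s

    # Stage 3: M3 is steered by the sign bit (MSB of the n-bit input pattern).
    sign_bit = ((s & ((1 << n) - 1)) >> (n - 1)) & 1
    return m2_out if sign_bit else m1_out


print(saturate_datapath_model(123, 8, 5))    # 15
print(saturate_datapath_model(-115, 8, 5))   # -16
```

Each stage consumes only the outputs of the previous one, which is why the final result is available only after three levels of logic.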
Saturation is required in many digital signal processing circuits to keep the bit-width of the signals from growing as the signals pass through multiple stages of processing. In some cases, the circuitry required to perform saturation may be more complex than the actual arithmetic being implemented in a processing block. For example, when adding two N-bit numbers, the output bit-width may be N+1 bits. If the output is to be limited to N bits, a saturation circuit similar to the one shown in FIG. 1 may be required. In this case, the addition operation requires a single N-bit adder but the saturation circuit requires two comparators and three multiplexers. The added complexity of the saturation circuit may also increase the power consumption of the circuit. Therefore, a more efficient technique is required for implementing saturation logic. A method and apparatus are disclosed that enable a fast and hardware-efficient saturation circuit. This may lead to reduced power consumption and reduced silicon area.
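The adder example above can be sketched as follows. This is an illustration of the conventional add-then-clamp behavior only, not of the disclosed technique. The function name `saturating_add` is hypothetical, and Python integers stand in for the (N+1)-bit intermediate sum.

```python
def saturating_add(a: int, b: int, n: int) -> int:
    """Add two n-bit 2's complement values and saturate the sum back to n bits."""
    max_val = (1 << (n - 1)) - 1
    min_val = -(1 << (n - 1))
    for operand in (a, b):
        if not min_val <= operand <= max_val:
            raise ValueError("operand does not fit in n bits")
    total = a + b    # the full-precision sum may need n + 1 bits
    return max(min_val, min(max_val, total))


print(saturating_add(100, 100, 8))     # 127, since the true sum 200 needs 9 bits
print(saturating_add(-100, -100, 8))   # -128
print(saturating_add(100, -100, 8))    # 0
```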