Computers use two different representations for numbers. “Floating-point” representation (also called “double precision”) is used for variables that require an extremely wide range of numerical values. For instance, a variable that takes on a value of 10−308 at one time in a program, and then takes on a value of 10+308 at a later time would typically be represented in floating-point. “Fixed-point” representation is used for variables that take on values within a reasonably limited range. For instance, a variable that takes on only integer values between −10000 and 10000 would typically be represented in fixed-point. A special case of fixed-point representation is “integer”, which is used for variables that only take on values that have no fractional portion.
Integers are represented in computers as a sequence of bits (equivalent to a sequence of digits in human terms) with an implied decimal point (more correctly called “binary point”; both will be used interchangeably) at the right of the bits. The assumption hereinafter is that integers are represented in 2's complement notation, which is the most common representation. For example, the decimal number 13 is represented as 1101, which is interpreted from left to right as 1*23+1*22+0*21+1*20 (just as decimal 13 is interpreted as 1*101 +3* 100). Fixed-point representation differs from integer representation only in that the decimal point is not required to be at the right of the number. The decimal point may appear anywhere within the number, to the right of the number (with implicit zeros filling in the gap between the rightmost bit in the collection and the decimal point) or to the left of the number (again, with either implicit zeros or possibly ones (depending on signed considerations) filling in the gap between the leftmost bit in the sequence and the binary point). For example, fixed-point binary 11.01 is interpreted from left to right as 1*21+1*20+0*2−1+1*2−2 or 3.25 in decimal. The “scale” of a fixed-point number is the number of bit positions between the implied binary point of the equivalent integer and the binary point of the fixed-point number, with positions to the right being positive and positions to the left being negative. For instance, the scale of 11.01 (binary, equal to 3.25 decimal) is negative two (−2), because the equivalent integer is 1101 (binary, equal to 13 decimal) and the binary point in the fixed-point number is 2 positions to the left of its implied location in the equivalent integer. 11.01 is equal to 1101*2−2. An alternative way of viewing the value of a fixed-point number is that it is equal to the value of the equivalent integer times 2 raised to the scale of the number, and that integers are simply fixed-point numbers with a scale of 0. Floating-point numbers are represented in computers as “m*be”, where “m” (the mantissa) is a fixed-point number, “b” (the base) is almost always 2, and “e” (the exponent) is an integer.
In this context, the scale of a fixed-point number is the number of bits to the right of the binary point, and will be referred to as “right” hereafter. A related but slightly different property of a fixed-point number is an implicit “scaling factor” (also referred to as “scale” hereafter; the context will make the distinction clear). In many programming disciplines, the number that a user represents in a computer is very different from the number that represents the quantity that the user is actually measuring. For instance, numbers that represent a physical quantity (such as megahertz) may range from 1,000,000 to 10,000,000. While such numbers could be represented directly in fixed-point notation as numbers with 24 bits to the left of the decimal (and a scale of 0), a more typical representation would be to divide the numbers (which represent the physical quantity) by 224, giving numbers between 0.059 and 0.59. “Scaling” the physical quantities into numbers between 0 and 1 is often done, as that range simplifies many computations. Note that the scaling almost never results in numbers scattered exactly between 0 and 1; in this case, for instance, the range of 0.059 and 0.59 is used as the largest range that can be obtained by dividing by a power of 2 without exceeding the bounding range of 0 to 1. If 223 was used to scale the numbers, the range would be 0.108 to 1.08 which exceeds 1. A scale larger than 224 would reduce the range more than necessary. The numbers presented to the computer are the fixed-point numbers between 0.059 and 0.59 with a “scale” that depends on the precision the programmer desires. The programmer manually remembers (in his mind) that the true numbers are related to the represented numbers by a “scaling factor” of 24; that is, the true numbers are equal to the represented numbers multiplied by 2 raised to the scaling factor.
Scaling factor is a property of fixed-point numbers that, to the knowledge of this applicant (John R. Allen), has always been kept in the programmer's head (i.e. a mental step). To the applicant's knowledge, the scale has never been explicitly stored in a memory of a computer and linked to the number's fixed-point representation. As described below in the Detailed Description section, such storage of scale linked to (in any manner) or alternatively as a portion of fixed-point representation of a number provides several advantages in certain embodiments of the invention by the applicant.
Fixed-point numbers possess the capability of being “signed” or “unsigned”. Signed numbers can take on both positive and negative values; the leftmost bit (the “sign bit”) of the number is used to indicate whether the value is positive or negative. Unsigned numbers can only take on non-negative values, allowing an extra bit (the sign bit) to be applied to the number's value. Fixed-point numbers can also be characterized by their “size” (the total number of bits in the number), their “right” (the number of bits to the right of the decimal), and their “left” (the number of bits to the left of the decimal, excluding the sign bit for signed values). Fixed-point numbers (and floating-point numbers) can furthermore be characterized as being “real” or “complex”. A complex fixed-point number is one which has a non-zero imaginary component; a real fixed-point number is one where the imaginary component is known to be zero.
The property of fixed-point numbers as being “signed” or “unsigned” is hereinafter called the “signedness” property of the number. The property of a fixed-point number as being “real” or “complex” is hereinafter called the “complexness” property of the number.
Arithmetic of fixed-point numbers is similar to arithmetic of integers. Two fixed-point numbers that have the same scale can be added or subtracted just as if they were integers; the result has the same scale as the operands. Two fixed-point numbers with differing scales must be adjusted to have the same scale (much as floating-point numbers do) before they can be added or subtracted as integers. Any two fixed-point numbers can be multiplied using integer arithmetic; the scale of the result is equal to the sum of the scales of the operands. Similarly, any two fixed-point numbers can be divided using integer arithmetic; the scale of the result is equal to the scale of the dividend minus the scale of the divisor. Since fixed-point arithmetic is much simpler than floating-point arithmetic, computer applications where possible prefer to use fixed-point variables and arithmetic. Fixed-point arithmetic is faster, easier to implement in hardware, and requires less electrical power to execute.
Unlike floating-point arithmetic, fixed-point arithmetic has two common modes of execution, known as “saturation” and “modulo”. In saturation arithmetic (so named because values “saturate” at two boundaries: the largest value and the smallest value) values that exceed the largest representable value for a fixed-point representation are represented as the largest value and values that are smaller than the smallest representable value are represented as the smallest value.
As used herein, the term “largest representable value” is the largest number that can be represented in a given fixed-point representation and the term “smallest representable value” is the smallest number that can be represented in a given fixed-point representation. For example, the largest number that can be represented in a signed fixed-point number that contains 4 bits to the left of the binary point and 0 bits to the right of the binary point is 7. If the value 8 (obtained, say by adding 7 and 1) is converted into such a representation when saturation arithmetic is in effect, the resulting fixed-point number is 7, since that is the largest value for a signed 4-bit fixed-point number.
Under modulo arithmetic, values that exceed the largest representable value or are smaller than the smallest representable value are “wrapped” into the representable range of the fixed-point number by dropping the most significant bits. Modulo arithmetic wraps the representable numbers into a ring, so that running off the high end of the representable numbers moves back onto the low end. That is, increasing the largest representable number by 1 in the least significant bit results in the smallest (most negative number); similarly, subtracting 1 in the least significant bit from the smallest number results in the largest number.
If modulo arithmetic were in effect for the previous example of adding 7 and 1, the resulting 4-bit signed fixed-point number is −8, which is obtained by dropping any extra bits generated to the left. Similarly, the smallest number in a signed 4-bit fixed-point representation is −8. If the true value −9 is obtained from a subtraction of, say 1 from −8, while saturation arithmetic is in effect, the resulting fixed-point number is −8, the smallest representable number. If modulo arithmetic is in effect, the resulting fixed-point number is 7, obtained again by dropping any extra bits generated on the left.
A notion central to several higher level programming languges is the concept of the “type” of a variable or constant. A variable's type is a succinct representation of the amount of storage that a variable requires in memory, the operations that may legally be performed on the variable, and the other variables and constants with which the variable may be combined. Typical “primitive” variable types in various languages include integer, double (or double precision), complex, and char. “Primitive” types are so-named because they represent basic arithmetic or computational types.
“Derived” types, which are more complex combinations of primitive types, can be built from the basic primitive types. For instance, a “scalar” is typically a single element. A “vector” is a sequence of scalars collected together; individual elements are accessed and set by notation similar to, for instance, “A(i)” to access the i'th element of the vector A. An array is a collection of vectors; individual elements are accessed and set by notation similar to, for instance, “A(i,j)” to access the i'th element of the j'th vector of array A. In other words, scalars are 0 dimensional objects; vectors are 1 dimensional objects, and arrays are 2 (or more) dimensional objects. The “shape” of a type is the number of dimensions it has: 0 for scalars, 1 for vectors, and 2 or more for arrays.
Computer programs involve two distinct levels: a “specification level” and an “execution level”. The specification level is the representation that conveys the intent of the user, and is generally a higher level programming language. The execution level is the representation used by the computer to implement that intent.
While the execution level is often the instruction set of the computer, it may well be other things, such as another programming language. For instance, a program written in MATLAB™ may be translated into C for execution; in that case, MATLAB is the specification level and C is the execution level. Similarly, interpreters typically translate source programs to an intermediate representation, which they then use to drive execution. In such cases, the original source program is the specification level and the intermediate representation used by the interpreter is the execution level.
Note also that the specification level and the execution level can be the same language. For instance, a MATLAB source program may be translated to a different, optimized MATLAB source program. In such a case, the original source program is the specification level and the translated MATLAB program is the execution level. Given that all programs eventually execute on the instruction set of a computer, the assembly or machine level is also an execution level for all of the examples above.
Typically, computers have separate functional units and instructions for executing (a) fixed-point arithmetic and (b) floating-point arithmetic. Higher level mathematical constructs at the specification level, such as complex arithmetic or vector operands, are implemented at the execution level by combinations of scalar operations. For example, an addition of two complex fixed-point numbers is usually implemented by two separate fixed-point additions, one for the real portion and one for the imaginary portion. An addition of two 10-element vector floating-point operands is typically effected by using the floating-point functional unit 10 times, once for each element of the vector.
Because floating-point variables have a wide dynamic range whereas fixed-point variables have limited dynamic range, high-complexity algorithms such as, for example, signal processing algorithms, are typically developed using floating-point variables and floating-point arithmetic. Floating-point representation allows the user to delay the burden of dealing with practical implementation issues such as computational overflow. Once the algorithm has been proven correct in floating-point and the dynamic ranges of its variables are well-understood, the floating-point program is often manually converted into an equivalent fixed-point program, where variables are represented in fixed-point representation and arithmetic operations are performed using fixed-point instructions. Many computer architectures support only fixed-point operations; on those that support both fixed-point and floating-point, fixed-point operations execute much faster.
Correctly converting an initial floating-point program into an equivalent fixed-point program requires a thorough understanding of the dynamic values taken on by variables and expressions in the program. Gathering this information and using it to correctly convert a program is a difficult process, meaning that it is normally done manually. For example, The Mathworks provides a “quantize” function (see Chapter 6, pages 6.1 through 6.10 of “Filter Design Toolbox Users Guide”, June 2001) that must be applied when converting the following original code (in the MATLAB language):a=b+c+d+e+f;
The user needs to manually generate the following code:t=b+c;t=quantize(t, parms);t=t+d;t=quantize(t, parms);t=t+e;t=quantize(t, parms);. . .a=t;where “parms” is a parameter that indicates the desired precision of the fixed-point result.
Other examples of the recommended MATLAB quantization using the “quantize” function include the following example (taken from “Automatic RTL Conversion of DSP Algorithms for a Channelized Wideband Receiver” by Mike Groden, http://www.accelchip.com/copy/LNX paper.pdf, also published in International Signal Processing Conference, April, 2003, Dallas, Tex.)temp11r=quantize(qout , . . . (xinr1+xinr3)+(xinr2+xinr4));temp12r=quantize(qout , . . . (xinr5+xinr7)+(xinr6+xinr8));Xoutr1=quantize(qout, temp11r+temp12r);Xouti1=quantize(qout, 0);and this example (taken from “An Overview of the AccelFPGA Compiler for Mapping MATLAB Programs onto FPGAs”) by Prith Banerjee, http://bwrc.eecs.berkeley.edu/Seminars/Banerjee-11.6.02/MATCH-AccelFPGA-Berkeley.pdf) where the following MATLAB code
sum = 0;for k = 1:4mult = indatabuf(k) * coeff(k);sum = sum + mult;endis quantized as
% Converted Fixed Point MATLABqpath = quantizer(‘fixed’, ‘floor’, ‘wrap’, [8,0]);qresults = quantizer(‘fixed’, ‘floor’, ‘wrap’, [16,0]);sum = quantize( qresults, 0 );for k = quantize( qpath, 1:4 )mult = quantize( qresults, indatabuf(k) * . . .coeff(k) );sum = quantize( qresults, sum + mult );end
These examples illustrate the significant changes currently required to “quantize” a MATLAB program using the quantize function. Hereinafter, the phrase “quantize” will refer to the act of converting a floating-point program into an equivalent fixed-point program, selecting values for properties such as signedness, complexness, and precision so as to meet various requirements, unless the context specifically indicates the MATLAB function “quantize”.
Unfortunately, manual efforts in effecting changes such as addition of the word “quantize” to a line of a software program tend to be inefficient, inflexible, tedious, and error prone. Such efforts typically require so many modifications that the final program's appearance and the logical flow thereof are quite different from that of the original (i.e., floating-point) program. Moreover, a substantial investment in time is usually required to perform the conversion, resulting in both high manpower costs as well as substantial real-time delays.
At a very high level, computer languages can be divided into two categories: “statically-typed” languages and “dynamically-typed” languages. In statically-typed languages such as C and C++, programmers must specify the type (that is, whether the variable is fixed-point or floating-point, whether the variable represents a single number or a collection of numbers, etc.) of every variable in the program. In dynamically-typed languages such as MATLAB, programmers do not specify the types of variables. Instead, variables derive their types from the expressions assigned to them.
Other patents have dealt with the problem of converting floating-point programs written in statically-typed languages into fixed-point programs. See U.S. Pat. No. 6,460,177 granted to Lee and entitled “Method For Target-Specific Development Of Fixed-Point Algorithms Employing C++ Class Definitions” and also see U.S. Pat. No. 6,173,247 granted to Maurudis et al. entitled “Method and Apparatus for Accurately Modeling Digital Signal Processors”, both of which are incorporated by reference herein in their entirety. The applicant notes that Lee's approach involves changing variable declarations and header files (which implicitly changes variable declarations). As a result, his approach appears to be limited to statically-typed languages and will not work with dynamically-typed languages such as MATLAB, which do not contain declarations. Furthermore, Lee appears to be silent on what process should be effected with implicitly-typed quantities, such as constants. For example, how should the number “1” be interpreted in an expression “a+1” where “a” is a fixed-point variable? Should it be an integer value, a fixed-point value, a floating-point value, or something else? The applicant notes that this question appears to be left both unasked and unanswered in Lee's patent.
Note also that Lee describes range capture to identify some small number of statistics about variables' values, such as the maximum and minimum value taken on. Lee's range capture is apparently limited to summary information. Being summary, the range capture (by definition) is not all inclusive, and misses important information such as locations where overflows occur, how significant the overflows are, the original source of overflows versus overflows that are propagated, etc. For instance, if the range capture shows that a variable takes on values between −0.1 and 7.9, then a user can properly determine that a signed, left=3, right=12 representation will hold the values taken on by the variable. However, if “−0.1” minimum value is the only negative value, and that value occurs in only one insignificant computation, it may well be possible to represent the values using an unsigned representation, allowing an extra bit of precision on the right. In other words, an unsigned, left=3, right=13 representation may well work. The applicant notes that this fact cannot be determined by Lee's range capture.