The present invention relates to converting from decimal numbers to binary numbers and vice versa, particularly for use in computing systems and such like.
Human mathematics is largely based on the decimal system, presumably due to the biological fact (or accident) that we have five fingers on each of two hands. In contrast, for physical reasons, modern digital computer systems use binary numbers for their internal processing, with a one represented by the presence of a signal, and zero by the absence of a signal (or vice versa). Consequently most numerical input data will be presented to a computer in decimal form. This must then be converted to binary for processing, and then converted back to decimal for output.
There are several known methods of representing decimal numbers as binary numbers. If we simply are considering integers, then clearly there is a direct mapping from a decimal integer to its corresponding binary integer (for example 235 goes to 11101011). Moving on to real numbers, possibly the most natural approach is to introduce a xe2x80x9cdecimalxe2x80x9d point into a binary number, so that the first digit to the right of the xe2x80x9cdecimalxe2x80x9d point represents xc2xd, and the nth digit xc2xdn. One problem with this approach is that 0.1 (in decimal) is a recurring fraction in binary, and so cannot be represented exactly. This leads to the possibility of rounding errors, which can be exacerbated by repeated conversions from decimal to binary and back again.
A slightly different approach is to extend the integer representation to cope with real numbers by separately recording a decimal point position using an integer exponent (this can be regarded as floating point real numbers rather than fixed point). Thus 235 could be 23.5, 2.35, 0.235 etc according to the value of the exponent. This overcomes the recurring fraction problem mentioned above, but is still not completely satisfactory. The conversion between binary and decimal is not particularly simple in terms of hardware design (this sort of facility is nearly always provided in hardware for speed). The reason for this in general terms is that such conversion requires binary addition rather than simple bit manipulation. Also, it is relatively common to have to truncate or round decimal numbers for output, and there is no quick way of performing such truncation or rounding with this coding technique (ie without first performing a conversion of a full number). For example 01111111, as representing 1.27 with a suitable exponent, rounds to 1.3, but 01111100, representing 1.24, rounds to 1.2.
One way around these problems is to use binary coded decimal. Here each decimal digit (0-9) is encoded by a corresponding four bit binary number, so that a sequence of decimal digits can then be represented by concatenating the four-bit binary numbers. This allows quick retrieval of any given decimal digit independently from the others, and so provides for simple rounding and truncation. However, this has been achieved at the expense of a significant loss of storage efficiency. Thus potentially four bits can store 0-15, rather than 0-9.
We can formally measure storage efficiency by observing that the number of digits Nr needed to store an integer X using base r goes as Nr|logr(X). Asymptotically therefore (ie for very large numbers) the ratio of digits used for binary storage of a given number to decimal storage of that number is:
Since binary coded decimal has a ratio of 4 between the number of binary digits and the number of decimal digits, this gives an efficiency of only 3.322/4=0.83. Expressed another way, a 32 bit system, which in theory could store 4,294,967,295 as a binary integer, can now only store 99,999,999 as a decimal integer, a reduction in terms of maximum number of a factor |40. In order to overcome this deficiency, a sort of hybrid algorithm was developed by Chen and Ho (see Communications of the ACM, January 1975, v18/1, p49-52). This is based on the fact that 210=1024|103 (this correspondence of course underlies all computer storage sizingsxe2x80x94thus 8 kBytes really corresponds to 8*1024 Bytes rather than 8000 Bytes). Accordingly, the Chen-Ho algorithm takes three decimal digits, and then encodes them into 10 binary bits. This allows relatively quick retrieval of the decimal digits in any given location, by simply decoding the corresponding 10 binary bits, but retains a high (asymptotic maximum) storage efficiency of 3.322/(10/3)=99.66%.
The Chen-Ho algorithm is best understood as a mapping from a decimal coded binary number having 12 bits, corresponding to 3 decimal digits each represented by 4 binary bits, into a 10 bit binary number. Thus the input to the algorithm can be written as (abcd), (efgh), (ijkl), where each bracketed group of four letters corresponds to the decimal coded binary representation of the equivalent decimal digit. The 10 output bits can then be denoted as pqrstuvwxy. The details of the mapping from the input string to the output string are set out in Table 1 below.
The Chen-Ho algorithm is based on the observation that decimal digits 0-7 can be encoded in 3 binary bits; these decimal digits are therefore considered as xe2x80x9csmallxe2x80x9d, whilst decimal digits 8 and 9 are considered as xe2x80x9clargexe2x80x9d. If a decimal digit is small, then its first bit (when represented in binary) is zero; if it is large, then its first bit is one, but the next two bits are always zero (since individual numbers above 9 are not required in the decimal system). Thus the first bit for each binary coded decimal determines whether the number is small (zero) or large (one), and is known as the indicator bit. For three decimal digits there are a total of 23=8 possibilities as to the values of the three indicator bits, and these correspond to the 8 lines in Table 1. In other words, which line of the table is used to perform the mapping depends on the values of the indicator bits. If all three decimal digits are small (51.2% of possibilities), corresponding to the first line in the table, then this is indicated by a 0 in the p output position. The remaining 9 bits are then divided into three groups of 3, which indicate in standard fashion the value of each of the decimal digits (these must all be in the range 0-7, since all of the decimal digits are small). If just one of the decimal digits is large (38.4% of possibilities), then p is set to 1. The next two bits (q and r) are then set to 00, 01, or 10, to indicate which of the three decimal digits is large (the first, second or third respectively). The remaining 7 bits are split up into two groups of 3, to encode the two small digits, plus 1 additional bit to encode the large digit. (It will be appreciated that since there are only two possibilities for a large decimal, 8 or 9, then the actual value can be represented by a single bit, corresponding to the final bit in the binary coded digitalxe2x80x94ie d, h or l as appropriate).
If two of the decimal digits are large (9.6% of possibilities), then p, q, and r are all set to 1. A further two bits (t and u) are then set to 00, 01, 10 to indicate the position of the one small decimal digit (first, second or third digit). This leaves a final 5 bits, of which 3 are used to encode the one small digit, and each of the remaining 2 bits is assigned to one of the two large digits.
Finally, if all three of the decimal digits are large (0.8% of possibilities) then p, q, r, t and u are all set to 1. Of the remaining 5 bits, three are used to encode one each of the three large digits, and the remaining two (w and x) are set to zero.
An appropriate decoding algorithm for going back from Chen-Ho to binary coded decimal can be obtained in a straightforward manner effectively by inverting the contents of Table 1 and is given in their paper. Note that for this purpose, it is first necessary to determine which line of the table was used for the encoding. This can be ascertained by looking initially at the value of the p bit; and if this is 1, at the values of the q and r bits; and if these are both 1 as well, at the values of the t and u bits. Thus the p bit can be regarded as the primary indicator field, and q, r, t and u as a secondary indicator field.
Chen and Ho observe that their indicator field can be regarded as a form of Huffman coding, in that it has a variable length which is shortest for the most likely data values, with less probable values having increasingly greater length. Moreover, the values of the indicator field are carefully selected so that they can always be properly distinguished from one another. However, overall Chen-Ho is not a form of Huffman compression, in that it works by eliminating the inoperative bit combinations of binary coded decimal. (In contrast Huffman compression assigns a short code to common symbols, and a longer code to rare symbols, to derive an overall benefit in terms of the statistical average code length).
An important advantage of the Chen-Ho algorithm is that there are only binary shifts, insertions and deletions, based on the testing of the 3 indicator bits (a, e and i). Moreover, it will be appreciated that the bit assignments outside the indicator fields have been designed to simplify the bit mapping as much as possible. Thus bits d, h and l (the least significant bits in each input decimal digit) always map straight through to bits s, v and y respectively. Overall therefore, a very fast efficient hardware implementation of the coding is possible.
Note that certain other schemes for encoding 3 decimal digits into 10 binary digits are also known in the art. Thus U.S. Pat. No. 3,618,047 describes a coding system, based on very similar principles to Chen-Ho. It is also pointed out by Smith (see Communications of the ACM, August 1975, v18/8, p463) that it would be possible to simply treat each three decimal digits as an integer in the range 0-999, and then convert to the corresponding binary integer. This makes the conversion from decimal to binary and vice versa conceptually very simple. However, the operations involved in the conversion from decimal coded binary into compressed binary and back again are significantly more complex than that required for Chen-Ho. The above-mentioned two papers, by Chen-Ho and Smith, also discuss the possibility of variable length coding, in other words where N decimal digits are encoded to a variable length binary string. However such schemes are difficult to incorporate into standard computer systems, and will not be discussed further herein. One drawback with the Chen-Ho algorithm described above is that it only works in relation to converting groups of three decimal digits into 10 binary bits. However, computer architectures are almost exclusively sized around powers of two (16-bit, 32-bit, 64-bit) and accordingly these sizings will not actually be directly divisible by 10. Straightforward use of the Chen-Ho algorithm described above would simply waste any remaining bits. For example, on a 64-bit system only 6*3=18 decimal digits could be stored using 60 of the 64 available bits. Whilst this is certainly an improvement on the 16 decimal digits obtainable with binary coded decimal, the practical storage efficiency here has decreased to 3.322/(64/18)|0.93, significantly below the theoretical maximum. In fact both Chen-Ho (in the same paper referenced above) and U.S. Pat. No. 3,618,047 also describe an algorithm for coding 2 decimal digits into 7 bits, which is based on the same principles as the 3 to 10 algorithm (splitting according to large/small decimals). Furthermore U.S. Pat. No. 3,618,047 contemplates combining the 3 and 2 digit algorithms, suggesting that 11 decimal digits could be broken into 3 groups of 3 digits and 1 group of 2 digits. As another example, a 64-bit space could be split up as (5*10)+(2*7) to allow coding of 19 decimal digits; the same number of decimal digits could also be obtained by splitting the 64-bit space as (6*10)+(1*4), where the 4-bit coding is simply standard binary decimal coding. Actually 19 is the maximum possible number of complete decimal digits that can be encoded in 64 bits (since 64* log10(2)=19.27xe2x80x94ie less than 20), so both these approaches are theoretically optimal.
However, such a combination of different coding (and decoding) patterns compromises the simplicity which was one of the major initial attractions of the Chen-Ho algorithm.