The invention relates to the translation of numbers from one form of representation to another and, in particular, to the translation of a numerical representation of a number into an alphabetical representation, or from an alphabetical representation into a numerical representation.
A variety of automated business applications would benefit from the translation of numerical representations of numbers into alphabetical representations having the same numerical value. Printing a check is one example of a computer system business application that employs both numerical representations of numbers, comprising numerical character strings, and alphabetical representations of numbers, comprising alphabetical character strings. Typically, a check's payable amount is printed on the check as a numerical character string, e.g., “$4,562.92” in the check's upper right-hand corner. The payable amount is typically also “spelled out”, e.g., “four thousand five hundred and sixty-two dollars and ninety-two cents”, at another location on the face of the check. Since numbers are typically stored within a computer in a binary numerical representation, it is relatively easy to produce a number in numerical form for printing on a check. To print the check's “spelled out” value, however, the stored numerical representation must be translated into a human-readable natural language form, typically an alphabetical representation, and this translation is substantially more involved than the translation from an internal binary numerical representation to an external decimal numerical representation. A number of other business applications also require translation from a number's numerical character string representation to an alphabetical character string representation.
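To give a sense of what the “spelled out” line involves even for a single language, the following Python sketch converts an amount given in integer cents (an assumption, chosen to avoid floating-point rounding) into check-style English words. It is a minimal, hard-coded sketch rather than the rule-based approach described later: it covers only English, only amounts below one trillion dollars, and follows the “hundred and” style of the example above. All function names are hypothetical.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def below_hundred(n: int) -> str:
    """Words for 0..99, hyphenating tens and ones (e.g. "sixty-two")."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def below_thousand(n: int) -> str:
    """Words for 0..999, using the "hundred and ..." style."""
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return below_hundred(rest)
    words = ONES[hundreds] + " hundred"
    return words + (" and " + below_hundred(rest) if rest else "")


def spell_out(n: int) -> str:
    """Words for 0 <= n < 10**12, one scale word per group of three digits."""
    if not 0 <= n < 10**12:
        raise ValueError("this sketch covers 0 <= n < 10**12 only")
    if n == 0:
        return "zero"
    parts = []
    for value, name in SCALES:
        count, n = divmod(n, value)
        if count:
            parts.append(below_thousand(count) + " " + name)
    if n:
        parts.append(below_thousand(n))
    return " ".join(parts)


def check_amount(total_cents: int) -> str:
    """Spell out a check amount given as an integer number of cents."""
    dollars, cents = divmod(total_cents, 100)
    dollar_unit = "dollar" if dollars == 1 else "dollars"
    cent_unit = "cent" if cents == 1 else "cents"
    return f"{spell_out(dollars)} {dollar_unit} and {spell_out(cents)} {cent_unit}"


print(check_amount(456292))
# four thousand five hundred and sixty-two dollars and ninety-two cents
```

Every word and joining rule here is baked into the code, which is exactly what makes this style of translator hard to extend to other languages.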
A number translation engine might also find more general application in speech synthesis and speech recognition. A number translation engine that translates numerical character strings into alphabetical character strings could be used within a speech synthesis system to provide an appropriate character string to the system's sound output. For example, a speech synthesis system contained within a slot machine might announce to a gambler, and, not insignificantly, to nearby gamblers, that the player has won “four thousand five hundred and sixty-two dollars”. Without proper translation from numerical to alphabetical character strings, the announcement might sound something like “four five six two dollars”, or worse. As a result, much of the drama, and advertising value, associated with the announcement would be lost.
A number translation engine may also be employed to translate alphabetical representations of numbers into numerical representations within a speech recognition system. For example, rather than requiring a user to enunciate numbers in an unnatural, awkward fashion, e.g., “four five six two point nine two dollars”, in a speech-input banking application, a number translation engine may allow a person to speak naturally, indicating that they would like to deposit “four thousand five hundred and sixty-two dollars and ninety-two cents”.
Although number translators that transform a numerical representation of a number into an English language alphabetical representation exist, such translators do not accommodate a variety of languages, or even the various representations, such as ordinal and cardinal representations, within a single language. The development of a number translator that can accommodate various languages faces significant obstacles. For example, it is not enough to modify an algorithm for English so that it reads the literal string values from a resource file. English separates the component parts of a number with spaces; Italian and German do not. Furthermore, although the other digit positions are separated by spaces, English and French separate the tens and ones digits with a hyphen, Spanish uses “y”, and many other languages use either a space or nothing. Some languages, such as Greek and Swedish, run the tens and ones digits together into one word but put spaces between the others.
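The point about separators can be made concrete with a small Python sketch: even for the numbers 30 to 32, each language needs not only its own words but also its own joiner between the tens word and the ones word. The data tables and the function name are hypothetical and cover only these few values.

```python
# Illustrative subset only: the words for 30, 1 and 2 in three languages, plus
# the string each language places between a tens word and a ones word.
LANGUAGES = {
    "en": {"tens": {3: "thirty"}, "ones": {1: "one", 2: "two"}, "join": "-"},
    "es": {"tens": {3: "treinta"}, "ones": {1: "uno", 2: "dos"}, "join": " y "},
    "sv": {"tens": {3: "trettio"}, "ones": {1: "ett", 2: "två"}, "join": ""},
}


def tens_and_ones(lang: str, n: int) -> str:
    """Words for 30..32 in the given language (limited by the tables above)."""
    data = LANGUAGES[lang]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return data["tens"][tens]
    return data["tens"][tens] + data["join"] + data["ones"][ones]


for lang in ("en", "es", "sv"):
    print(lang, tens_and_ones(lang, 31), tens_and_ones(lang, 32))
```

Moving the joiner into per-language data handles this one difference, but the peculiarities described in the following paragraphs cannot be expressed as a single joiner string.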
Some languages, such as Spanish and Italian, drop the word for “one” from the phrases “one hundred” or “one thousand.” In Spanish, for example, “one thousand” is “mil,” not “uno mil.” In some languages, such as German, the word for “one” in “one hundred” or “one thousand” differs from the word for “one” on its own: in German, “one” is “eins,” but “one thousand” is “eintausend,” not “einstausend.” In some languages, the word for “hundred” or “thousand” becomes plural when there is a number other than 1 in the hundreds place. In French, for example, 100 is “cent,” but 200 is “deux cents.” In some languages, the word for “hundred” or “thousand” also changes form depending on whether it is followed by more digits. In Spanish, for example, 100 is “cien,” but 101 is “ciento uno.”
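The following Python sketch shows how each of these cases becomes a conditional rather than a word lookup. It is a minimal illustration with tiny, hypothetical word tables (multipliers 1 to 3 only). The French function follows standard French spelling, in which the plural “cents” loses its “s” when more digits follow (“deux cents” but “deux cent un”).

```python
# Tiny illustrative word tables (1..3 only).
FR_ONES = {1: "un", 2: "deux", 3: "trois"}
ES_ONES = {1: "uno", 2: "dos", 3: "tres"}
DE_ONES = {1: "eins", 2: "zwei", 3: "drei"}


def french_hundreds(n: int) -> str:
    """French for n in 100..399 with n % 100 in 0..3."""
    h, rest = divmod(n, 100)
    if h == 1:
        word = "cent"  # no "un" before "cent"
    else:
        # Plural "cents" only when nothing follows: deux cents, deux cent un.
        word = FR_ONES[h] + (" cents" if rest == 0 else " cent")
    return word if rest == 0 else word + " " + FR_ONES[rest]


def spanish_hundreds(n: int) -> str:
    """Spanish for n in 100..399 with n % 100 in 0..3."""
    h, rest = divmod(n, 100)
    if h == 1:
        word = "cien" if rest == 0 else "ciento"  # form depends on what follows
    else:
        word = ES_ONES[h] + "cientos"  # doscientos, trescientos
    return word if rest == 0 else word + " " + ES_ONES[rest]


def spanish_thousands(k: int) -> str:
    """Spanish for k * 1000, k in 1..3: "mil", never "uno mil"."""
    return "mil" if k == 1 else ES_ONES[k] + " mil"


def german_thousands(k: int) -> str:
    """German for k * 1000, k in 1..3: "eins" shortens to "ein" as a prefix."""
    return ("ein" if k == 1 else DE_ONES[k]) + "tausend"


print(french_hundreds(200), french_hundreds(201), spanish_hundreds(100),
      spanish_hundreds(101), spanish_thousands(1), german_thousands(1))
```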
In most languages, the words for the values from 11 to 19 are based on the words for the values from 1 to 9 but are not simple concatenations. In English, for example, 15 is “fifteen,” not “fiveteen.” The same happens with the words for the tens digits in most languages (“twenty” instead of “twoty” in English). In some languages, this also applies to other groups of words. In Spanish, for example, the tens and ones digits are usually joined by “y”: “thirty-one” is “treinta y uno.” But the values from 21 to 29 contract the phrase into a single word; instead of “veinte y uno,” one says “veintiuno.” These values therefore have to be special-cased. Worse, the contraction still isn't a simple concatenation: sometimes the ones digit acquires an accent mark it lacks when standing alone. 22, for example, is “veintidós” instead of “veintidos.”
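One way to handle these values is with canned strings, as in the following minimal Python sketch of Spanish words for 0 to 99. The table and function names are hypothetical; the point is that 10 to 19 and 21 to 29 come from fixed tables, while only 31 to 99 are built with “y”.

```python
ES_ONES = ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
           "ocho", "nueve"]
ES_TEENS = ["diez", "once", "doce", "trece", "catorce", "quince", "dieciséis",
            "diecisiete", "dieciocho", "diecinueve"]
ES_TENS = ["", "", "veinte", "treinta", "cuarenta", "cincuenta", "sesenta",
           "setenta", "ochenta", "noventa"]
# Canned strings: 21..29 contract into one word, sometimes gaining an accent.
ES_TWENTIES = ["veinte", "veintiuno", "veintidós", "veintitrés", "veinticuatro",
               "veinticinco", "veintiséis", "veintisiete", "veintiocho",
               "veintinueve"]


def spanish_below_100(n: int) -> str:
    """Spanish words for 0..99."""
    if not 0 <= n < 100:
        raise ValueError("this sketch covers 0..99 only")
    if n < 10:
        return ES_ONES[n]
    if n < 20:
        return ES_TEENS[n - 10]
    if n < 30:
        return ES_TWENTIES[n - 20]
    tens, ones = divmod(n, 10)
    # From 31 up, the tens and ones words are joined by " y ".
    return ES_TENS[tens] + (" y " + ES_ONES[ones] if ones else "")


print(spanish_below_100(21), spanish_below_100(22), spanish_below_100(31))
```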
In Spanish and Greek, canned strings are also required for the hundreds place. In Spanish, for example, the words for 2 through 9 are combined with “cientos,” but the word for the multiplier sometimes changes form in the contraction: 500, for example, is “quinientos,” not “cincocientos.” A general translator might therefore provide canned strings for the twenties and hundreds as well, even though most languages wouldn't need them.
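The difference between naive concatenation and a canned hundreds table can be shown in a few lines of Python. This is an illustrative sketch with hypothetical names; to keep it short, it only accepts numbers whose last two digits are below 10.

```python
ES_ONES = ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
           "ocho", "nueve"]
# Canned hundreds words; 500, 700 and 900 are not simple concatenations.
ES_HUNDREDS = ["", "ciento", "doscientos", "trescientos", "cuatrocientos",
               "quinientos", "seiscientos", "setecientos", "ochocientos",
               "novecientos"]


def spanish_hundreds(n: int) -> str:
    """Spanish for n in 100..999 where n % 100 < 10 (to keep the sketch short)."""
    h, rest = divmod(n, 100)
    if not 1 <= h <= 9 or rest >= 10:
        raise ValueError("outside the range this sketch covers")
    if n == 100:
        return "cien"
    return ES_HUNDREDS[h] + ("" if rest == 0 else " " + ES_ONES[rest])


naive = ES_ONES[5] + "cientos"  # "cincocientos", which is wrong
print(naive, spanish_hundreds(500), spanish_hundreds(705))
```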
There are additional peculiarities in various languages. In German, the ones digit goes before the tens digit: 23 is “dreiundzwanzig.” In French and German, the combination of tens and ones digits differs when the ones digit is 1: in German, 21 is “einundzwanzig” instead of “einsundzwanzig.” In French, “et” goes before the ones digit only if it is 1; 21 is “vingt-et-un,” but 22 is “vingt-deux.” In Greek, the word for each tens digit has an accent mark that is eliminated when it is combined with a ones digit; 30 is “triánta,” but 31 is “triantaéna.” In Italian, when the tens word ends with a vowel and the ones word begins with a vowel, the tens word loses its final vowel: 50 is “cinquanta” and 52 is “cinquantadue,” but 51 is “cinquantuno.”
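Each of the German, French, and Italian rules above amounts to a different way of combining a tens word with a ones word, as the following Python sketch illustrates. The word tables are hypothetical and cover only the examples in the text; the French hyphenation follows the text (“vingt-et-un”).

```python
# Illustrative word tables covering only the examples in the text.
DE_ONES = {1: "eins", 2: "zwei", 3: "drei"}
DE_TENS = {2: "zwanzig", 3: "dreißig"}
FR_ONES = {1: "un", 2: "deux"}
FR_TENS = {2: "vingt", 3: "trente"}
IT_ONES = {1: "uno", 2: "due", 8: "otto"}
IT_TENS = {5: "cinquanta"}
VOWELS = "aeiou"


def german(n: int) -> str:
    tens, ones = divmod(n, 10)
    if ones == 0:
        return DE_TENS[tens]
    # Ones come first, joined by "und"; "eins" shortens to "ein".
    ones_word = "ein" if ones == 1 else DE_ONES[ones]
    return ones_word + "und" + DE_TENS[tens]


def french(n: int) -> str:
    tens, ones = divmod(n, 10)
    if ones == 0:
        return FR_TENS[tens]
    # "et" appears only before a ones digit of 1.
    joiner = "-et-" if ones == 1 else "-"
    return FR_TENS[tens] + joiner + FR_ONES[ones]


def italian(n: int) -> str:
    tens, ones = divmod(n, 10)
    if ones == 0:
        return IT_TENS[tens]
    tens_word, ones_word = IT_TENS[tens], IT_ONES[ones]
    # Drop the tens word's final vowel before a ones word starting with a vowel.
    if tens_word[-1] in VOWELS and ones_word[0] in VOWELS:
        tens_word = tens_word[:-1]
    return tens_word + ones_word


print(german(21), german(23), french(21), french(22), italian(51), italian(52))
```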
Another area where variations arise is in major groupings. For example, in American English and most European languages, large numbers are grouped by thousands (i.e., above a thousand, a new word is introduced at every factor of 1,000). In traditional British English, however, large numbers are grouped by millions (a “billion” in British English is a “trillion” in American English; what Americans call a “billion” is called a “thousand million” in Britain). More importantly, in Japanese, large numbers are grouped by ten thousand rather than by thousand.
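The grouping difference reduces to the base used when splitting a number into groups, as in the following Python sketch. Digits are kept as numerals so that only the grouping is visible; the function names are hypothetical, and 万 (10^4) and 億 (10^8) are the Japanese group names.

```python
def group_digits(n, base):
    """Split n >= 0 into groups of `base`, most significant group first."""
    if n == 0:
        return [0]
    groups = []
    while n:
        n, group = divmod(n, base)
        groups.append(group)
    return groups[::-1]


def label_groups(n, base, names, sep):
    """Attach a scale name to each nonzero group; names[0] is for the units group."""
    groups = group_digits(n, base)
    if len(groups) > len(names):
        raise ValueError("number too large for the supplied scale names")
    labeled = []
    for power, group in enumerate(reversed(groups)):
        if group:
            labeled.append(f"{group}{names[power]}")
    return sep.join(reversed(labeled))


n = 123456789
print(label_groups(n, 1000, ["", " thousand", " million"], " "))
# 123 million 456 thousand 789
print(label_groups(n, 10000, ["", "万", "億"], ""))
# 1億2345万6789
```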
French has a couple of peculiarities of its own. In the French of France, there are no words for 70, 80, or 90. The numbers from 70 up are rendered as “soixante-dix,” “soixante et onze,” “soixante-douze,” “soixante-treize,” and so on (literally, “sixty-ten,” “sixty and eleven,” “sixty-twelve,” “sixty-thirteen,” etc.). 80 is rendered as “quatre-vingts” (literally, “four twenties”), and the numbers proceed by score from there: 81 is “quatre-vingt-un” (“four-twenty-one”), 90 is “quatre-vingt-dix” (“four-twenty-ten”), 91 is “quatre-vingt-onze” (“four-twenty-eleven”), and so on. In addition, the numbers between 1,100 and 1,200 are rendered as “onze cents . . .” (literally, “eleven hundred . . .”) instead of “mille cent . . .” (“one thousand one hundred . . .”).
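Counting by score means that 70 to 79 and 90 to 99 reuse the words for 10 to 19 on top of “soixante” and “quatre-vingt”. The following Python sketch covers 60 to 99 using the traditional spelling shown in the examples above; the function and table names are hypothetical.

```python
FR_0_19 = ["", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
           "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze",
           "seize", "dix-sept", "dix-huit", "dix-neuf"]


def french_60_to_99(n: int) -> str:
    """French words for 60..99, counting by score above 60 and above 80."""
    if not 60 <= n <= 99:
        raise ValueError("this sketch covers 60..99 only")
    if n < 80:
        rest = n - 60  # 0..19 added on top of "soixante"
        if rest == 0:
            return "soixante"
        if rest in (1, 11):  # "et" only before un and onze
            return "soixante et " + FR_0_19[rest]
        return "soixante-" + FR_0_19[rest]
    rest = n - 80  # 0..19 added on top of "quatre-vingt"
    if rest == 0:
        return "quatre-vingts"  # plural "s" only when nothing follows
    return "quatre-vingt-" + FR_0_19[rest]


print(french_60_to_99(71), french_60_to_99(80), french_60_to_99(91))
```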
In short, the rules for translating numbers from numerical to alphabetical representations present a daunting array of obstacles to the formation of a single translation engine that is capable of accommodating various languages. Similar obstacles exist for the formation of a translation engine which translates numbers from alphabetical representations to numerical representations.
The need for such a translation engine is satisfied in one embodiment of the present invention, in which a number translation engine comprises a formatter, a parser, or both. The formatter and the parser each comprise a rule set and, respectively, a formatting engine or a parsing engine. Either engine employs the rule set recursively to construct output representations: the rule set operates in conjunction with the formatting engine to translate a numerical representation into an alphabetical representation, or in conjunction with the parsing engine to translate an alphabetical representation into a numerical representation. Given an input numerical representation of a number, the formatting engine recursively employs the rule set to construct an alphabetical representation. Given an alphabetical representation of a number, the parsing engine recursively employs the rule set to construct a numerical representation. In the illustrative embodiment, the formatting and parsing engines employ a common rule set and effect a bidirectional mapping between numerical and alphabetical representations. That is, an alphabetical representation constructed by the formatting engine from a numerical representation will, if passed to the parsing engine, be translated back into the original numerical representation.
The formatter comprises a formatting engine that associates an input numerical representation with an output alphabetical representation having the same numerical value, in effect translating the input numerical representation into an output alphabetical representation. For example, the formatter may be employed to associate the output alphabetical representation “thirty-two” with the input numerical representation having the same numerical value, i.e., “32”. The translation engine also comprises rule lists, at least one for each supported language. Each rule within a rule list includes a base output alphabetical representation and an indication to the formatting engine, either implicit or explicit, of the group of numerical values to which the rule applies. The formatting engine obtains a base output alphabetical representation, also referred to hereinafter as a “rule text”, from the appropriate rule within a rule list. Additionally, where appropriate, a rule indicates where “additional” output alphabetical representations may be placed in relation to the base, or root, output alphabetical representation in order to construct a complete output alphabetical representation. The formatter builds up an output alphabetical representation, adding output alphabetical representations from other rules within the rule list as necessary. For example, the rule that includes the base output alphabetical representation “thirty” may include an indication that an additional output alphabetical representation may be placed after “thirty”, thereby allowing the output alphabetical representation “two” to be added in the proper position to construct the final output alphabetical representation, “thirty-two”.
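A rule-list formatting engine of this kind can be sketched in Python as follows. The rule syntax is an assumption loosely modeled on ICU's RuleBasedNumberFormat: within a rule text, “<<” stands for the words of the quotient, “>>” for the words of the remainder, and a bracketed section is kept only when the remainder is nonzero. Each rule applies from its base value up to the next rule's base value, and the divisor is the largest power of ten not exceeding the base value. The rule list, syntax, and function names are all illustrative, not the notation of this embodiment; the sketch handles only non-negative integers in American style (no “and”).

```python
import re

WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty",
        "ninety"]

# Hypothetical rule list of (base value, rule text) pairs.
RULES = [(value, word) for value, word in enumerate(WORDS)]
RULES += [(20 + 10 * i, word + "[->>]") for i, word in enumerate(TENS)]
RULES += [(100, "<< hundred[ >>]"), (1000, "<< thousand[ >>]"),
          (1_000_000, "<< million[ >>]")]


def find_rule(n):
    """The applicable rule is the one with the largest base value <= n."""
    return max((rule for rule in RULES if rule[0] <= n), key=lambda rule: rule[0])


def divisor(base):
    """Largest power of ten not exceeding the base value (1 for base 0)."""
    return 10 ** (len(str(base)) - 1)


def format_number(n):
    if n < 0:
        raise ValueError("negative numbers are not covered by this sketch")
    base, text = find_rule(n)
    quotient, remainder = divmod(n, divisor(base))
    # Keep an optional [...] section only when there is a remainder to show.
    text = re.sub(r"\[(.*?)\]", lambda m: m.group(1) if remainder else "", text)
    # Fill placeholders recursively, using whichever rules the parts need.
    if "<<" in text:
        text = text.replace("<<", format_number(quotient))
    if ">>" in text:
        text = text.replace(">>", format_number(remainder))
    return text


print(format_number(32))             # thirty-two
print(format_number(4562))           # four thousand five hundred sixty-two
print(format_number(1_000_000_000))  # one thousand million
```

Because this rule list has no “billion” rule, 10^9 comes out as “one thousand million”; changing the output style means changing the rule list, not the engine.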
The formatting engine may operate in concert with various rule lists to effect various numerical-to-alphabetical translations. For example, separate American English and British English rule lists may be employed with the formatting engine to allow translation of an input numerical representation into either an American English or a British English output alphabetical representation, according to a user's selection. Other rule lists may be employed for other languages or for other variations, such as ordinal and cardinal numbers. Negative numbers, fractions, and various radices are all accommodated by the translation engine.
The new number translation engine may also comprise a parser configured to receive an alphabetical input string and to associate it with a numerical output string having the same numerical value. For example, the parser could accept an alphabetical input string such as “thirty-two” and associate it with the numerical output string “32”. The parser comprises a parsing engine and a rule list; illustratively, the parsing engine may employ the same rule lists as the formatting engine. After receiving an alphabetical representation of a number, the parser matches alphabetical characters in the string against the alphabetical characters of the appropriate rule's base output alphabetical representation, branching to other rules as necessary and adding their corresponding base values as appropriate. The parser proceeds in this manner, from rule to rule, until the input alphabetical character string is exhausted.
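The following Python sketch shows one simplified way a parser can reuse a formatter's rule list. It assumes the same hypothetical rule syntax as a formatter would use (“<<” for the quotient, “>>” for the remainder, “[...]” for an optional section), derives a word-to-value vocabulary from the rule texts, and then undoes each multiplier rule by splitting the input at the last occurrence of its largest multiplier word. This is a word-level simplification rather than the character-by-character rule matching described above; for instance, it does not reject ill-formed input such as “thirty twelve”. All names are hypothetical.

```python
import re

WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty",
        "ninety"]
RULES = [(value, word) for value, word in enumerate(WORDS)]
RULES += [(20 + 10 * i, word + "[->>]") for i, word in enumerate(TENS)]
RULES += [(100, "<< hundred[ >>]"), (1000, "<< thousand[ >>]"),
          (1_000_000, "<< million[ >>]")]


def build_vocabulary(rules):
    """Derive word -> value tables from the rule texts themselves."""
    additive, multipliers = {}, {}
    for base, text in rules:
        word = re.sub(r"\[.*?\]", "", text).replace("<<", "").strip()
        if "<<" in text:
            multipliers[word] = base  # here the base equals the rule's divisor
        else:
            additive[word] = base
    return additive, multipliers


ADDITIVE, MULTIPLIERS = build_vocabulary(RULES)


def parse_words(words):
    if not words:
        return 0
    scales = [MULTIPLIERS[w] for w in words if w in MULTIPLIERS]
    if scales:
        # Undo a "<< name[ >>]" rule: value = quotient * multiplier + remainder.
        top = max(scales)
        i = max(k for k, w in enumerate(words) if MULTIPLIERS.get(w) == top)
        return parse_words(words[:i]) * top + parse_words(words[i + 1:])
    total = 0
    for w in words:
        if w not in ADDITIVE:
            raise ValueError(f"unrecognized word: {w!r}")
        total += ADDITIVE[w]
    return total


def parse_number(text):
    # Hyphens join tens and ones ("thirty-two"), so treat them as spaces.
    return parse_words(text.lower().replace("-", " ").split())


print(parse_number("thirty-two"))                            # 32
print(parse_number("four thousand five hundred sixty-two"))  # 4562
```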
In addition to applications in which character strings are displayed in print or by electronic means, the new number translation engine is particularly suitable for speech recognition and speech synthesis systems. That is, the number translation engine's formatter may be employed in a speech synthesis system to translate an input numerical representation into the corresponding spoken words. In such an application, rather than forming an oral output such as “three two”, a synthesis system may form the oral output “thirty-two”. Similarly, the number translation engine's parser may be employed to recognize alphabetical representations, such as “thirty-two,” and to associate the correct numerical character string with them, without requiring that a number be spoken in an unnatural manner, e.g., without requiring a user to say “three two” rather than “thirty-two”.
The new number translation engine may also be used to produce textual, non-word representations of numbers. For example, the number translation engine may be employed to render numbers using numeration systems other than Western (“Arabic”) numerals, such as Japanese numerals, Roman numerals, or traditional Hebrew numerals. The number translation engine may also be used to separate a number into major and minor units, for example, to render a duration given in seconds as a duration given in hours, minutes, and seconds, or to render a dimension given in decimal feet, e.g., 4.5 feet, as a dimension given in feet and inches, e.g., “four feet, six inches”. The translation engine may also be used to express a measurement whose unit indication changes with magnitude, e.g., to render “three meters” as “3 m” and “thirty thousand meters” as “30 km”. Additionally, the number translation engine may be used to help format error and progress messages more grammatically: for example, “This operation will finish in 10 minutes,” but “This operation will finish in one minute,” or “This operation will finish in less than one minute.” The new number translation engine may be used to produce any of a wide variety of representations of a numeric value.
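The unit-splitting and grammatical-message cases can be illustrated with a short Python sketch. It keeps counts as numerals (a spell-out step could be applied to them), switches from meters to kilometers at an assumed threshold of 1,000 m, and reproduces the three message forms from the text. All function names are hypothetical.

```python
def quantity(count, singular, plural=None):
    """Pair a count with the singular or plural unit name."""
    unit = singular if count == 1 else (plural or singular + "s")
    return f"{count} {unit}"


def duration(total_seconds):
    """Render seconds as hours, minutes and seconds, omitting zero parts."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    parts = [quantity(value, name) for value, name in
             ((hours, "hour"), (minutes, "minute"), (seconds, "second")) if value]
    return ", ".join(parts) or "0 seconds"


def feet_and_inches(feet):
    """Render decimal feet as whole feet and inches (inches rounded)."""
    whole_feet, inches = divmod(round(feet * 12), 12)
    return f"{quantity(whole_feet, 'foot', 'feet')}, {quantity(inches, 'inch', 'inches')}"


def metric_length(meters):
    """Switch the unit indication from meters to kilometers at 1,000 m."""
    if meters >= 1000:
        return f"{meters / 1000:g} km"
    return f"{meters:g} m"


def finish_message(minutes):
    if minutes < 1:
        when = "less than one minute"
    elif minutes == 1:
        when = "one minute"
    else:
        when = quantity(minutes, "minute")
    return f"This operation will finish in {when}."


print(duration(3725))         # 1 hour, 2 minutes, 5 seconds
print(feet_and_inches(4.5))   # 4 feet, 6 inches
print(metric_length(30000))   # 30 km
print(finish_message(1))      # This operation will finish in one minute.
```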