The following relates to the language modeling arts, language processing arts, and related arts.
A typical language model of order N operates on text strings of maximum length N words (or N characters in a language such as Chinese, or more generally N symbols, where “symbol” encompasses a word, a character, or an equivalent unit of the modeled language). For example, a bigram language model has N=2, a trigram language model has N=3, and so forth. In such language models, a useful operation is to compute the probability p(A,z)=p(z|A), where (A,z) denotes a symbol string A followed by a single symbol z. The notation Az is sometimes used herein as shorthand for (A,z). By way of illustrative example, if A=(The two) and z=(cats), then (A,z)=Az=“The two cats”. Intuitively, the probability p(A,z)=p(z|A) thus gives the probability of the symbol z following the string A in text of the modeled language. The string A of Az is sometimes referred to as the context of z.
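To make the notation p(A,z)=p(z|A) concrete, the following Python sketch estimates it for an order-N model by unsmoothed maximum likelihood over a toy corpus. The counting scheme, the helper name ngram_mle and the corpus are illustrative assumptions rather than part of the described subject matter; a practical language model would be smoothed, for example with the kind of back-off scheme an ARPA table represents.

```python
from collections import Counter
from typing import Callable, List, Sequence

def ngram_mle(corpus: List[List[str]], order: int) -> Callable[[Sequence[str], str], float]:
    """Hypothetical helper: unsmoothed maximum-likelihood p(z | A) for an order-N model."""
    joint: Counter = Counter()    # counts of (A, z) where A has exactly N-1 symbols
    context: Counter = Counter()  # counts of A occurring as a context
    for sentence in corpus:
        for i in range(order - 1, len(sentence)):
            a = tuple(sentence[i - order + 1:i])
            joint[(a, sentence[i])] += 1
            context[a] += 1

    def p(A: Sequence[str], z: str) -> float:
        # Only the last N-1 symbols of A matter in an order-N model; contexts
        # shorter than N-1 symbols have no counts here and yield 0.0.
        a = tuple(A)[-(order - 1):] if order > 1 else ()
        return joint[(a, z)] / context[a] if context[a] else 0.0

    return p

corpus = [
    ["the", "two", "cats", "sat"],
    ["the", "two", "dogs", "ran"],
    ["the", "two", "cats", "ran"],
]
p = ngram_mle(corpus, order=3)   # trigram model, N = 3
print(p(("the", "two"), "cats")) # "cats" follows "the two" in 2 of its 3 occurrences
```

Because the model has order 3, a longer context such as ("big", "the", "two") is truncated to its last two symbols and yields the same value.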
A standard way of representing certain types of smoothed language models is through a so-called “ARPA” table. Such a table provides a compact “backoff” representation suitable for looking up probabilities of the form p(A,z) predicted by the language model. For a language model of order N, the ARPA table contains n-grams of order 1 to N, with higher-order n-grams being more sparsely recorded than lower-order n-grams. An ARPA table can be constructed to have the following property: if the ARPA table contains an n-gram of order n, then it also contains all substrings of this n-gram of order 1 to n−1. Each n-gram Az is a line entry in the ARPA table, and each such line entry Az has two associated columns containing non-negative numbers Az.p and Az.b. The number Az.p is always less than one and corresponds to the conditional probability p(A,z)=p(z|A) assigned by the language model to the word z in the context A. The number Az.b is referred to as the back-off weight (bow) for the context Az, that is, the weight applied when the string Az itself serves as the context of a following symbol, and is used in computing conditional probabilities associated with n-grams that are not listed in the ARPA table.
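The back-off lookup that an ARPA table supports can be sketched as follows. This is a minimal Python illustration under stated assumptions: the table is a hypothetical in-memory dict mapping n-gram tuples to (p, b) pairs in plain probability space (actual ARPA files store base-10 logarithms), the back-off weight used for a context A is the b value on A's own line, an unlisted context contributes a weight of 1, and tail(A) denotes A with its leftmost symbol removed. The function name backoff_prob and the toy table are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical in-memory ARPA table: n-gram tuple -> (p, b).
# Values are plain probabilities here; real ARPA files store log10 values.
Table = Dict[Tuple[str, ...], Tuple[float, float]]

def backoff_prob(table: Table, A: Tuple[str, ...], z: str) -> float:
    """Return p(z | A) using the ARPA back-off rule."""
    if A + (z,) in table:
        return table[A + (z,)][0]      # Az is listed: use Az.p directly
    if not A:
        raise KeyError(f"unigram {z!r} is not in the table")
    # Az is not listed: apply the back-off weight stored on A's own line
    # (1.0 if A is not listed either) and retry with tail(A), i.e. A minus
    # its leftmost symbol.
    bow = table[A][1] if A in table else 1.0
    return bow * backoff_prob(table, A[1:], z)

# Toy table satisfying the substring-closure property.
toy: Table = {
    ("the",): (0.5, 0.6),
    ("two",): (0.2, 0.8),
    ("cats",): (0.3, 1.0),
    ("the", "cats"): (0.7, 1.0),
}
print(backoff_prob(toy, ("the",), "cats"))        # listed bigram: 0.7
print(backoff_prob(toy, ("the",), "two"))         # 0.6 * 0.2, about 0.12
print(backoff_prob(toy, ("two", "the"), "cats"))  # unlisted context, falls back to 0.7
```

Each recursive call shortens the context by one symbol, so a lookup touches at most N lines of the table.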
For some applications, it is also useful to compute so-called “max-backoff” values. For an n-gram Az, the max-backoff is defined as the highest probability p(hA,z)=p(z|hA) for any “head” or “prefix” h, where h denotes any possible string (including the possibility of the empty string ε) that could precede A. Formally, the max-backoff is w(A,z) ≡ max_h p(z|hA).
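The definition of w(A,z) can be checked directly by exhaustive search over heads. The Python sketch below assumes a generic conditional-probability function p(context, z), a finite vocabulary, and an order-N model in which only the last N-1 symbols of hA matter, so heads longer than N-1-|A| need not be enumerated. The function name max_backoff_bruteforce and the scoring function toy_p are hypothetical stand-ins, not part of the described method.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# p(context, z) -> p(z | context); any conditional model can be plugged in.
Prob = Callable[[Tuple[str, ...], str], float]

def max_backoff_bruteforce(p: Prob, vocab: Sequence[str], order: int,
                           A: Tuple[str, ...], z: str) -> float:
    """w(A, z) = max over heads h of p(z | hA), by exhaustive enumeration."""
    # Only the last N-1 symbols of hA can affect p in an order-N model, so
    # heads longer than N-1-|A| add nothing; k = 0 is the empty head.
    max_len = max(0, order - 1 - len(A))
    best = 0.0
    for k in range(max_len + 1):
        for h in product(vocab, repeat=k):
            best = max(best, p(h + A, z))
    return best

# Hypothetical order-3 scoring function (not normalized) standing in for a model.
def toy_p(context: Tuple[str, ...], z: str) -> float:
    ctx = context[-2:]
    if z != "cats":
        return 0.1
    if ctx == ("the", "two"):
        return 0.6
    if ctx[-1:] == ("two",):
        return 0.3
    return 0.05

VOCAB = ["the", "two", "cats"]
print(toy_p(("two",), "cats"))                                    # 0.3
print(max_backoff_bruteforce(toy_p, VOCAB, 3, ("two",), "cats"))  # 0.6, via h = ("the",)
```

The example shows that w(A,z) can exceed p(z|A): a longer context "the two" makes "cats" more likely than "two" alone does. The enumeration grows exponentially with the head length, which is what motivates precomputing such values in a table.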
It has been proposed (Carter et al., “Exact Sampling and Decoding in High-Order Hidden Markov Models”, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1125-1134, Jeju Island, Korea (July 2012)) to determine the max-backoff w(A,z) in an application using an extended ARPA table, referred to herein as a Max-ARPA table, in which two additional columns are added: (1) a column for the max log probability, which for an n-gram Az is the maximum log probability of z over all contexts hA that extend the context A with a head h; and (2) a column for a “max backoff” weight, which is a number used for computing the max log probability of an n-gram not listed in the Max-ARPA table. With the values in these columns, the max-backoff can be recursively computed for n-grams Az that are not listed in the Max-ARPA table.
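The Python sketch below illustrates, under explicit assumptions, how two such extra columns make w(A,z) cheap to look up. It works in plain probabilities rather than log probabilities, stores the table as a hypothetical dict, and fills the columns by brute-force enumeration of heads purely so the result is easy to check; the names prob, heads, chain_bow, build_max_columns and max_backoff are illustrative. The lookup rule it uses, w(A,z) = m(Az) when Az is listed and w(A,z) = mb(A)·p(z|tail(A)) otherwise (tail(A) being A without its leftmost symbol), is derived here from the back-off rule and the substring-closure property: if Az is unlisted then no hAz is listed, so every p(z|hA) equals a product of back-off weights from hA down to A times p(z|tail(A)). It is not claimed to be the exact formulation of Carter et al.

```python
from itertools import product
from typing import Dict, Iterator, Sequence, Tuple

Gram = Tuple[str, ...]
Table = Dict[Gram, Tuple[float, float]]  # n-gram -> (p, b), plain probabilities

def prob(table: Table, ctx: Gram, z: str) -> float:
    """Ordinary ARPA back-off lookup of p(z | ctx)."""
    if ctx + (z,) in table:
        return table[ctx + (z,)][0]
    if not ctx:
        raise KeyError(f"unigram {z!r} is not in the table")
    bow = table[ctx][1] if ctx in table else 1.0
    return bow * prob(table, ctx[1:], z)

def heads(vocab: Sequence[str], n_max: int) -> Iterator[Gram]:
    """All heads h with 0 <= |h| <= n_max, starting with the empty head."""
    for k in range(n_max + 1):
        yield from product(vocab, repeat=k)

def chain_bow(table: Table, ctx: Gram, stop: int) -> float:
    """Product of back-off weights of ctx, tail(ctx), ..., down to its length-`stop` suffix."""
    w = 1.0
    for i in range(len(ctx) - stop + 1):
        w *= table[ctx[i:]][1] if ctx[i:] in table else 1.0
    return w

def build_max_columns(table: Table, vocab: Sequence[str], order: int):
    """Brute-force the two extra columns: m (max probability) and mb (max back-off)."""
    m: Dict[Gram, float] = {}
    mb: Dict[Gram, float] = {}
    for gram in table:
        A, z = gram[:-1], gram[-1]
        m[gram] = max(prob(table, h + A, z)
                      for h in heads(vocab, max(0, order - 1 - len(A))))
        # gram read as a context: best product of bows from h + gram down to gram.
        mb[gram] = max(chain_bow(table, h + gram, len(gram))
                       for h in heads(vocab, max(0, order - 1 - len(gram))))
    return m, mb

def max_backoff(table: Table, m, mb, A: Gram, z: str) -> float:
    """w(A, z) from the extended columns."""
    if A + (z,) in m:
        return m[A + (z,)]  # listed n-gram: precomputed maximum
    # Az is unlisted, so no hAz is listed (substring closure) and
    # p(z | hA) = chain_bow(hA down to A) * p(z | tail(A)) for every head h.
    return mb.get(A, 1.0) * prob(table, A[1:], z)

TOY: Table = {
    ("a",): (0.6, 0.5),
    ("b",): (0.4, 0.7),
    ("a", "b"): (0.5, 0.9),
    ("b", "a"): (0.8, 0.4),
    ("b", "a", "b"): (0.9, 1.0),
}
m, mb = build_max_columns(TOY, ["a", "b"], order=3)
print(max_backoff(TOY, m, mb, ("a",), "b"))  # listed: 0.9, reached via h = ("b",)
print(max_backoff(TOY, m, mb, ("b",), "b"))  # unlisted: mb(b) * p(b), about 0.28
```

A practical implementation would fill these columns far more efficiently than by enumerating heads, whose number grows exponentially with the model order; the enumeration is used here only so that the lookup rule can be compared against the definition of w(A,z).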