In general, Named-Entity Recognition Systems (hereinafter referred to as NERSs) are used to identify and label particular classes of names in textual information. Examples of such name classes are organization names, person names, location names, dates, times, monetary amounts, and percentages. One application for a NERS is the generation of a searchable database. For example, suppose a newspaper sets out to create a searchable database of all of its stories ever published. In order to be able to search for stories containing information about particular organizations, people, locations, etc., each story within the database must be categorized with respect to the name classes contained therein. Once categorized, stories which contain desired organizations, people, locations, etc. can be easily retrieved. The need to identify names in text also extends to other media such as magazines, radio, wire services, etc.
Difficulties in identifying and classifying names arise because of naturally occurring ambiguity between name classes. The following pairs of sentences illustrate the problems caused by name class ambiguity:
1a. Anne Dakota reported earnings of twenty three cents a share.
1b. Anne Dakota reported for work as usual on Monday morning.
2a. April is usually a moody person, but not this week.
2b. April is usually a rainy month, but not this year.
A human reader can discern from the context that the subject of sentence (1a) is an organization, while in sentence (1b), the subject is a person. Similarly, a human reader can discern from the context that in sentence (2a), "April" is a person, and in sentence (2b), "April" is a month of the calendar year. Thus, we see in these two examples ambiguity between organization names and person names, and between person names and dates. Such ambiguity is widespread among these name classes, and similar ambiguities occur among most other name classes of interest. Name class ambiguity presents an especially difficult challenge when labeling unknown words. For example:
3. Barney Smith said that Phil Jones is leaving the firm to pursue other interests.
In this example, it is impossible to determine with certainty whether "Barney Smith" is the name of an organization or a person, unless the name happens to be known in advance. In such cases, a NERS must make a determination based on incomplete and uncertain information.
Many prior art methods of recognizing names in text incorporate large and complex sets of rules. For example, to determine the correct class for "Anne Dakota" in sentence (1a), a name finding system might contain the following rule:
Rule--when a subject reporting earnings could be either a person or an organization, always assume the subject is an organization.
Such rules are typically generated manually, a process which is time-consuming and iterative. Highly expert rule writers are required to design an effective system, and a rule-based system is difficult to maintain and update. Furthermore, the rule development process must be repeated for all new name classes, as well as for each new language of interest. For example, finding names in Spanish text requires an entirely different set of rules from those used for English.
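As a rough illustration of why hand-written rules are costly to maintain, the earnings rule above might be sketched as follows. This is a minimal sketch only: the function name `apply_earnings_rule`, the class labels, and the string-matching test are hypothetical and are not taken from any prior art system.

```python
from typing import Optional, Set


def apply_earnings_rule(sentence: str, subject: str,
                        candidates: Set[str]) -> Optional[str]:
    """Apply one hand-written rule (hypothetical sketch).

    If the subject could be a person or an organization and the sentence
    says the subject reported earnings, assume it is an organization.
    Returns None when the rule does not apply.
    """
    ambiguous = {"PERSON", "ORGANIZATION"} <= candidates
    # Crude surface test for "subject reporting earnings" (an assumption).
    reports_earnings = sentence.lower().startswith(
        f"{subject} reported earnings".lower()
    )
    if ambiguous and reports_earnings:
        return "ORGANIZATION"
    return None


classes = {"PERSON", "ORGANIZATION"}
label_1a = apply_earnings_rule(
    "Anne Dakota reported earnings of twenty three cents a share.",
    "Anne Dakota", classes)
label_1b = apply_earnings_rule(
    "Anne Dakota reported for work as usual on Monday morning.",
    "Anne Dakota", classes)
```

The rule resolves sentence (1a) but leaves sentence (1b) undecided, so a further rule would have to be written for that case, and again for every new name class and language.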
Hidden Markov Models (hereinafter referred to as HMMs) have been widely used in the prior art in speech recognition applications. HMMs have also been successfully applied to part-of-speech recognition and labeling (for example, see Church, 1988). In general, an HMM defines a system having a finite number of states, with each state capable of emitting a number of information symbols. For each state in the HMM, there exists some probability of a transition from any other state in the HMM, including self-looping transitions. Consequently, if there are $N$ states in the HMM, there must exist $N^2$ transition probabilities.
By making a Markov independence assumption regarding the state transitions of the system, each transition probability may be conditioned on only the previous state. The $N^2$ transitional probabilities may then be represented as a matrix, with each element of the matrix being written as
$$a_{ij} = \Pr(q_t = S_j \mid q_{t-1} = S_i), \tag{1}$$
where $q_t$ represents the state of the system at time $t$, $q_{t-1}$ represents the state of the system at time $t-1$, $S_j$ represents the $j$-th state of the system, $S_i$ represents the $i$-th state of the system, and $a_{ij}$ is the probability of a system state transition to state $S_j$ at time $t$, given that the previous state of the system at time $t-1$ was $S_i$. Each of the transitional probabilities has the following properties:
$$a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1 \quad \text{for each } i.$$
A discrete HMM is defined by the following quintuple: $\text{HMM} = (S, \Sigma, A, B, \pi)$, where
$S$ is the set of states, $|S| = N$,
$\Sigma$ is the set of discrete output symbols, $|\Sigma| = M$,
$A$ is the state transition probability matrix, where $a_{ij} = \Pr(q_t = S_j \mid q_{t-1} = S_i)$, $1 \le i, j \le N$,
$B$ is the observation symbol probability distribution in state $S_j$, where $b_j(k) = \Pr(\sigma_k \in \Sigma \text{ at time } t \mid q_t = S_j)$, $1 \le j \le N$, $1 \le k \le M$, and
$\pi$ is the initial state probability distribution vector, where each element of $\pi$ is given by $\pi_j = \Pr(q_1 = S_j)$, for all $j$ from 1 to $N$.
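The quintuple defined above can be written down directly as a small data structure. The following is a minimal sketch, not a reference implementation: the class name `DiscreteHMM`, its methods, and the toy probabilities are all assumptions chosen for illustration. It checks that the initial distribution and every row of the transition and observation matrices form a probability distribution, and computes the joint probability of one state sequence and one symbol sequence.

```python
from dataclasses import dataclass
from math import isclose
from typing import List, Sequence


def is_distribution(probs: Sequence[float]) -> bool:
    """True if all entries are non-negative and they sum to 1."""
    return all(p >= 0.0 for p in probs) and isclose(sum(probs), 1.0, abs_tol=1e-9)


@dataclass(frozen=True)
class DiscreteHMM:
    states: List[str]      # S, with |S| = N
    symbols: List[str]     # Sigma, with |Sigma| = M
    A: List[List[float]]   # A[i][j] = Pr(q_t = S_j | q_{t-1} = S_i)
    B: List[List[float]]   # B[j][k] = Pr(sigma_k at time t | q_t = S_j)
    pi: List[float]        # pi[j]   = Pr(q_1 = S_j)

    def validate(self) -> None:
        """Raise ValueError if shapes or probability constraints are violated."""
        n, m = len(self.states), len(self.symbols)
        if len(self.A) != n or any(len(row) != n for row in self.A):
            raise ValueError("A must be N x N")
        if len(self.B) != n or any(len(row) != m for row in self.B):
            raise ValueError("B must be N x M")
        if len(self.pi) != n:
            raise ValueError("pi must have N entries")
        if not all(is_distribution(row) for row in [self.pi, *self.A, *self.B]):
            raise ValueError("pi and each row of A and B must sum to 1")

    def joint_probability(self, state_idx: Sequence[int],
                          symbol_idx: Sequence[int]) -> float:
        """Pr(state sequence and symbol sequence) under the model."""
        if not state_idx or len(state_idx) != len(symbol_idx):
            raise ValueError("sequences must be non-empty and of equal length")
        prob = self.pi[state_idx[0]] * self.B[state_idx[0]][symbol_idx[0]]
        for t in range(1, len(state_idx)):
            prev, cur = state_idx[t - 1], state_idx[t]
            prob *= self.A[prev][cur] * self.B[cur][symbol_idx[t]]
        return prob


# Toy model with N = 2 states and M = 3 symbols (illustrative values only).
toy = DiscreteHMM(
    states=["S1", "S2"],
    symbols=["x", "y", "z"],
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    pi=[0.6, 0.4],
)
toy.validate()
p = toy.joint_probability([0, 0, 1], [0, 1, 2])  # states S1,S1,S2 emit x,y,z
```

Storing $A$ as an $N \times N$ list of rows makes the count of $N^2$ transition probabilities explicit, with each row being the distribution over next states from one previous state.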
The HMM provides a probability distribution over the entire output symbol alphabet for every state, i.e., each state has some probability of producing any symbol from the alphabet.
The following example illustrates an application of an HMM. Suppose in a library there are a thousand books written by a hundred different authors. Now suppose an adversary randomly chooses a book from the library and discloses the number of pages in that book. The adversary then proceeds to select a series of books via a random process, where each selection is conditioned upon the author of the previous book, and discloses the number of pages in each book. Given only the resulting series of page count observations, our goal is to produce the most likely sequence of authors. An HMM can model this output with one state per author, where, at each state, there is some probability for generating a book of any number of pages. Given the first page count observation, the HMM uses $\pi$ to make a transition to any state with some probability, thereafter using the transition probability matrix $A$ to determine the rest of the sequence of state transitions, while using $B$ to generate an observation at each state.
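The text does not name a procedure for recovering the most likely sequence of authors. One standard choice is the Viterbi algorithm, sketched below for a scaled-down version of the library example. Everything concrete here is an assumption made for illustration: two hypothetical authors instead of a hundred, page counts bucketed into "short" and "long" so that the output alphabet is discrete, and made-up probabilities, all strictly positive so that logarithms are defined.

```python
from math import log
from typing import Dict, List


def viterbi(observations: List[str], states: List[str],
            pi: Dict[str, float],
            A: Dict[str, Dict[str, float]],
            B: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most likely state sequence for the observations.

    Works in log space; assumes every probability used is > 0.
    """
    # delta[s]: best log-probability of any path that ends in state s.
    delta = {s: log(pi[s]) + log(B[s][observations[0]]) for s in states}
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        prev = delta
        delta, pointers = {}, {}
        for s in states:
            # Best predecessor of s at this step.
            best = max(states, key=lambda r: prev[r] + log(A[r][s]))
            delta[s] = prev[best] + log(A[best][s]) + log(B[s][obs])
            pointers[s] = best
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


authors = ["author_A", "author_B"]            # one state per author
pi = {"author_A": 0.5, "author_B": 0.5}       # first book's author
A = {"author_A": {"author_A": 0.8, "author_B": 0.2},
     "author_B": {"author_A": 0.2, "author_B": 0.8}}
B = {"author_A": {"short": 0.9, "long": 0.1},  # author_A mostly writes short books
     "author_B": {"short": 0.2, "long": 0.8}}

page_buckets = ["short", "short", "long", "long"]
best_authors = viterbi(page_buckets, authors, pi, A, B)
# best_authors == ["author_A", "author_A", "author_B", "author_B"]
```

With these toy values, the decoded sequence switches authors exactly where the observed page counts change from short to long, since staying with the same author is likely but each author strongly prefers one book length.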