Natural language processing is hindered by the inability of machines to recognize the function of words as they appear in context. The context for a word is the sentence in which it is framed. The function of a word is indicated by its syntax.
The task is complicated by the fact that a word can serve as several parts of speech. For instance, the word “fine” could be a noun, a verb, an adjective, or an adverb. The single most important task in the machine parsing of natural language is identifying the part of speech a word is being used as. One of the most complicating factors in resolving parts of speech in English is that many nouns can also be verbs. Articles, adjectives, and possessive pronouns are very important cues for resolving this problem, as illustrated by the phrase “a fine vase.” Since the word “fine” follows an article, a rule can be established and applied under which “fine” cannot be a verb or an adverb. Once that rule has been applied, the phrase “a fine vase” can be merged into a noun phrase regardless of whether the word “fine” is a noun or an adjective.
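The article rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `LEXICON`, the tag names, and the functions `apply_article_rule` and `is_noun_phrase` are hypothetical and are not taken from any particular parser.

```python
# Hypothetical toy lexicon: each word maps to the parts of speech it may take.
LEXICON = {
    "a": {"article"},
    "fine": {"noun", "verb", "adjective", "adverb"},
    "vase": {"noun"},
}


def apply_article_rule(words):
    """Return one candidate tag set per word, removing 'verb' and 'adverb'
    from any word that directly follows an article."""
    candidates = [set(LEXICON[w]) for w in words]
    for i in range(1, len(candidates)):
        if candidates[i - 1] == {"article"}:
            candidates[i] -= {"verb", "adverb"}
    return candidates


def is_noun_phrase(candidates):
    """Assumed merge test: an article, then words that can only be nouns or
    adjectives, with a final word that can be a noun."""
    if len(candidates) < 2 or candidates[0] != {"article"}:
        return False
    rest = candidates[1:]
    only_nominal = all(c and c <= {"noun", "adjective"} for c in rest)
    return only_nominal and "noun" in rest[-1]


phrase = ["a", "fine", "vase"]
narrowed = apply_article_rule(phrase)
print(is_noun_phrase(narrowed))
```

In this sketch “fine” remains ambiguous between noun and adjective after the rule is applied, yet the merge test still succeeds, which mirrors the point that the phrase can be merged without resolving that remaining ambiguity.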
The ability to use a computer to determine the appropriate syntax for sentences permits computers to participate in the analysis of enormous amounts of information, such as news reports from around the world. Analysis of such large databases can be useful in plotting trends that support a general understanding of, for example, violence or political unrest in various parts of the world. Alternatively, analysis may be conducted to plot news trends and how they relate to various stock market performance indices. Numerous such analyses are possible, but to obtain a meaningful interpretation from any of them, the system must be able to parse the sentences in the raw data.
A news analyzer would begin with a filter formatter that identifies the beginning and end of each sentence. The filter formatter needs to distinguish between periods found in the middle of a sentence and those found at the end of a sentence. Each sentence may then be provided to a parser that determines the syntax of the sentence. With the syntax of the sentence automatically determined, it becomes possible to identify the action or verb set forth in the sentence, the subject of the sentence, and the object of the action.
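One way the period-disambiguation step of such a filter formatter might look is sketched below. The heuristic is an assumption chosen for illustration: a period is treated as mid-sentence when it is immediately followed by a non-space character (as in a decimal number) or when the word before it appears in a small, hypothetical `ABBREVIATIONS` set. A practical formatter would need a much fuller list and further rules.

```python
import re

# Hypothetical and deliberately incomplete list of abbreviations (lowercase,
# without the trailing period).
ABBREVIATIONS = {"mr", "mrs", "dr", "gen", "inc", "u.s"}


def split_sentences(text):
    """Split text into sentences, skipping periods that appear mid-sentence."""
    sentences = []
    start = 0
    for match in re.finditer(r"\.", text):
        end = match.end()
        following = text[end:end + 1]
        # A period followed directly by a non-space character, as in "3.5",
        # is treated as being inside a token.
        if following and not following.isspace():
            continue
        # A period that closes a known abbreviation does not end the sentence.
        words_before = text[start:match.start()].split()
        last_word = words_before[-1].lower() if words_before else ""
        if last_word in ABBREVIATIONS:
            continue
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


report = "Dr. Smith reported 3.5 percent growth. Markets fell on Tuesday."
for sentence in split_sentences(report):
    print(sentence)
```

Each string returned by `split_sentences` would then be handed to the parser as a single sentence.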
The parsed sentence is then provided to an events generator arranged in accordance with the particular news analysis desired. The events generator would look for particular words of interest to the particular analysis being performed. In conjunction with the parsing of the sentence, the import of the various words can be better determined and more properly characterized in the final analysis. Events of import can be counted and associated with categories such as areas of the world. Such counted information can then be displayed or analyzed in chart or report format. The reliability of the analysis can be significantly enhanced by providing a parser that reliably identifies the proper syntax of the sentence.
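The counting performed by an events generator can be illustrated with a short sketch. Everything named here is an assumption made for the example: the `ParsedSentence` record, its `region` field, the `WORDS_OF_INTEREST` set, and the choice to match only on the verb slot are hypothetical and stand in for whatever output the parser and the chosen analysis actually define.

```python
from collections import Counter
from typing import NamedTuple


class ParsedSentence(NamedTuple):
    """Hypothetical parser output for one sentence."""
    subject: str
    verb: str
    obj: str
    region: str  # assumed category, e.g. the area of the world reported on


# Hypothetical words of interest for an analysis of violence or unrest.
WORDS_OF_INTEREST = {"attacked", "bombed", "protested"}


def count_events(parsed_sentences):
    """Count, per region, the sentences whose verb is a word of interest."""
    counts = Counter()
    for sentence in parsed_sentences:
        if sentence.verb.lower() in WORDS_OF_INTEREST:
            counts[sentence.region] += 1
    return counts


parsed = [
    ParsedSentence("rebels", "attacked", "convoy", "Region A"),
    ParsedSentence("students", "protested", "tuition", "Region B"),
    ParsedSentence("minister", "praised", "treaty", "Region A"),
    ParsedSentence("militia", "bombed", "bridge", "Region A"),
]
for region, total in sorted(count_events(parsed).items()):
    print(region, total)
```

Matching on the verb slot rather than on any occurrence of the word is what ties the count to the parse: a word of interest is counted only when the parser has identified it as the action of the sentence.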