The following U.S. Patent documents provide descriptions of art related to the present application: U.S. Pat. No. 5,418,889, issued May 1995 to Ito (hereinafter Ito); U.S. Pat. No. 5,696,916, issued December 1997 to Hitachi (hereinafter Hitachi); U.S. Pat. No. 6,026,388 issued February 2000 to Liddy et al. (hereinafter Liddy); U.S. Pat. No. 6,185,592, issued February 2001 to Boguraev et al. (hereinafter Boguraev 1); U.S. Pat. No. 6,212,494, issued April 2001 to Boguraev (hereinafter Boguraev 2); U.S. Pat. No. 6,263,335, issued July 2001 to Paik et al. (hereinafter Paik); U.S. Pat. No. 6,754,654, issued June 2004 to Kim et al. (hereinafter Kim); U.S. Pat. No. 6,823,325, issued November 2004 to Davies et al. (hereinafter Davies); and U.S. Pat. No. 6,871,199, issued March 2005 to Binniget et al. (hereinafter Binniget).
Knowledge bases and knowledge engineering are the key components of modern information systems and corresponding technologies. Knowledge engineering was traditionally based on generalization of information obtained from experts in different knowledge domains. However, analysis shows that this approach cannot be utilized for creating adequate real-life (industrial) applications. Two questions arise: first, what can be the most reliable and effective source of such knowledge; and second, how can this knowledge be recognized, extracted and later formalized. Analysis shows that at the present time, the time of global computerization, the most reliable source of knowledge is text in the broad sense of the word, that is, text as a set of documents in natural language (books, articles, patents, reports, etc.). Thus, the basic premises of knowledge engineering in the light of the second question are as follows:
1. text is the ideal natural and intellectual model of knowledge representation
2. one can find everything in the text
The second premise may seem excessively categorical, but as the range of available text grows, this is increasingly the case.
What types of knowledge can be obtained from text and with what automatic means? Some existing methods are aimed at manually compiled databases having a strict structure, or at texts with strictly defined fields. A shallow linguistic analysis of text is usually performed. Kim describes processing text with a rigid structure (primarily emails). Kim's process extracts corresponding information from previously known fields of source documents and places it in predefined fields of a database (DB) that reflects the structure of the organization (such a DB has, for example, fields for names and titles of individuals within an organization). The linguistic processing described in Kim is utilized only for the extraction of key terms from documents according to so-called filters.
Davies describes the performance of lexical and grammatical analysis of text in order to differentiate nouns from verbs and thereby to perform a strictly defined search in a predefined and structured database according to “how,” “why,” “what,” and “what is” relations.
Binniget also describes the use of a pre-structured database (i.e., a Knowledge Database) in the form of a fractal hierarchical network, which reflects the knowledge of the outside world (knowledge domain) in order to automatically expand information from an input string. Initially the input string (for example, part of a sentence, or the whole sentence, etc.) is treated with a semantic processor that performs syntactic and grammatical parsing and transformation to build an input network. This network is then “immersed” into the Knowledge Database to expand the input information; that is, the input information is recorded and later expanded by means of a model of the outside world concerning objects, their relations and attributes.
Boguraev 1 describes the performance of a deep text analysis where, for text segments, the most significant noun groups are marked on the basis of their usage frequency in weighted semantic roles.
All of the abovementioned cases are concerned with particular knowledge about concepts. This is an entry level of knowledge that can be extracted from text.
Boguraev 2 describes the use of computer-mediated linguistic analysis to create a catalog of key terms in technical fields and to also determine doers (solvers) of technical functions (verb-object).
Hitachi describes a system that uses a predefined concept dictionary with high-low relations, namely, is-a relations and part-whole relations between concepts.
Liddy uses a similar technology for user query expansion in an information search system.
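By way of illustration only, a concept dictionary holding is-a and part-whole relations, and its use for query expansion, may be sketched as follows. Neither Hitachi nor Liddy discloses this code; the class name `ConceptDictionary`, its methods, and the sample concepts are hypothetical and chosen solely to show the general idea of a predefined dictionary being consulted to widen a user query.

```python
from collections import defaultdict


class ConceptDictionary:
    """Hypothetical predefined dictionary of is-a and part-whole relations."""

    def __init__(self):
        # narrower concept -> set of broader concepts (is-a)
        self._is_a = defaultdict(set)
        # part -> set of wholes (part-whole)
        self._part_of = defaultdict(set)

    def add_is_a(self, narrower, broader):
        self._is_a[narrower].add(broader)

    def add_part_of(self, part, whole):
        self._part_of[part].add(whole)

    def narrower(self, concept):
        """Concepts recorded as direct kinds of the given concept."""
        return {n for n, broader in self._is_a.items() if concept in broader}

    def parts(self, whole):
        """Concepts recorded as direct parts of the given concept."""
        return {p for p, wholes in self._part_of.items() if whole in wholes}

    def expand_query(self, terms):
        """Add direct kinds and direct parts of each query term (one level only)."""
        expanded = set(terms)
        for term in terms:
            expanded |= self.narrower(term)
            expanded |= self.parts(term)
        return expanded


# Illustrative data only; not taken from any cited patent.
d = ConceptDictionary()
d.add_is_a("sedan", "car")
d.add_is_a("car", "vehicle")
d.add_part_of("engine", "car")

print(sorted(d.expand_query(["car"])))  # ['car', 'engine', 'sedan']
```

In such a sketch the relations must be entered in advance; nothing is recognized from text, which is the limitation of predefined dictionaries noted above.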
Ito describes the use of a Knowledge Base, including a Causal Model Base and a Device Model Base. The Device Model Base has sets of device knowledge describing the hierarchy of devices of the target machine. The Causal Model Base is formed on the basis of the Device Model Base and has sets of causal relations of fault events in the target machine. Thus, the possible cause of failure in each element of the device is guessed on the basis of information about its structural connections with other elements of the device. Usually, these are the most “connected” elements, which are determined as the cause.
Paik describes a system that is domain-independent and automatically builds its own subject knowledge base. The system recognizes concepts (any named entity or idea, such as a person, place, thing or organization) and relations between them. These relations allow the creation of concept-relation-concept triples. Thus, the knowledge recognized in Paik is close to the next important knowledge level, namely facts (subject-action-object), although these triples are not yet facts suitable for recognition of such important semantic relations as Whole-Part relations.
In fact, none of the above approaches teach or suggest processing text in electronic documents or digital information to determine Whole-Part semantic relations between objects/concepts and facts of the outside world/subject domain.