The present invention describes a method and system for obtaining structured information from natural language texts.
In this context, natural language texts are written expressions in a natural language, e.g. German, English or French, which have a meaningful content for a language-competent reader. Structured information is relevant data which exists in a specific data format for processing in a data-processing system, e.g. tables of a relational database system or record structures of a classical programming language.
Such methods can be employed to enable data-processing systems that operate on such structured information to work directly on natural language texts. At present, for example, relevant data from appeal briefs are read by experts and compared with existing information sources, e.g. information stored in databases. This process is time-consuming and labour-intensive.
Technical methods for solving these problems comprise scanning in the appeal briefs and presenting them on a screen of an editing tool to an expert for structure marking.
U.S. Pat. No. 5,450,598 describes a tool for generating a finite-state automaton in which data transitions are effected without the use of pointer variables. This process relates to the binary decision of whether a given natural language character string is accepted by the finite automaton. It does not, however, permit structured information in the above-mentioned sense to be allocated in all states of the finite automaton.
The object of the present invention is to provide a system and method which enable structured data to be allocated to natural language input data at high processing speeds.
In accordance with the present invention, a method is provided for obtaining structured information from a natural language text by aggregation. The aggregation comprises the steps of reading in a natural language text; recognising sentences and sentence constituents of the text using an extractor, which allocates an information-bearing structure to each sentence or sentence constituent; allocating a substitution symbol to one or more sentences or sentence constituents; and introducing the results into a further information-bearing structure which comprises functors and argument terms and is described by a grammar.
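By way of illustration only, the aggregation steps may be sketched as follows in Python. This is a minimal sketch under stated assumptions and not the claimed implementation: the names Term, EXTRACTOR_RULES, split_sentences, extract and aggregate, the regular-expression extractor rules, the "$n" form of the substitution symbols and the functors "text" and "sentence" are all hypothetical and do not appear in the description.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Term:
    """Information-bearing structure: a functor with argument terms."""
    functor: str
    args: list = field(default_factory=list)

    def __str__(self):
        if not self.args:
            return self.functor
        return f"{self.functor}({', '.join(str(a) for a in self.args)})"


MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical extractor rules (assumption): functor name and pattern.
EXTRACTOR_RULES = [
    ("date", re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")),
    ("amount", re.compile(r"\b\d+(?:\.\d+)? (?:EUR|USD)\b")),
]


def split_sentences(text):
    """Recognise sentences with a naive split after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def extract(sentence):
    """Allocate a Term to each recognised constituent, in text order."""
    found = []
    for functor, pattern in EXTRACTOR_RULES:
        for match in pattern.finditer(sentence):
            found.append((match.start(), Term(functor, [match.group(0)])))
    return [term for _, term in sorted(found, key=lambda pair: pair[0])]


def aggregate(text):
    """Return the further structure and the substitution-symbol table."""
    symbols = {}
    sentence_terms = []
    for sentence in split_sentences(text):
        refs = []
        for term in extract(sentence):
            symbol = f"${len(symbols) + 1}"  # substitution symbol (assumed form)
            symbols[symbol] = term
            refs.append(Term(symbol))
        sentence_terms.append(Term("sentence", refs))
    # Further information-bearing structure built from functors and arguments.
    return Term("text", sentence_terms), symbols


if __name__ == "__main__":
    structure, symbols = aggregate(
        "The appeal was filed on 3 March 1999. The claimant demands 1500 EUR."
    )
    print(structure)
    for symbol, term in symbols.items():
        print(symbol, "=", term)
```

In this sketch each recognised constituent is replaced by a substitution symbol, and the symbols are then arranged in a nested term. A grammar describing the permitted terms is not modelled here; the fixed nesting of "text" over "sentence" stands in for it.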
Advantages of the present invention reside in the fact that texts are analyzed at a high processing speed and that so-called aggregations can simultaneously be recognized and represented. These aggregations may, in part, be predetermined by the user. This permits matching to specific user requirements and, at the same time, improves the precision of the assignment of the structured information.