1. Field of Invention
This invention relates to systems and methods for normalizing linguistic structures.
2. Description of Related Art
Information retrieval tools that select text passages matching user criteria, according to key words input by the user, are widely known. These tools typically retrieve all available text passages that contain the key words, but do not apply linguistic and/or semantic analysis to the text passages.
Christian Jacquemin, in a paper entitled “Variation terminologique: Reconnaissance et acquisition automatique de termes et de leurs variants en corpus” (“Terminological variation: identification and automatic extraction of terms and their variations from corpora”), discusses techniques to improve access to the contents of textual databases. The techniques discussed by Jacquemin involve morpho-syntactic variations of words, and focus on identifying similar terms or linguistic expressions in documents. However, these techniques do not re-write text passages into a normalized syntactic structure.
Other operations that may be performed on text passages include Information Extraction and Discourse Processing. These operations are applied in the context of, for example, an automatic translation system, in which a user inputs text to be translated into another language and the system performs the translation, or a natural language querying system, in which a user inputs a query or search request in natural language form, such as “How is the BicD gene repressed?”
Information extraction and discourse processing require the semantic relationships between the described entities. Information processing at this level is typically performed by extracting syntactic dependencies and then applying pattern matching to detect predetermined patterns of information. At this level, natural language complexity is a problem, because the same piece of information can be expressed using many different linguistic constructions. Therefore, to capture a specific piece of information in a text passage, the pattern designer has to anticipate these linguistic structures and write all of the possible pattern variations. For example, consider the sentence:
        Antp protein is a repressor of the BicD gene.
This sentence describes an action of repression between the entity “Antp protein” and the entity “BicD gene”. This information can be extracted by the following pattern:
        X is a repressor of Y
But the same fact may also have been described by the following sentences:
        Antp protein represses the BicD gene.
                Pattern: X represses Y
        Antp protein has a repressive effect on the BicD gene.
                Pattern: X has a repressive effect on Y
and so on. This implies a large collection of patterns to get one simple piece of information.
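The pattern proliferation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each surface pattern is written as a regular expression over the raw sentence; the related-art systems discussed here match patterns over extracted syntactic dependencies, which this sketch does not attempt to reproduce. The names `PATTERNS` and `extract_repression` are hypothetical.

```python
import re

# Hypothetical surface patterns. One regular expression is needed per
# linguistic construction, even though all of them express the same fact:
# X represses Y.
PATTERNS = [
    re.compile(r"^(?P<x>.+?) is a repressor of (?:the )?(?P<y>.+?)\.$"),
    re.compile(r"^(?P<x>.+?) represses (?:the )?(?P<y>.+?)\.$"),
    re.compile(r"^(?P<x>.+?) has a repressive effect on (?:the )?(?P<y>.+?)\.$"),
]


def extract_repression(sentence):
    """Return (X, Y) if any known pattern matches the sentence, else None."""
    text = sentence.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("x"), match.group("y")
    return None


if __name__ == "__main__":
    sentences = [
        "Antp protein is a repressor of the BicD gene.",
        "Antp protein represses the BicD gene.",
        "Antp protein has a repressive effect on the BicD gene.",
        # A construction the pattern designer did not anticipate:
        "The BicD gene is repressed by Antp protein.",
    ]
    for s in sentences:
        print(s, "->", extract_repression(s))
```

The last sentence states the same fact in the passive voice but matches none of the three patterns, so a fourth pattern would have to be written for it. This is the burden on the pattern designer that the passage describes.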