The present application relates to systems and methods for semi-supervised relationship extraction.
Natural Language Processing (NLP) aims to understand and organize unstructured text into a structured format, which could enable automatic machine translation, semantic information retrieval, or advanced question answering. As a basic step towards automatic text understanding, the task of Relation Extraction (RE) is to detect whether a sentence describes a semantic relation between two entities of interest, with both the relation and the entities having predefined categories.
RE is a classic NLP problem: given a sentence, RE aims to detect whether a certain semantic relationship exists between two entities of interest in it. RE is commonly formulated as a binary classification problem as follows. Embodiments of the system treat a given sentence $S$ as a sequence of $n$ words (denoted by $w_i$ with $i \in \{1, \ldots, n\}$), among which there exist two known entities $e_1$ and $e_2$ (that are also words):

$$S = w_1 w_2 \ldots e_1 \ldots e_2 \ldots w_{n-1} w_n \tag{1}$$

For a certain type of relationship $R$, an RE system aims to learn a function $F_R$ so that

$$F_R(S) = \begin{cases} +1 & \text{if } e_1 \text{ and } e_2 \text{ are associated by the relation } R \\ -1 & \text{otherwise} \end{cases}$$

RE systems have two key components: (1) data representation, that is, how to encode the semantic and syntactic information within text sentences in a meaningful way; and (2) a learning algorithm, which utilizes the sentence representation to optimally classify whether given sentences are related to a predefined relation $R$ or not.
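The formulation above can be illustrated with a minimal Python sketch. It is not the method of the present application: the `Sentence` class, the `represent` function, and the keyword-based `make_keyword_classifier` are hypothetical names introduced here, and the trigger-word rule is only a hand-written stand-in for a learned function F_R. The sketch shows how a sentence with two marked entities is mapped first to a representation and then to a label of +1 or -1.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Sentence:
    """A sentence S = w1 w2 ... e1 ... e2 ... wn with two marked entities."""
    words: Sequence[str]
    e1_index: int  # position of entity e1 within words
    e2_index: int  # position of entity e2 within words


def represent(sentence: Sentence) -> List[str]:
    """Component (1), data representation.

    Illustrative assumption: use only the lowercased words that lie
    strictly between the two entities.
    """
    lo, hi = sorted((sentence.e1_index, sentence.e2_index))
    return [w.lower() for w in sentence.words[lo + 1:hi]]


def make_keyword_classifier(triggers: FrozenSet[str]) -> Callable[[Sentence], int]:
    """Component (2), a hand-written stand-in for a learned F_R.

    Returns +1 if any trigger word appears between e1 and e2, else -1.
    A real system would learn this decision from annotated examples.
    """
    def f_r(sentence: Sentence) -> int:
        return 1 if any(w in triggers for w in represent(sentence)) else -1
    return f_r


# Hypothetical relation R = "inhibits", with made-up example sentences.
f_inhibits = make_keyword_classifier(frozenset({"inhibits", "blocks"}))
positive = Sentence(["Aspirin", "inhibits", "COX-1", "in", "platelets"], 0, 2)
negative = Sentence(["Aspirin", "and", "COX-1", "were", "studied"], 0, 2)
labels = [f_inhibits(positive), f_inhibits(negative)]
```

Keeping `represent` separate from the classifier mirrors the two components named above: the representation can be swapped (for example, for POS tags or parse-based features) without changing the interface of F_R.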
Over the years, many methods have been proposed to solve the relation extraction (RE) problem. Such methods normally represent words as Part-of-Speech (POS) tags or related ontology terms. Widely used sentence representations include parse trees and dependency parse graphs. Despite years of progress, automatic RE remains a challenging task for two reasons. First, feature representations of English sentences are hard to design for the RE problem because the task depends on both the syntactic structures and the semantic patterns of natural text. Second, the lack of sufficient annotated examples for model training limits the capability of current RE systems.