The present invention relates generally to a natural language learning system, and more particularly, but not by way of limitation, to a system for extracting significant syntactic constructs from large corpora and applying a word embedding data processing technique to obtain similarities between phrases that have internal structure.
Mapping verbal usage to regular expressions has been considered. Conventional techniques have shown that regular expressions extracted from corpora can be learned and that they are instrumental to a wide range of applications involving semantic processing. Such conventional techniques involve the use of ontological categories.
Other conventional techniques rely on bags of words (i.e., a fixed number of lexical features) to predict the meaning of input content. Further conventional techniques have used Recurrent Neural Networks (RNNs) with success for predicting word similarity.
However, the conventional techniques present a technical problem: the ontological categories hinder the accuracy of the methods proposed in those techniques, and the reliance on a bag of words can limit the prediction of ambiguous terms.