Data-driven or supervised machine-learning algorithms are emerging as important tools for information analysis in portable devices, the cloud, and other computing devices. Machine learning encompasses a variety of algorithms that can learn automatically from data over time. These algorithms are grounded in mathematics and statistics and can be employed to predict events, classify entities, diagnose problems, and approximate functions. Applications include semantic text analysis, web search, and speech and object recognition, to name a few. Supervised machine-learning algorithms typically operate in two phases: training and testing. In the training phase, typical input examples are used to build decision models that characterize the data. In the testing phase, the learned model is applied to new data instances to infer properties such as relevance and similarity.
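The two phases can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, chosen here only for brevity; the function names `train` and `predict`, the two-dimensional feature vectors, and the labels are all hypothetical and are not drawn from any particular system.

```python
import math
from collections import defaultdict


def train(examples):
    """Training phase: build a decision model (one centroid per label)."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    # The "model" is simply the mean feature vector of each label.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Testing phase: apply the learned model to a new data instance."""
    # Pick the label whose centroid is closest in Euclidean distance.
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical labeled examples (feature vector, label).
training_data = [
    ([1.0, 1.0], "relevant"),
    ([1.2, 0.8], "relevant"),
    ([5.0, 5.0], "irrelevant"),
    ([4.8, 5.2], "irrelevant"),
]

model = train(training_data)
prediction = predict(model, [1.1, 0.9])
```

In this sketch the model is nothing more than a table of per-label averages; practical systems typically use far richer decision models, but the separation between building the model and applying it to unseen instances is the same.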
Generally, a search engine processes a query by directly comparing terms in the query with terms in documents. In some cases, however, a query and a document use different words to express the same concept. The search engine may produce unsatisfactory search results in such circumstances. A search engine may augment a query by finding synonyms of the query terms and adding those synonyms to the query. But this technique may fail to uncover conceptual similarities between a query and a document.
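The limitation described above can be sketched in a few lines. This is an illustrative toy, not the behavior of any real search engine: the whitespace tokenizer, the `SYNONYMS` table, and the sample query and documents are all assumptions made for the example.

```python
# Hypothetical synonym table; a real system would use a much larger resource.
SYNONYMS = {"car": {"automobile", "auto"}}


def tokenize(text):
    """Split text into a set of lowercase terms (naive whitespace tokenizer)."""
    return set(text.lower().split())


def expand_query(terms, synonyms):
    """Augment the query terms with any known synonyms."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def term_overlap(query, document, synonyms=None):
    """Count the terms shared by the (optionally expanded) query and the document."""
    query_terms = tokenize(query)
    if synonyms is not None:
        query_terms = expand_query(query_terms, synonyms)
    return len(query_terms & tokenize(document))


query = "car repair"
doc_synonym = "automobile repair shop"          # uses a known synonym
doc_concept = "fixing a broken vehicle engine"  # same concept, different words

direct = term_overlap(query, doc_synonym)                # matches only "repair"
expanded = term_overlap(query, doc_synonym, SYNONYMS)    # also matches "automobile"
conceptual = term_overlap(query, doc_concept, SYNONYMS)  # no shared terms at all
```

Synonym expansion improves the match against the first document, but the second document, which expresses the same concept with none of the query terms or their listed synonyms, still scores zero.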
Neural network techniques are widely applied to obtain high-quality distributed representations of words (e.g., word embeddings) to address text mining, information retrieval, and natural language processing tasks. Some methods learn word embeddings from the contexts in which words appear, and the resulting embeddings capture both semantic and syntactic relationships between words. However, such methods may be unable to handle, for example, unseen words or rare words that have insufficient context.
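The unseen-word problem can be made concrete with a small sketch. The three-dimensional vectors in `EMBEDDINGS` are made-up values for illustration only, not the output of any trained model, and the helper names are hypothetical.

```python
import math

# Hypothetical embedding table; the values are invented for illustration.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(word_a, word_b, table):
    """Return the cosine similarity, or None if either word has no embedding."""
    if word_a not in table or word_b not in table:
        return None  # unseen word: a lookup-table model has nothing to compare
    return cosine(table[word_a], table[word_b])


related = similarity("king", "queen", EMBEDDINGS)
unrelated = similarity("king", "apple", EMBEDDINGS)
unseen = similarity("king", "kingship", EMBEDDINGS)  # "kingship" is not in the table
```

Words present in the table can be compared directly, whereas a word that never appeared during training, such as "kingship" here, has no vector at all, so a purely lookup-based model cannot relate it to anything.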