Natural Language Processing (NLP) systems seek to automate the extraction of useful information from sequences of symbols in human language. Some NLP systems may encounter difficulty due to the complexity and sparsity of information in natural language. Neural network language models (NNLMs) may overcome limitations of the performance of traditional systems. A NNLM may learn distributed representations for words, and may embed a vocabulary into a smaller dimensional linear space that models a probability function for word sequences, expressed in terms of these representations.
NNLMs may generate word embeddings by training a symbol prediction task over a moving local-context window. The ordered set of weights associated with each word becomes that word's dense vector embedding. The result is a vector space model that encodes semantic and syntactic relationships. A NNLM can predict a word given its surrounding context. These distributed representations encode shades of meaning across their dimensions, allowing for two words to have multiple, real-valued relationships encoded in a single representation. This feature flows from the distributional hypothesis: words that appear in similar contexts have similar meaning. Words that appear in similar contexts will experience similar training examples, training outcomes, and converge to similar weights.
Once calculated, word embeddings based on word analogies can allow vector operations between words that mirror their semantic and syntactic relationships. The analogy “king is to queen as man is to woman” can be encoded in vector space by the equation king-queen=man-woman.
Conventional NNLMs may not account for morphology and word shape. However, information about word structure in word representations can be valuable for part of speech analysis, word similarity, and information extraction. It is with respect to these and other considerations that aspects of the present disclosure are presented herein.