In neural networks, entities are represented at both the input and output layers as finite-dimensional vectors in a space whose dimensionality is low relative to the total number of entities. In neural network language models, which can be used in various speech and text applications (e.g., automatic speech recognition, word prediction, and word correction), these entities are lexical tokens representing words, phrases, or characters. Each word, phrase, or character is represented as a vector (i.e., a vector representation) that is learned as part of the internal structure of the neural network language model. Such vector representations are finite-dimensional and capture semantic and syntactic regularities. In conventional neural network language models, each vector representation is parameterized according to its dimensionality. That is, a token with a d-dimensional vector representation is parameterized by d free, learnable parameters.
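As a minimal sketch of the conventional parameterization described above, the following Python example stores one row of d independent parameters per token. It assumes a small hypothetical vocabulary and uses a plain list-of-lists in place of a framework embedding layer; the class name `EmbeddingTable`, its methods, and the simple SGD step are illustrative only and are not taken from any particular model.

```python
import random


class EmbeddingTable:
    """Hypothetical conventional embedding: each token owns d free parameters."""

    def __init__(self, vocab, d, seed=0):
        rng = random.Random(seed)
        # Map each token to the index of its row in the table.
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.d = d
        # One row of d independent, learnable parameters per token (V x d total).
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in vocab]

    def lookup(self, token):
        # The vector representation of a token is simply its own row.
        return self.weights[self.index[token]]

    def num_parameters(self):
        # V tokens times d parameters each.
        return len(self.weights) * self.d

    def sgd_update(self, token, grad, lr=0.1):
        # A gradient step touches only this token's row; no parameters are shared.
        row = self.weights[self.index[token]]
        for k in range(self.d):
            row[k] -= lr * grad[k]


# Usage with a hypothetical five-word vocabulary and d = 4.
vocab = ["the", "cat", "sat", "on", "mat"]
table = EmbeddingTable(vocab, d=4)
print(table.num_parameters())  # 20
```

Because every row is independent, the number of parameters grows as V × d for a vocabulary of V tokens, and updating one token's vector leaves every other token's vector unchanged.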