Computer-based speech recognition techniques have used local context (e.g., the previous two words uttered by a user) to predict the next word that the user is going to say. For example, some techniques have used neural networks, such as recurrent neural networks (RNNs), to provide such predictions. A recurrent neural network can include an input layer with nodes that represent a vocabulary of words, one or more hidden layers with nodes that are fully connected to the nodes of the input layer, and an output layer with nodes that are fully connected to the nodes of one of the hidden layers. Input can be provided to the input layer by activating the one or more nodes of the input layer that correspond to the word(s) that are part of the local context (e.g., by providing those nodes with a predetermined value). The activation values can be propagated through the connections in the neural network and can cause the output layer to output a probability value for the word corresponding to each of its nodes. Each probability value can indicate how likely the corresponding word is to be the “next word” uttered by the user. For example, the probability values can be used to help determine whether a user said “surgeon” or “sturgeon,” two words that a speech recognition system may otherwise be unable to distinguish with a reliable degree of certainty.
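
For illustration only, the forward propagation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular system: the six-word vocabulary, the hidden layer size, the tanh and softmax functions, and all names (`VOCAB`, `next_word_probabilities`, `pick_candidate`) are hypothetical, and the weights are random and untrained, so the resulting probabilities carry no linguistic meaning.

```python
import math
import random

# Hypothetical toy vocabulary; a real system would use many thousands of words.
VOCAB = ["the", "caught", "a", "fisherman", "surgeon", "sturgeon"]
HIDDEN_SIZE = 8  # assumed size of the single hidden layer

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def random_matrix(rows, cols):
    """Return a rows x cols matrix of small random (untrained) weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Fully connected layers: every node connects to every node of the next layer.
W_in = random_matrix(HIDDEN_SIZE, len(VOCAB))    # input layer -> hidden layer
W_rec = random_matrix(HIDDEN_SIZE, HIDDEN_SIZE)  # hidden layer -> itself (recurrence)
W_out = random_matrix(len(VOCAB), HIDDEN_SIZE)   # hidden layer -> output layer


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probabilities(context):
    """Feed the context words in order and return a next-word distribution."""
    hidden = [0.0] * HIDDEN_SIZE
    for word in context:
        # Activate the input node for this word with a predetermined value (1.0).
        x = [1.0 if w == word else 0.0 for w in VOCAB]
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return dict(zip(VOCAB, softmax(matvec(W_out, hidden))))


def pick_candidate(context, candidates):
    """Choose whichever acoustically similar candidate the network rates higher."""
    probs = next_word_probabilities(context)
    return max(candidates, key=lambda w: probs[w])


probs = next_word_probabilities(["caught", "a"])
choice = pick_candidate(["caught", "a"], ["surgeon", "sturgeon"])
```

In this sketch, `probs` maps each vocabulary word to its probability of being the next word, and `choice` is whichever of the two candidates received the higher probability. With trained weights, a context such as "caught a" would be expected to favor "sturgeon"; with the random weights used here, the choice is arbitrary.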