In some known speech recognition methods, words are recognized by performing spectral analysis and applying an extracted feature vector of a certain audio segment as an input to a pre-trained learning engine, which may obtain the prior probabilities of the words from a dictionary and/or a certain linguistic model.
A linguistic model is sometimes obtained by estimating probabilities of word occurrences based on a plurality of texts and/or ground-truth marked audio streams. The learning engine is usually trained over pre-recorded audio samples of a target application dictionary, as spoken by a certain target population, recorded on target hardware, and under the target environmental conditions.
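
By way of a non-limiting illustration, the estimation of word-occurrence probabilities from a plurality of texts, and their use as prior probabilities during recognition, may be sketched as follows. The sketch assumes a simple unigram model with additive smoothing, which is only one of many possible linguistic models. The function names `estimate_word_priors` and `pick_word`, the `smoothing` and `floor` parameters, and the per-word acoustic log-likelihood scores are hypothetical and are introduced here for illustration only; in practice such scores would be produced by the pre-trained learning engine.

```python
import math
from collections import Counter


def estimate_word_priors(texts, smoothing=1.0):
    """Estimate unigram word probabilities from a collection of texts.

    Assumption: whitespace tokenization and additive smoothing; a real
    linguistic model may be considerably more elaborate.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    # Smoothed denominator, so the probabilities over the vocabulary sum to 1.
    denom = total + smoothing * len(counts)
    return {word: (count + smoothing) / denom for word, count in counts.items()}


def pick_word(acoustic_log_likelihoods, priors, floor=1e-9):
    """Select the candidate maximizing log P(features | word) + log P(word).

    `acoustic_log_likelihoods` maps each candidate word to a hypothetical
    score from the learning engine. Words missing from `priors` receive the
    small `floor` probability instead of zero.
    """
    return max(
        acoustic_log_likelihoods,
        key=lambda w: acoustic_log_likelihoods[w] + math.log(priors.get(w, floor)),
    )
```

In this sketch, a candidate word with a slightly lower acoustic score may still be selected when it occurs more often in the texts, which is the role the prior probabilities play in the methods described above.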