Automatic Speech Recognition (ASR) systems are typically built on Hidden Markov Models (HMMs) with continuous Gaussian mixture model (GMM) emission densities and context-dependent phones. Currently, Deep Neural Networks (DNNs) with many hidden layers outperform GMMs on a variety of speech recognition benchmarks [1]. These state-of-the-art ASR systems are trained on large amounts of recorded speech data and benefit from the availability of annotated speech material. The amounts of data required to build a competitive ASR system are usually available for widely spoken languages and for large-scale applications with great economic potential, such as speech-to-speech and speech-to-text translation. However, the majority of languages are low-resource languages with many peculiarities in phonotactics, word segmentation, or morphology, or dialects that lack strict language conventions. Moreover, a considerable share of currently developed ASR applications are tailored solutions with limited economic potential, developed for a single customer or for a small user group.