The present invention relates generally to data augmentation techniques and, more particularly, to a technique for generating voice data having a particular speaking style.
It is known that the performance of automatic speech recognition (ASR) systems degrades when the acoustic environment of the target utterances differs from that of the training data. Here, the acoustic environment includes not only the type of noise but also the speaking style. Spontaneous speech such as conversation, very fast or very slow utterances, and ambiguous pronunciation are well known as speaking styles that are harmful to speech recognition.