One conventional voice quality conversion technique prepares a large number of speech pairs, each consisting of the same content spoken in two different ways (e.g., with different emotions), and learns conversion rules between the two ways of speaking from the prepared pairs (see, for example, Patent Literature (PTL) 1). Based on the resulting learning model, the voice quality conversion technique according to PTL 1 allows speech without emotion to be converted into speech with emotion.
The voice quality conversion technique according to PTL 2 extracts a feature value from a small number of discretely uttered vowels and uses it to perform conversion into target speech.