1. Field
Methods and apparatuses consistent with exemplary embodiments relate to generating a singing voice, and more particularly, to generating a singing voice by transforming average voice data of a speaker.
2. Description of the Related Art
In a voice synthesis method that uses statistical processing, a voice signal parameter representing features of a voice is extracted, the parameter is classified into designated units, and then the value that best represents each unit is estimated. A large amount of voice data is required for these unit values to be statistically meaningful, and constructing such voice data generally requires considerable cost and effort. To address this problem, adaptation methods have been suggested.
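As an illustration only, the per-unit estimation step can be sketched as follows. The sketch assumes that each unit is modelled by a single Gaussian over a one-dimensional parameter (e.g., a pitch value), so the value that best represents a unit is its maximum-likelihood mean. The function name `estimate_unit_statistics` and the input format are hypothetical; practical systems typically use multi-dimensional parameters and context-dependent HMM states.

```python
from collections import defaultdict


def estimate_unit_statistics(labelled_params):
    """Group (unit_label, parameter_value) pairs by unit and estimate,
    for each unit, the sample count, ML mean, and ML (biased) variance.
    Hypothetical helper for illustration; assumes 1-D parameters."""
    groups = defaultdict(list)
    for unit, value in labelled_params:
        groups[unit].append(value)

    stats = {}
    for unit, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        stats[unit] = (n, mean, var)
    return stats


# Unit "a" has three observations; unit "i" has only one.
stats = estimate_unit_statistics(
    [("a", 100.0), ("a", 110.0), ("a", 120.0), ("i", 200.0)]
)
# stats["a"] -> (3, 110.0, ~66.67); stats["i"] -> (1, 200.0, 0.0)
```

The zero variance for unit "i" shows why a small amount of data is a problem: with a single observation, the estimated statistics say almost nothing about how that unit actually varies.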
The adaptation method aims to produce unit values comparable to those of a voice synthesis method trained on a large amount of voice data, even though the adaptation method uses only a small amount of voice data. To this end, the adaptation method uses a transformation matrix.
A commonly used method of forming the transformation matrix is maximum likelihood linear regression (MLLR). The transformation matrix represents correlations between sets of voice data. Based on the correlations between voice A, which has a large amount of data, and voice B, which has a small amount of data, the matrix is used to transform the units of voice A so that they represent the features of voice B.
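A minimal sketch of an MLLR-style mean transform is shown below, under simplifying assumptions that are not part of the original text: one-dimensional features, a single transform shared by all units, identity covariances, and hard frame-to-unit alignment. Under these assumptions, the maximum-likelihood estimate of the transform W = [b, a], applied to the extended mean vector [1, μ], reduces to a least-squares solution W = (Σ o ξᵀ)(Σ ξ ξᵀ)⁻¹. The names `estimate_mllr_transform` and `adapt_means` are hypothetical; standard MLLR additionally weights by covariances and soft state occupancies.

```python
def estimate_mllr_transform(unit_means, aligned_frames):
    """Estimate (b, a) such that the adapted mean is a * mu + b.

    unit_means: {unit: mean of voice A}  (large-data voice)
    aligned_frames: [(unit, observation of voice B)]  (small-data voice)
    Simplified sketch: 1-D features, identity covariance, hard alignment.
    """
    # Accumulate G = sum(xi xi^T) (2x2) and k = sum(o xi^T) (1x2), xi = [1, mu].
    g00 = g01 = g11 = 0.0
    k0 = k1 = 0.0
    for unit, obs in aligned_frames:
        mu = unit_means[unit]
        g00 += 1.0
        g01 += mu
        g11 += mu * mu
        k0 += obs
        k1 += obs * mu

    det = g00 * g11 - g01 * g01
    if abs(det) < 1e-12:
        raise ValueError("need frames from at least two units with distinct means")

    # W = k G^-1, where G^-1 = [[g11, -g01], [-g01, g00]] / det.
    b = (k0 * g11 - k1 * g01) / det
    a = (k1 * g00 - k0 * g01) / det
    return b, a


def adapt_means(unit_means, transform):
    """Apply the shared transform to every unit mean of voice A."""
    b, a = transform
    return {unit: a * mu + b for unit, mu in unit_means.items()}


voice_a = {"a": 1.0, "i": 2.0, "u": 3.0}
voice_b_frames = [("a", 2.5), ("i", 4.5)]  # no voice B data for unit "u"
w = estimate_mllr_transform(voice_a, voice_b_frames)  # (0.5, 2.0)
adapted = adapt_means(voice_a, w)  # {"a": 2.5, "i": 4.5, "u": 6.5}
```

Because the transform is shared, unit "u" is adapted toward voice B even though voice B provided no data for it, which is how a small amount of data can still move every unit.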
The MLLR method performs well when transforming voice data between ordinary speaking voices, but it degrades sound quality when transforming a speaking voice into a singing voice. This is because the MLLR method does not consider the pitch and duration of a sound, which are important elements of a singing voice. Accordingly, there is a need for a method of efficiently generating a singing voice by transforming a speaking voice.