The present disclosure relates to a voice processing device, a voice processing method, and a program, and particularly to a voice processing device, a voice processing method, and a program capable of suppressing variation in the time expansion and contraction of an output voice when the voice pitch of a voice signal is converted.
In the related art, technologies for converting the voice pitch of a voice signal, such as a voice or a musical composition, have been used for key control in karaoke, for changing the key of reference music in musical instrument training, and the like. Such voice pitch conversion is a useful technology because any desired key can be obtained from a single prepared reference voice signal, which also saves memory.
As one example of a method of converting the voice pitch of a voice signal, the period of the voice waveform may be changed by a sampling rate converter. With this method, the voice signal can be converted into a voice signal having a desired voice pitch, but the number of samples of the voice signal differs before and after the conversion.
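The sample-count change can be illustrated with a minimal Python sketch of pitch conversion by resampling. It assumes linear interpolation as the sampling rate converter and playback at the original sampling rate; the function name resample_linear and the semitone mapping ratio = 2 ** (semitones / 12) are illustrative choices, not part of the disclosure.

```python
import math


def resample_linear(x, ratio):
    """Read x at `ratio` input samples per output sample (linear interpolation).

    Played back at the original sampling rate, ratio > 1 raises the pitch and
    shortens the signal; ratio < 1 lowers the pitch and lengthens it.
    """
    n_out = int(len(x) / ratio)  # the number of samples changes here
    y = []
    for i in range(n_out):
        pos = i * ratio          # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = x[j + 1] if j + 1 < len(x) else x[j]
        y.append((1.0 - frac) * x[j] + frac * nxt)
    return y


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1000)]
    for semitones in (12, -12):
        y = resample_linear(x, 2 ** (semitones / 12))
        print(semitones, len(x), len(y))  # 12 1000 500, then -12 1000 2000
```

Shifting the 200 Hz tone up one octave produces a 400 Hz tone but only 500 samples, and shifting it down one octave produces 2000 samples, so the output length no longer matches the input length.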
Therefore, since a voice pitch conversion processing device is generally expected to output the same number of samples as it receives, the number of samples of the output data is adjusted by a time expansion and contraction process such as PICOLA (Pointer Interval Controlled Overlap and Add) (for example, refer to “Morita, Itakura: voice expansion and contraction on a time axis using PICOLA (Pointer Interval Controlled OverLap and Add), and an evaluation thereof, collected papers of Acoustical Soc. of Japan, October 1986, pp. 149-150”).
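The following Python sketch illustrates the idea of a PICOLA-style time expansion and contraction that brings a signal to a requested number of samples without changing its pitch. It is a simplification, not the procedure of the cited paper: the period search by mean squared difference, the lag range of 20 to 100 samples, the linear cross-fade, and the final trimming or zero-padding are assumptions made here, and the names picola_like_stretch and estimate_period are hypothetical.

```python
import math


def estimate_period(x, start, min_lag, max_lag):
    """Pick the lag whose two consecutive segments x[start:start+lag] and
    x[start+lag:start+2*lag] have the smallest mean squared difference."""
    best_lag, best_err = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        err = sum((x[start + k] - x[start + lag + k]) ** 2 for k in range(lag)) / lag
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag


def picola_like_stretch(x, out_len, min_lag=20, max_lag=100):
    """Change the duration of x to out_len samples without changing its pitch.

    Simplified PICOLA-style sketch; supports 0.5 <= out_len / len(x) <= 2 only.
    """
    rate = out_len / len(x)
    if rate == 1.0:
        return list(x)
    if not 0.5 <= rate <= 2.0:
        raise ValueError("this sketch supports 0.5 <= out_len/len(x) <= 2")
    y = []
    p = 0  # read pointer into x
    while p + 2 * max_lag <= len(x):
        L = estimate_period(x, p, min_lag, max_lag)
        a, b = x[p:p + L], x[p + L:p + 2 * L]
        if rate < 1.0:
            # Contraction: replace periods A and B (2L samples) by one period
            # that fades out A while fading in B.
            y.extend(a[k] * (1 - k / L) + b[k] * (k / L) for k in range(L))
            p += 2 * L
        else:
            # Expansion: keep A, then insert one period fading from B into A;
            # the copy step below continues from the start of B.
            y.extend(a)
            y.extend(b[k] * (1 - k / L) + a[k] * (k / L) for k in range(L))
            p += L
        # Pointer interval control: copy unmodified samples until the output
        # length is back at rate * (samples consumed so far).
        n_copy = max(0, round((rate * p - len(y)) / (1.0 - rate)))
        chunk = x[p:p + n_copy]
        y.extend(chunk)
        p += len(chunk)
    # Tail: copy what is left, then trim or zero-pad to exactly out_len samples.
    y.extend(x[p:p + max(0, out_len - len(y))])
    y = y[:out_len]
    y.extend([0.0] * (out_len - len(y)))
    return y


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate; a 200 Hz tone has a 40-sample period
    x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(2000)]
    for target in (1500, 2600):
        y = picola_like_stretch(x, target)
        print(target, len(y))
```

To restore the input length after pitch conversion by resampling, the converted signal would be passed with out_len set to the original number of samples. The 0.5 to 2 limit follows from the structure of the sketch: each cycle removes at most one of two periods, or inserts at most one period after one.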