In recent years, there has been remarkable progress in statistical parametric speech synthesis; in particular, hidden Markov model (HMM)-based speech synthesis has been actively studied. Because HMM-based speech synthesis makes speaker adaptation easy, it allows a speech synthesis dictionary to be created even from a small amount of speech. For that reason, even an average user can casually create a speech synthesis dictionary, and it is expected that, in the future, average users will disclose and share speech synthesis dictionaries with one another, leading to wider use of speech synthesis technology.
On the other hand, a user with malicious intent may use another person's speech synthesis dictionary to impersonate that person, or may create a speech synthesis dictionary from speech that is fraudulently obtained from media such as TV or the Internet. Thus, there is increasing concern about fraudulent use of speech synthesis dictionaries. Moreover, if speech synthesis in the future reaches a quality substantially equivalent to human speech, there is a concern that synthesized speech will be abused, for example, by using the voices of famous people for promotion without permission, or by impersonating other persons in phone calls.
In that regard, impersonation can be prevented or suppressed if a digital watermark is embedded in the synthesized speech, and if the receiving side detects the embedded watermark and informs the user on the receiving side that a synthesized voice has been received. Such a digital watermark embedding method can be used in pulse-driven speech synthesis systems in general.