Recently, an apparatus for editing synthesized speech has been proposed. In this apparatus, a user first directly edits phonemic and prosodic information acquired by analyzing a text. After the editing, the apparatus converts the phonemic and prosodic information to a speech waveform. To support the user's editing work, the apparatus stores the user's editing history for the phonemic and prosodic information, such as a reading sign, a prosodic sign, and synthesized speech control information (fundamental frequency, phoneme, duration). By using this editing history, the speech waveform before editing can be reproduced.
When an accent phrase of a text is edited with the above-mentioned technique, the phonemic and prosodic information before editing is first converted to a speech waveform, which the user listens to. After the editing, the edited phonemic and prosodic information is converted to a speech waveform, which the user also listens to. Thus, in the conventional technique, the user listens to the speech waveform of the phonemic and prosodic information before editing, edits the phonemic and prosodic information, and then listens to the speech waveform of the edited phonemic and prosodic information. Because the two waveforms are heard at separate times, with the editing work in between, it is difficult for the user to correctly confirm the difference in the speech waveform caused by the editing.
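As an illustration only, the conventional workflow described above might be sketched as follows. The class names, field names, and the `synthesize` stand-in are hypothetical and are not taken from the apparatus itself; the sketch merely assumes that each edit saves the prior phonemic and prosodic information so that the pre-edit waveform can be regenerated.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class ProsodicInfo:
    """Hypothetical phonemic and prosodic information for one accent phrase."""
    reading: str        # reading sign
    accent: int         # prosodic sign (illustrative accent position)
    f0_hz: float        # fundamental frequency
    duration_ms: float  # duration


@dataclass
class EditingSession:
    """Stores each pre-edit state as the editing history."""
    current: ProsodicInfo
    history: List[ProsodicInfo] = field(default_factory=list)

    def edit(self, **changes) -> None:
        # Save the state before editing, then apply the user's changes.
        self.history.append(self.current)
        self.current = replace(self.current, **changes)

    def before_last_edit(self) -> ProsodicInfo:
        # State used to reproduce the waveform before the latest edit.
        return self.history[-1] if self.history else self.current


def synthesize(info: ProsodicInfo) -> str:
    # Stand-in for a real synthesizer: returns a text description, not audio.
    return (f"{info.reading}|accent={info.accent}"
            f"|f0={info.f0_hz}|dur={info.duration_ms}")


# Conventional sequence: the two waveforms are produced and heard separately.
session = EditingSession(ProsodicInfo("konnichiwa", 0, 120.0, 600.0))
session.edit(f0_hz=150.0)
before = synthesize(session.before_last_edit())  # pre-edit waveform
after = synthesize(session.current)              # post-edit waveform
```

In this sketch the pre-edit and post-edit results are only available as two separate outputs, which mirrors the difficulty noted above: nothing presents the difference between them to the user directly.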