The invention relates to a method for automatically segmenting speech for use in speech processing applications. Among the various possible applications, a particular one is speech synthesis, more particularly speech synthesis based on the concatenation of diphones. Diphones are short speech segments, each of which contains mainly the transition between two adjacent phonemes, together with the last part of the preceding phoneme and the first part of the succeeding phoneme. Diphones may be extracted, according to certain rules that are known per se, from a database that has already been segmented into single phonemes. Typically, such a database consists of isolated words recorded from a single speaker in a controlled environment, and also comprises the verified correspondence between phonetic transcription and acoustic realization.
A straightforward and automatic realization of the segmentation method according to the preamble, based on phoneme Hidden Markov Models (HMMs), has been disclosed in O. Boeffard et al., Automatic Generation of Optimized Unit Dictionaries for Text to Speech Synthesis, International Conference on Speech and Language Processing, Banff, Alberta, Canada (1992), pp. 1211-1215. However, the quality of the known method has been found insufficient, in that the boundaries it finds generally deviate too much from the positions where the corresponding boundaries would be placed by a manual procedure. The segmentation accuracy could of course be improved if the phoneme HMMs were first trained on a separate, manually segmented database. Setting up such a manually segmented database is, however, often too costly, since this has to be repeated each time a new speaker is used in a speech synthesis system.
Accordingly, it is, amongst other things, an object of the present invention to propose a method for speech segmentation that is fully automatic, does not need manually segmented speech material, and gives a better result than the cited reference.