1. Technical Field
The invention concerns a method of composing messages for speech output, in particular the improvement of the quality of reproduction of speech outputs of this kind.
2. Prior Art
In the prior art systems are known in which corresponding entries are called from a database to implement speech outputs. In detail this can be executed in such a way that, for example, a specific number of different messages, in other words, e.g., of different sentences, commands, user requests, figures of speech, phrases or similar, are filed in a memory and according to requirement for a filed message this is read out from the memory and reproduced. It is easy to see that arrangements of this kind are very inflexible, as only messages which have been fully stored beforehand can be reproduced.
Therefore there has been a changeover to dividing up messages into segments and storing them as corresponding audio files. If a message is to be output it is necessary to reconstruct the desired message from the segments. In the prior art this is done in such a way that for the message to be formed only corresponding instructions are transferred to the segments in the relevant order for the message. By means of these instructions the corresponding audio files are read out from the memory and united for output. This method of forming sentences or parts of sentences is characterised by a great flexibility with only a low memory requirement. It is, however, felt to be disadvantageous that reproduction compiled by this method sounds very synthetic as no account is taken of the natural flow of speech.
The object of the invention is to disclose a method of forming messages from segments, which takes account of the natural flow of speech and thus results in harmonious reproduction results.
By composing messages for speech output the messages composed of segments of at least one original sentence, which are stored as audio files. A message intended for output is composed from the segments stored as audio files and selected using search criteria from the stored audio files. Each segment is allocated at least one parameter characterizing its phonetic properties in the original sentence. Using the parameters of the individual segments characterizing the phonetic properties in the original sentence, a check is made as to whether the segments forming the reproduction sentence to be output as a message are composed according to their natural flow of speech.
According to the invention, therefore, with a method for composing messages for speech output from segments of at least one original sentence, which are stored as audio files, in which a message intended for output is composed from the segments stored as audio files, which segments are selected from the stored audio files using search criteria, it is provided that every segment is allocated at least one parameter characterising its phonetic properties in the original sentence and that using the parameters characterising the phonetic properties in the original sentence of the individual segments a check is made as to whether the segments forming the reproduction sentence to be output as a message are composed according to their natural flow of speech. In this way it can be achieved that in reproducing speech the natural flow and rhythm of speech of a message is largely reconstructed without the message itself having to be fully stored.
To obtain an even more natural message it is advantageous if every segment is allocated several parameters characterising its phonetic properties in the original sentence, wherein the parameters can advantageously be selected from the following parameters: length of the respective segment, position of the respective segment in the original sentence, front and/or rear transition value of the respective segment to the preceding or following segment in the original sentence, wherein the length of the search criterion allocated in each case is further used as the length of the respective segment.
To achieve particularly good results, in an advantageous further development of the invention it is provided that as transition values the last or the first letters, syllables or phonemes of the preceding or following segment in the original sentence are used. A particularly high-quality reproduction of reproduction sentences composed from audio files is achieved if phonemes are used as transition values.
As the sentence melody largely depends on the type of sentence, a further improvement in reproduction is achieved, if as a further parameter data are provided on whether the respective segment of the original sentence is derived from a question or exclamation sentence.
An advantageous further development of the invention is characterised in that for a found combination of segments forming the reproduction sentence to be output as a message an evaluation measurement is calculated from the parameters of the individual segments characterising the phonetic properties in the original sentence according to the following formula:   B  =            ∑              n        ,        I              ⁢                  W        n            ⁢                        f                      n            ,            i                          ⁢                  (          n          )                    *      
wherein fn,i(n)is a functional correlation of the nth parameter, i is an index designating the segment and Wn is a weighting factor for the functional correlation of the nth parameter. The parameter itself, its reciprocal value or a consistency value of the parameter allocated to the stored segment with the parameter which would be allocated to the segment in the combination for the message can, for example be provided as the functional correlation of a parameter. The weighting factors therein enable a very slight displacement of the preferences in determining the evaluation measurement.
According to the evaluation measurements from the found combinations of segments those whose evaluation measurement indicates that the segments of the combination are composed according to a natural flow of speech are selected as the message to be output.
In another configuration of the invention it is provided that the evaluation measurement B is calculated from the functional correlations fn(n) of at least the following parameters, length L and position P, as well as the front and rear transition value xc3x9cvorn, xc3x9chinten of the segment, according to the following formula:   B  =            ∑      i        ⁢                  {                                            W              L                        ⁢                                          f                Li                            ⁢                              (                L                )                                              +                                    W              P                        ⁢                                          f                Pi                            ⁢                              (                P                )                                              +                                    W                              U                ¨                                      ⁢                                          f                                                      U                    ¨                                    ⁢                  i                                            ⁢                              (                                                      U                    ¨                                    vorn                                )                                              +                                    W                              U                ¨                                      ⁢                                          f                                                      U                    ¨                                    ⁢                  i                                            ⁢                              (                                                      U                    ¨                                    hinten                                )                                                    }            .      
The evaluation is particularly simple if the reproduction sentence is in a format corresponding to the search criteria, wherein preferably alphanumeric character strings are used for the search criteria and the transmitted reproduction sentences.
In order to achieve a quick search in a database it is advantageous if the search criteria are hierarchically arranged in a database.
Selection of segments for the reproduction of a message is particularly easy if for selecting the segments for a message stored as audio files a test is done as to whether the reproduction sentence desired as a message coincides in its entirety with a search criterion filed in a database together with an allocated audio file, wherein, if this is not the case, the end of the respective reproduction sentence is reduced and then checked for consistencies with the search criteria filed in the database until one or more consistencies have been found for the remaining part of the reproduction sentence, if for those parts of the reproduction sentence which were detached in a preceding step the checking mentioned in the last passage is continued, if for every combination of segments whose search criteria fully coincide with the reproduction sentence a check is done as to whether the segments forming the reproduction sentence to be output as a message are composed according to their natural flow of speech and if for the reproduction of a desired message the audio files of the segments whose combination comes closest to the natural flow of speech are used.
Therefore once it is ensured that for every segment at least one data record with a search criterion, an audio file and at least one parameter characterizing its phonetic properties in the original sentence, in other words additional information on the respective segment, is filed, a combination of segments can very easily be compiled using the data records edited in this way, the reproduction of which is no longer distinguishable from a spoken reproduction of the corresponding message. This effect is achieved in that before output of a message, in other words before the reproduction of sentences, parts of sentences, requests, commands, phrases or similar, a search is done inside the database for segments from which corresponding combinations for the desired message can be formed and in that using the information on every segment used an evaluation is carried out on every found combination consisting of one or more segments, describing the approximation of the combination to the natural flow of speech. Once the evaluations for the compiled combinations are complete the combination of segments which comes closest to the natural flow of speech is selected for the message.