The usual methods of searching through textual content have hitherto been extended to spoken requests only indirectly, by means of predefined vocabularies. The spoken request formulated by the user is transcribed by speech recognition into words belonging to the predefined vocabularies. These words are then used to retrieve the required text by means of a conventional textual indexing system, which determines the place or places where each word occurs.
The advantage of this approach is its simplicity: speech recognition merely serves as a source of requests, which are then processed exactly as if they had been written.
The system is rather rigid, however, owing to the need to define in advance a vocabulary, and hence one or more subjects, onto which all possible requests are “projected”.
It has been found that the prior-art search methods are insufficiently flexible in contexts where there is a wide range of subjects, such as the contents available on the Internet or via e-mail.
The aim of the invention is to propose a method of searching through the contents of textual documents by means of a spoken request, using speech recognition but without the constraint of a predefined vocabulary.
To this end, a method according to the invention is characterized in that it consists in transcribing the text into a first set of phonetic units, segmenting the spoken request into a second set of discrete phonetic units, and searching for the places where the requested expression occurs in the text by aligning the said first and second sets of phonetic units. Advantageously, the said alignment is effected by means of a dynamic programming algorithm whose parameters are, for example, the costs of omitting, inserting or substituting the various phonetic units.
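The alignment step can be illustrated with a minimal Python sketch of approximate substring matching by dynamic programming. The specific recurrence (Sellers' variant of edit distance, in which the empty-prefix cost is zero in every column so that a match may start anywhere in the text) is an assumption; the source names only "a dynamic programming algorithm". Phonetic units are assumed to be plain strings, and the function name `find_occurrences`, the `threshold` parameter, and the convention that an omission is a text unit missing from the request while an insertion is a request unit missing from the text are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Unit = str  # a phonetic unit, e.g. a phoneme symbol (assumed representation)


def find_occurrences(
    text: Sequence[Unit],
    request: Sequence[Unit],
    omission: Callable[[Unit], float],
    insertion: Callable[[Unit], float],
    substitution: Callable[[Unit, Unit], float],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (end, cost) for every text position where the request aligns
    with total cost <= threshold; `end` is exclusive, i.e. text[:end]."""
    m = len(request)
    # prev[i]: cheapest alignment of request[:i] with a text segment ending at
    # the previous column; before any text, all i request units are insertions.
    prev = [0.0] * (m + 1)
    for i in range(1, m + 1):
        prev[i] = prev[i - 1] + insertion(request[i - 1])
    hits: List[Tuple[int, float]] = []
    for j, t in enumerate(text, start=1):
        cur = [0.0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            r = request[i - 1]
            cur[i] = min(
                prev[i - 1] + substitution(t, r),  # text unit t aligned with r
                prev[i] + omission(t),             # t absent from the request
                cur[i - 1] + insertion(r),         # r absent from the text
            )
        if cur[m] <= threshold:
            hits.append((j, cur[m]))
        prev = cur
    return hits
```

With unit costs (0 for identical units), the request `l e m o d`, in which the unit `n` has been dropped, matches the text `s a l y t l e m o n d` at end positions 9, 10 and 11, each with cost 1.0. A practical search would typically keep only local minima of the cost along the text rather than every position under the threshold.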
Advantageously the values taken by the said parameters are determined by learning from a body of examples, the object being to optimize an objective function such as a probability function or a discrimination function.
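The two kinds of objective function can be written down under a common, hypothetical notation. It is assumed that each edit operation $k$ (an omission, insertion or substitution of particular units) has a probability $p_k$ and a cost $c_k = -\log p_k$, so that the cheapest alignment is also the most probable one. Each training example $n$ is assumed to consist of a segmented request $r_n$, a text $T_n$, and the location $\ell_n$ where the requested expression truly occurs; $T_n[\ell]$ denotes the segment of $T_n$ at location $\ell$, $\mathcal{A}(r,t)$ the set of alignments of $r$ with $t$, and $\theta = (p_k)_{k \in \mathcal{K}}$ the parameters. This formalization is an illustration, not the patent's own definition.

```latex
\begin{align}
c_k &= -\log p_k, \qquad k \in \mathcal{K} \\
C_\theta(r,t) &= \min_{a \in \mathcal{A}(r,t)} \sum_{k \in a} c_k
              = -\log \max_{a \in \mathcal{A}(r,t)} \prod_{k \in a} p_k \\
\mathcal{L}(\theta) &= \sum_{n=1}^{N} \log \sum_{a \in \mathcal{A}(r_n,\, T_n[\ell_n])} \prod_{k \in a} p_k \\
\mathcal{D}(\theta) &= \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\big[\hat{\ell}_\theta(r_n, T_n) = \ell_n\big],
\qquad \hat{\ell}_\theta(r, T) = \arg\min_{\ell} C_\theta\big(r, T[\ell]\big)
\end{align}
```

Here $\mathcal{L}$ plays the role of the probability function and $\mathcal{D}$, the fraction of requests located correctly, that of the discrimination function.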
According to another feature of the invention, the said objective function is the probability function, which is optimized by an analytical method comprising an EM (Expectation-Maximization) algorithm having a loop in which Lagrange multipliers are used.
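The role of the Lagrange multipliers can be seen in a short derivation of the M-step, under the simplifying assumption that all edit operations $k \in \mathcal{K}$ share a single normalization constraint $\sum_k p_k = 1$ (a model with several constraints, e.g. a separate distribution per context, would need one multiplier per constraint). In the E-step of iteration $s$, $\bar{n}_k$ denotes the expected number of times operation $k$ is used when aligning the training examples under the current parameters $\theta^{(s)}$; this notation is illustrative.

```latex
\begin{align}
Q(\theta) &= \sum_{k \in \mathcal{K}} \bar{n}_k \log p_k,
  \qquad \text{subject to } \sum_{k \in \mathcal{K}} p_k = 1 \\
\Lambda(\theta, \lambda) &= \sum_{k \in \mathcal{K}} \bar{n}_k \log p_k
  - \lambda \Big( \sum_{k \in \mathcal{K}} p_k - 1 \Big) \\
\frac{\partial \Lambda}{\partial p_k} &= \frac{\bar{n}_k}{p_k} - \lambda = 0
  \quad\Rightarrow\quad p_k = \frac{\bar{n}_k}{\lambda} \\
\sum_{k \in \mathcal{K}} p_k = 1
  &\quad\Rightarrow\quad \lambda = \sum_{j \in \mathcal{K}} \bar{n}_j,
  \qquad p_k^{(s+1)} = \frac{\bar{n}_k}{\sum_{j \in \mathcal{K}} \bar{n}_j},
  \qquad c_k^{(s+1)} = -\log p_k^{(s+1)}
\end{align}
```

The loop alternates the E-step (recomputing $\bar{n}_k$ with $\theta^{(s)}$) and this closed-form M-step until the likelihood stops increasing.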
According to another feature of the invention, the said objective function is the discrimination function, which is optimized by means of a genetic algorithm, the evaluation function being the rate of correct identifications.
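A minimal Python sketch of the genetic search is given below, assuming a deliberately reduced, hypothetical parameterization: one global cost each for omission, insertion and substitution, rather than per-unit costs. Each training example is assumed to be a triple (text units, request units, true end position). The names `best_end`, `fitness` and `evolve`, as well as the choice of truncation selection with elitism, uniform crossover and Gaussian mutation, are illustrative assumptions; the source specifies only a genetic algorithm whose evaluation function is the rate of correct identifications.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical parameterization: (omission, insertion, substitution) costs.
Params = Tuple[float, float, float]
# (text units, request units, true end position of the expression, exclusive)
Example = Tuple[Sequence[str], Sequence[str], int]


def best_end(text: Sequence[str], request: Sequence[str], p: Params) -> int:
    """End position (exclusive) of the cheapest alignment of request in text."""
    om, ins, sub = p
    m = len(request)
    prev = [i * ins for i in range(m + 1)]  # column for the empty text prefix
    best_j, best_cost = 0, prev[m]
    for j, t in enumerate(text, start=1):
        cur = [0.0] * (m + 1)  # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            r = request[i - 1]
            cur[i] = min(
                prev[i - 1] + (0.0 if t == r else sub),  # match / substitution
                prev[i] + om,                             # text unit omitted
                cur[i - 1] + ins,                         # extra request unit
            )
        if cur[m] < best_cost:  # strict: ties keep the earliest position
            best_j, best_cost = j, cur[m]
        prev = cur
    return best_j


def fitness(p: Params, examples: List[Example]) -> float:
    """Evaluation function: rate of correct identifications."""
    correct = sum(best_end(t, r, p) == end for t, r, end in examples)
    return correct / len(examples)


def evolve(examples: List[Example], pop_size: int = 20, generations: int = 30,
           sigma: float = 0.3, seed: int = 0) -> Params:
    rng = random.Random(seed)
    pop: List[Params] = [
        tuple(rng.uniform(0.1, 3.0) for _ in range(3)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda p: fitness(p, examples), reverse=True)
        elite = ranked[: max(2, pop_size // 4)]  # truncation selection + elitism
        children = list(elite)
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation, kept positive.
            child = tuple(
                max(0.01, (x if rng.random() < 0.5 else y) + rng.gauss(0.0, sigma))
                for x, y in zip(a, b)
            )
            children.append(child)
        pop = children
    return max(pop, key=lambda p: fitness(p, examples))
```

On a toy example where the request `l e m o d` should be located at the end of `s a l y t l e m o n d`, unit costs produce a tie between end positions 9, 10 and 11, so the earliest (wrong) one is chosen; a cheaper omission cost makes position 11 win outright. Because the evaluation function is a rate of correct identifications and is not differentiable in the costs, a derivative-free search such as this one is a natural fit.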
The features of the invention mentioned hereinbefore, together with others, will become clearer from the following description of an exemplary embodiment of the method according to the invention, given in connection with the accompanying drawing illustrating the method.